Eduction is a matching and extraction tool, based on regular expressions. By default, it returns each match as a whole and does not extract attributes from within a single match.
For example, with sentiment analysis you can match the phrase "Their service is fantastic" as conveying positive sentiment, but Eduction does not break the phrase down to identify "service" as the subject matter and "fantastic" as the adjective that describes the subject.
The Eduction grammar does, however, provide a feature that lets you extract these attributes from the phrase. The attributes are called components, because they are the components of a single match.
Additional configuration is required to use components. See Configure Eduction Components.
In the Eduction grammar, components are defined using the extension operator (?A=ComponentName:Pattern). Consider the following example entity:
<entity name="test">
   <pattern>(?A=SUBJECT:(?A^noun)) is (?A=SENTIMENT:(?A^adjective))</pattern>
</entity>
In this example, the noun entity might be defined (in an earlier part of the grammar) to match nouns such as "service" and "facility". The adjective entity might be defined to match descriptions such as "fantastic" and "appalling". The test entity matches the phrase "service is fantastic", and then returns the SUBJECT component with the text "service", and the SENTIMENT component with the text "fantastic".
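As a rough analogy only, components behave much like named capture groups in an ordinary regular expression. The following sketch uses Python's standard re module, not Eduction or its API. The NOUNS and ADJECTIVES word lists and the extract_components function are hypothetical stand-ins for the noun and adjective entities and for the Eduction engine.

```python
import re

# Hypothetical stand-ins for the noun and adjective entities.
NOUNS = ["service", "facility"]
ADJECTIVES = ["fantastic", "appalling"]

# The named groups play the role of the SUBJECT and SENTIMENT components.
PATTERN = re.compile(
    r"\b(?P<SUBJECT>" + "|".join(NOUNS) + r")"
    r" is "
    r"(?P<SENTIMENT>" + "|".join(ADJECTIVES) + r")\b"
)

def extract_components(text):
    """Return the whole match plus its named parts, or None if nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return {"match": m.group(0), **m.groupdict()}

result = extract_components("Their service is fantastic")
print(result)
```

The single match still covers the whole phrase, while each named part is available separately, which is the idea that components express in the grammar.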
The English Eduction sentiment analysis grammar has the TOPIC, SENTIMENT, POSITIVE, and NEGATIVE components defined. You can use these components by configuring Eduction accordingly.
Components are useful when the information that you want to match has some underlying pattern that you want to preserve.
You might use components to extract data in tables and return it in a suitable format.
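To illustrate the table case, the following sketch again uses plain Python regular expressions as an analogy, not Eduction itself. It assumes a hypothetical three-column row layout (item name, quantity, unit price). The ITEM, QUANTITY, and PRICE group names are invented for this example and correspond to what would be components in a grammar.

```python
import re

# Hypothetical row layout: item name, quantity, unit price.
ROW = re.compile(
    r"^(?P<ITEM>[A-Za-z ]+?)\s+(?P<QUANTITY>\d+)\s+(?P<PRICE>\d+\.\d{2})$"
)

def parse_rows(lines):
    """Return one dictionary of named fields for each line that looks like a data row."""
    rows = []
    for line in lines:
        m = ROW.match(line.strip())
        if m:
            rows.append(m.groupdict())
    return rows

table = [
    "Item Qty Price",        # header line, does not match the row pattern
    "Widget 3 4.50",
    "Spare part 12 0.99",
]
rows = parse_rows(table)
print(rows)
```

Each row is one match, and the named parts preserve the column structure so that the data can be returned in a structured form.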
Most of the standard grammars do not have components defined, because these grammars are mainly dictionaries or basic patterns that you can use to build more complex patterns. You might want to define components when you reference these basic entities in your patterns for custom grammars.