Sentiment Analysis
The sentiment analysis grammar files contain:
- dictionaries of types of word (for example, positive adjective, negative noun, neutral adverb, and so on).
- patterns that describe how to combine these dictionaries to form positive and negative phrases.
For example, you could run sentiment extraction using the English sentiment grammar file (sentiment_eng.ecr), with the following hotel review as the input file:
The room was nice enough, with a plug in radiator, tv with an English news channel, hot shower, comfy bed. The receptionist we first dealt with was miserable and rude, and just grunted at us and rolled her eyes because we were too early for check in having just got off the morning train from Khabarovsk. Fortunately, a younger receptionist with a nice smile appeared, spoke to us helpfully suggesting a few cafes nearby to pass some time, and we tried to forget about the other woman.
Breakfast is terrible. Unidentifiable cordials, gloomy porridge, bread rolls filled with things you don't expect for breakfast, like potato, egg and dill. Don't come here for the breakfast, but for the cost of the room in a city like Vladivostok, the hotel is still decent value for money.
The following is a sample of the output that this produces:
<?xml version="1.0" encoding="UTF-8"?>
<MATCHLIST>
  <DOCUMENT Type="IDOL IDX" ID="Unknown">
    <FIELD Name="DRECONTENT">
      <FIELD_INSTANCE Value="1">
        <MATCH EntityName="sentiment/positive/eng" Offset="7" OffsetLength="5" Score="1.05" NormalizedTextSize="17" NormalizedTextLength="17" OriginalTextSize="17" OriginalTextLength="17">
          <ORIGINAL_TEXT>The room was nice</ORIGINAL_TEXT>
          <NORMALIZED_TEXT>The room was nice</NORMALIZED_TEXT>
          <COMPONENTS>
            <COMPONENT Name="TOPIC" Text="The room" Offset="0" OffsetLength="0" TextSize="8" TextLength="8"/>
            <COMPONENT Name="SENTIMENT" Text="nice" Offset="13" OffsetLength="13" TextSize="4" TextLength="4"/>
          </COMPONENTS>
        </MATCH>
        <MATCH EntityName="sentiment/negative/eng" Offset="494" OffsetLength="492" Score="1.2" NormalizedTextSize="21" NormalizedTextLength="21" OriginalTextSize="21" OriginalTextLength="21">
          <ORIGINAL_TEXT>Breakfast is terrible</ORIGINAL_TEXT>
          <NORMALIZED_TEXT>Breakfast is terrible</NORMALIZED_TEXT>
          <COMPONENTS>
            <COMPONENT Name="TOPIC" Text="Breakfast" Offset="0" OffsetLength="0" TextSize="9" TextLength="9"/>
            <COMPONENT Name="SENTIMENT" Text="terrible" Offset="13" OffsetLength="13" TextSize="8" TextLength="8"/>
          </COMPONENTS>
        </MATCH>
      </FIELD_INSTANCE>
    </FIELD>
  </DOCUMENT>
</MATCHLIST>
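If you post-process this output yourself, each MATCH element can be reduced to a polarity, a score, and its TOPIC and SENTIMENT components. The following is a minimal Python sketch using only the standard library. The function name extract_sentiments and the shape of the returned dictionaries are illustrative assumptions, not part of Eduction, and the embedded sample is a trimmed copy of the output above with the size and offset attributes omitted.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample output (size/offset attributes omitted for brevity).
SAMPLE_OUTPUT = """<?xml version="1.0" encoding="UTF-8"?>
<MATCHLIST>
  <DOCUMENT Type="IDOL IDX" ID="Unknown">
    <FIELD Name="DRECONTENT">
      <FIELD_INSTANCE Value="1">
        <MATCH EntityName="sentiment/positive/eng" Score="1.05">
          <ORIGINAL_TEXT>The room was nice</ORIGINAL_TEXT>
          <NORMALIZED_TEXT>The room was nice</NORMALIZED_TEXT>
          <COMPONENTS>
            <COMPONENT Name="TOPIC" Text="The room"/>
            <COMPONENT Name="SENTIMENT" Text="nice"/>
          </COMPONENTS>
        </MATCH>
        <MATCH EntityName="sentiment/negative/eng" Score="1.2">
          <ORIGINAL_TEXT>Breakfast is terrible</ORIGINAL_TEXT>
          <NORMALIZED_TEXT>Breakfast is terrible</NORMALIZED_TEXT>
          <COMPONENTS>
            <COMPONENT Name="TOPIC" Text="Breakfast"/>
            <COMPONENT Name="SENTIMENT" Text="terrible"/>
          </COMPONENTS>
        </MATCH>
      </FIELD_INSTANCE>
    </FIELD>
  </DOCUMENT>
</MATCHLIST>"""


def extract_sentiments(xml_text):
    """Return one dict per MATCH element (hypothetical helper, not an Eduction API)."""
    root = ET.fromstring(xml_text)
    results = []
    for match in root.iter("MATCH"):
        # Map component names (TOPIC, SENTIMENT) to their matched text.
        components = {
            comp.get("Name"): comp.get("Text")
            for comp in match.iter("COMPONENT")
        }
        # Entity names in the sample look like "sentiment/positive/eng";
        # the middle segment is taken as the polarity.
        parts = match.get("EntityName", "").split("/")
        polarity = parts[1] if len(parts) > 1 else None
        results.append({
            "polarity": polarity,
            "score": float(match.get("Score", "0")),
            "text": match.findtext("ORIGINAL_TEXT"),
            "topic": components.get("TOPIC"),
            "sentiment": components.get("SENTIMENT"),
        })
    return results


if __name__ == "__main__":
    for item in extract_sentiments(SAMPLE_OUTPUT):
        print(item["polarity"], item["score"], item["topic"], item["sentiment"])
```

The sketch reads the polarity from the entity name rather than from the component text, so it depends on the entity naming seen in the sample (sentiment/positive/eng, sentiment/negative/eng).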
The following example configuration shows the recommended usage:
[Eduction]
ResourceFiles=grammars/sentiment_eng.ecr
// Note: replace sentiment_eng.ecr by sentiment_user_eng.ecr if using user modification
// standard entities for all sentiment analysis in English:
Entity0=sentiment/positive/eng
Entity1=sentiment/negative/eng
EntityField0=POSITIVE_VIBE
EntityField1=NEGATIVE_VIBE
EntityComponentField0=TOPIC,SENTIMENT
EntityComponentField1=TOPIC,SENTIMENT
// some invalid matches are given very low scores so that we can filter them out:
MinScore=0.1
// for extraction of Twitter handles, hashtags and emoticons:
TangibleCharacters=@#:;
// for displaying metadata:
OutputScores=True
OutputSimpleMatchInfo=False
EnableComponents=True
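To illustrate the effect of a score threshold such as MinScore=0.1, the following Python sketch applies the same kind of filter to a list of already-extracted matches. It is a client-side illustration only, not how Eduction implements the setting. The function name filter_by_min_score is hypothetical, the third match and its score are invented for the example, and treating the threshold as inclusive is an assumption.

```python
def filter_by_min_score(matches, min_score=0.1):
    """Keep matches scoring at least min_score (inclusive bound is an assumption)."""
    return [m for m in matches if m["score"] >= min_score]


# The first two scores come from the sample output; the third entry is invented
# to stand in for an invalid match that was given a very low score.
matches = [
    {"text": "The room was nice", "score": 1.05},
    {"text": "Breakfast is terrible", "score": 1.2},
    {"text": "hypothetical invalid match", "score": 0.02},
]

kept = filter_by_min_score(matches, min_score=0.1)

if __name__ == "__main__":
    for m in kept:
        print(m["text"], m["score"])
```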
For more information about the sentiment analysis grammar files, see Sentiment Grammars.