Perform Sentiment Analysis on Short Comments
The standard sentiment analysis grammars are designed for high precision. For some sources of short comment data, such as YouTube comments, no positive or negative matches are found in some documents despite sentiment clearly being expressed.
If recall with the full sentiment_eng.ecr grammar file is too low, and your documents are generally short comments, use sentiment_basic_eng.ecr to extract additional matches. This grammar contains carefully selected lists of positive and negative terms that help determine the sentiment of a document in which sentiment_eng.ecr found no matches.
sentiment_basic_eng.ecr contains terms in title case, but research shows that for most data these terms impair precision, so they are given a lower score. Micro Focus recommends that you set EntityMinScoreN to 0.4 to filter out these terms unless you need them.
sentiment_basic_eng.ecr does not expose TOPIC or SENTIMENT components, and does not use scores to reflect strength or reliability of polarity. The following additional example configuration shows the recommended usage:
[Eduction]
ResourceFiles=grammars/sentiment_eng.ecr,grammars/sentiment_basic_eng.ecr
// optional further layer of analysis for very short documents:
Entity2=sentiment/basic_positive/eng
Entity3=sentiment/basic_negative/eng
EntityField2=BASIC_POSITIVE_VIBE
EntityField3=BASIC_NEGATIVE_VIBE
// remove this setting to include basic matches in titlecase - this is not recommended because on most data it decreases precision:
EntityMinScore2=0.4
EntityMinScore3=0.4
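The configuration above writes basic matches to the BASIC_POSITIVE_VIBE and BASIC_NEGATIVE_VIBE fields, so a downstream application can treat them as a fallback for documents in which the full grammar found nothing. The following Python sketch illustrates one possible way to do that. It is not part of Eduction: the field names POSITIVE_VIBE and NEGATIVE_VIBE for the sentiment_eng.ecr matches are hypothetical placeholders, the input is assumed to be a mapping from field name to a list of matched strings, and the simple majority rule is an assumption chosen for illustration.

```python
from typing import Mapping, Sequence

# Field names taken from the example configuration above.
BASIC_POSITIVE = "BASIC_POSITIVE_VIBE"
BASIC_NEGATIVE = "BASIC_NEGATIVE_VIBE"

# Hypothetical field names for the sentiment_eng.ecr matches; replace them
# with the EntityFieldN values used in your own configuration.
FULL_POSITIVE = "POSITIVE_VIBE"
FULL_NEGATIVE = "NEGATIVE_VIBE"


def document_polarity(fields: Mapping[str, Sequence[str]]) -> str:
    """Return 'positive', 'negative', 'mixed', or 'none' for one document."""

    def count(name: str) -> int:
        return len(fields.get(name, ()))

    # Prefer matches from the full, high-precision grammar.
    positive, negative = count(FULL_POSITIVE), count(FULL_NEGATIVE)

    # Use the basic term lists only when the full grammar found no matches.
    if positive == 0 and negative == 0:
        positive, negative = count(BASIC_POSITIVE), count(BASIC_NEGATIVE)

    if positive == 0 and negative == 0:
        return "none"
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "mixed"
```

With this approach, basic matches never override a result from sentiment_eng.ecr. They only fill in documents that would otherwise have no sentiment at all, which matches the stated purpose of the basic grammar.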