Technical Reports



Combining Lexicon-based and Learning-based Methods for Twitter Sentiment Analysis

Zhang, Lei; Ghosh, Riddhiman; Dekhil, Mohamed; Hsu, Meichun; Liu, Bing
HP Laboratories


Keyword(s): information analytics; sentiment analysis; twitter

Abstract: With the rapid growth of microblogs on the Web, people have begun to express their opinions on a wide variety of topics on Twitter and similar services. Sentiment analysis on entities (e.g., products, organizations, people) in tweets (posts on Twitter) thus becomes a rapid and effective way of gauging public opinion for business marketing or social studies. However, Twitter's unique characteristics give rise to new problems for current sentiment analysis methods, which originally focused on large opinionated corpora such as product reviews. In this paper, we propose a new entity-level sentiment analysis method for Twitter. The method first adopts a lexicon-based approach to perform entity-level sentiment analysis. This approach gives high precision but low recall. To improve recall, additional tweets that are likely to be opinionated are identified automatically by exploiting information in the output of the lexicon-based method. A classifier is then trained to assign polarities to the entities in the newly identified tweets. Instead of being labeled manually, the training examples are supplied by the lexicon-based approach. Experimental results show that the proposed method dramatically improves recall and F-score, and outperforms state-of-the-art baselines.
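The two-stage idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the report: the toy lexicon, the example tweets, the function names, and the choice of a small naive Bayes classifier (the abstract does not name the classifier or the lexicon). The sketch also works at tweet level rather than entity level, and it treats every tweet the lexicon cannot decide as a candidate, whereas the report selects likely-opinionated tweets automatically.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon; the report's actual opinion lexicon is not given here.
LEXICON = {"love": 1, "great": 1, "good": 1, "hate": -1, "awful": -1, "bad": -1}

def tokenize(text):
    return text.lower().split()

def lexicon_label(tweet):
    """Stage 1: return 'pos', 'neg', or None when the lexicon cannot decide."""
    score = sum(LEXICON.get(tok, 0) for tok in tokenize(tweet))
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None

def train_naive_bayes(labeled):
    """Stage 2: train on tweets labeled by the lexicon, not by hand.
    Lexicon words are excluded from the features so that the classifier
    has to rely on other cues in the tweet."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for tweet, label in labeled:
        class_counts[label] += 1
        for tok in tokenize(tweet):
            if tok not in LEXICON:
                word_counts[label][tok] += 1
                vocab.add(tok)
    return word_counts, class_counts, vocab

def classify(tweet, model):
    """Pick the class with the highest log-probability (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label in sorted(class_counts):
        logp = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(tweet):
            if tok in vocab:  # ignore words never seen in training
                logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Made-up example tweets, for illustration only.
tweets = [
    "love the new phone battery lasts forever",
    "great phone battery lasts forever",
    "hate this phone screen cracked already",
    "awful phone screen cracked again",
    "battery lasts forever on this phone",
    "screen cracked already",
]

labeled = [(t, lexicon_label(t)) for t in tweets if lexicon_label(t) is not None]
unlabeled = [t for t in tweets if lexicon_label(t) is None]
model = train_naive_bayes(labeled)
predictions = {t: classify(t, model) for t in unlabeled}
```

In this toy run the last two tweets contain no lexicon words, so the first stage leaves them unlabeled; the classifier trained on the lexicon-labeled tweets then assigns them a polarity from co-occurring words, which is the recall gain the abstract describes.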

7 Pages

External Posting Date: June 21, 2011. Approved for External Publication
Internal Posting Date: June 21, 2011
