Tuesday, September 9, 2014

Tweet Sentiment Analysis

Hey there,

It’s been a while since we last posted but we are pretty excited about our new sentiment analysis transforms so I thought I would make a quick post about it.

Sentiment analysis can be described as the use of natural language processing (NLP) to extract the attitude/opinion of a writer towards a specific topic. With the overwhelming amount of data being posted on the Internet every day and no way to read it all, sentiment analysis has become a really useful tool for extracting and aggregating opinions on a specific topic from many different sources. The potential use for sentiment analysis is endless, a few examples are things like brand reputation monitoring, market research, stock-exchange monitoring, etc. The transform that we built takes a Tweet as its input entity and returns either positive, neutral or negative entity. This way a large amount of Tweets can easily be categorized according to their sentiment.

Although sentiment extraction is a relatively new area of research there are quite a few methods of going about it and a lot of companies offering different sentiment analysis APIs. With many APIs to choose from it was quite difficult to decide which one would work best for the transform. I decided to use my top four APIs, aggregated their result and use that as the output of my transform. The problem with this method was that most of the time the APIs would return different and often obviously incorrect results (I won’t mention any names). While some APIs seemed to work well on certain topics of Tweets they would fail horribly on others. After much experimentation I settled for using only AlchemyAPI’s sentiment analysis tool which seems to work the best out of all the APIs that I tested, and I tested quite a few so well done to them.

I then built a new machine named Twitter Analyser to use with the new sentiment analysis transform. This machine takes a phrase in as its input and searches Twitter for Tweets with this phrase. From the Tweets that are returned hash tags, links, sentiment and uncommon words found in the Tweets are extracted. The uncommon words are extracted with one of our other new transforms that checks the word against an ordered list of common words, if the input word does not occur in the list before a certain threshold the word is returned as an entity.  The To Words transform can takes in two transform settings: the threshold it must search in the list of common words and words that should be ignored by the transform. The machine will run every 5 minutes to search Twitter for new Tweets. Running the machine in bubble view it is easy to see common hashtags, links, words and sentiment between Tweets of a certain topic. The screenshot of the graph below shows an example of using the Tweet Analyser machine on the phrase AlchemyAPI:

In this image the entities are sized according to their number of incoming links so you can see what is common between many Tweets.  From the image you can see common hashtags like: #ai, #deeplearning and #sentimentanalysis as well as pick out the common links and words between the Tweets.

As always enjoy responsibly!

PS: As most of you already know we have recently released an update to Maltego [version 3.5.2], our YouTube video here gives a quick breakdown of the new features:  https://www.youtube.com/watch?v=QK6PX4Fq5xY&list=UUThOLpqhLFFQN0nStdkyGLg

No comments:

Post a Comment