Sentiment Analysis with TextBlob and VADER
It was 6:30 AM on a Thursday when I got a call from my senior about a sentiment analysis project. Honestly, I was half asleep and dreaming; thanks to the call recorder, I could listen to the requirements and the deadline once again. The conclusion: I had to analyze review data, classify each review as Negative, Positive, or Neutral, and deliver the results by the next Monday. I have a keen interest in NLP, and sentiment analysis is a major part of it. I had heard of lexicon-based sentiment analyzers, particularly TextBlob, and I chose it because there was no time to experiment with a new algorithm. TextBlob, developed by Steven Loria, is a Python library that
uses the Natural Language Toolkit (NLTK) under the hood. I have seen many projects that use TextBlob as a sentiment analyzer, mostly on Twitter data or movie review data.
Why is sentiment analysis important?
A review of anything (a product, a movie, a person, etc.) is very important these days for getting a clear picture of what end users think. An enthusiast writes all kinds of reviews in a particular domain, and a consumer who has a bad experience leaves negative reviews on the product they purchased or used. It is we who gave “The Shawshank Redemption” a rating of 9.3 and “Student of the Year 2” a rating of 2.2 on IMDb. As for product feedback, we have plenty of user review data if we want to look into a particular topic. The sources can be shopping portals like Amazon, Flipkart, Alibaba, and Myntra, as well as social media platforms like Twitter and Facebook.
The domain I was about to work in was a little different from movie reviews: it was mobile device reviews from various shopping portals and social media. The data extraction part was traditional, i.e. scraping it from the portals, so I used web scraping to collect all the data. Then it was time for data cleaning and sentiment analysis. Cleaning the data was time-consuming, as the reviews contained sentences full of irrelevant content that was not useful for my objective. For the cleaning, I used regular expressions and a few ready-made
Python libraries. The corpus (data) was now ready to be analyzed.
Begin with TextBlob
Let’s see some basics of TextBlob. It uses NLTK (Natural Language Toolkit), takes a sentence as input, and outputs two scores: polarity and subjectivity. The polarity score lies between -1 and 1, where -1 identifies the most negative words such as ‘disgusting’, ‘awful’, and ‘pathetic’, and 1 identifies the most positive words like ‘excellent’ and ‘best’. The subjectivity score lies between 0 and 1 and measures the amount of personal opinion: if a sentence has high subjectivity, i.e. close to 1, the text contains more personal opinion than factual information. I was mainly concerned with the polarity score, since my objective was not to identify factual information, so I skipped the subjectivity score in my project.
To start working with TextBlob, you need Python preinstalled and pip configured. The pip installation command for TextBlob is
pip install textblob
To import TextBlob, we write
from textblob import TextBlob
TextBlob syntax to get the polarity score (the `TextBlob` object itself is not the score; the polarity lives on its `sentiment` attribute):
res = TextBlob(sentence)
polarity = res.sentiment.polarity
As TextBlob is a lexicon-based sentiment analyzer, it has some predefined rules, or we can say a word-and-weight dictionary, with scores that help calculate a sentence’s polarity. That is why lexicon-based sentiment analyzers are also called “rule-based sentiment analyzers”.
Let’s check the polarity of some random sentences with TextBlob. The beauty of TextBlob is its very easy syntax.
- It’s a beautiful day.
- This movie is badly directed.
- The weather today is pleasant.
We get polarity values of 0.85, -0.69, and 0.73 respectively. Among these, the negative sentence “This movie is badly directed” has a polarity score of -0.69, which marks it as strongly negative.
Let’s change the word “badly” to “amazingly”.
res = TextBlob("This movie is amazingly directed")
The output comes as 0.6000000000000001 (a floating-point artifact of 0.6). Here, TextBlob works amazingly well as a sentiment analyzer. And I was successful in delivering my project the next Monday, and got appreciation from my colleagues as well.
The next day I was looking through the result files when a particular sentence caught my attention.
It was “no slow-motion camera”.
As I mentioned, my domain was mobile phone review analysis, so if anyone writes this sentence it is a negative one, but TextBlob classified it as positive with a polarity score of 0.15. That made me curious and forced me to explore how TextBlob works. The finding: when a negation is attached to a word, TextBlob simply multiplies that word’s polarity score by -0.5. In my case it was the word “slow”, a negative word with a polarity score of -0.3, so after multiplying by -0.5 the resulting polarity of the sentence becomes a positive 0.15.
Another issue I faced with TextBlob arises when the negation word is not adjacent to the word carrying nonzero polarity, i.e. it appears somewhere in between.
- This is the best Face Recognition at this price. (Polarity: 1.0)
- This is not the best Face Recognition at this price. (Polarity: 1.0)
In the above example, the word “best” has a polarity score of 1.0. In the second sentence, TextBlob should multiply 1.0 by -0.5 and the value should come out as -0.5, but that is not the case. The reason is that TextBlob treats “not best” differently from “not the best”, and that creates the issue. This needed to change, as it was impacting the overall sentiment on the product.
The Game Changer
I started exploring sentiment analyzers again and found a research paper by C.J. Hutto and Eric Gilbert on VADER (Valence Aware Dictionary and sEntiment Reasoner). VADER is another lexicon-based sentiment analyzer with predefined rules for words, or lexicons. VADER not only tells whether a lexicon entry is positive, negative, or neutral; it also tells how positive, negative, or neutral a sentence is. The output from VADER comes as a Python dictionary with four keys and their corresponding values: ‘neg’, ‘neu’, ‘pos’, and ‘compound’, which stand for Negative, Neutral, Positive, and a compound score respectively. The compound score is the most useful one: it is computed by summing the valence scores of the words in the sentence and normalizing the result to between -1 and +1. The decision criteria are similar to TextBlob’s: -1 is most negative and +1 is most positive.
It works differently from TextBlob. I took some of the problematic sentences, ran them through VADER, and the output was correct.
To start working with VADER, we need to install it with pip:
pip install vaderSentiment
We need to import it and initialize an analyzer object:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
sid_obj = SentimentIntensityAnalyzer()
I checked with my problematic sentence:
print(sid_obj.polarity_scores("no slow motion camera"))
I analyzed the whole corpus with both VADER and TextBlob. The output led me to the conclusion that TextBlob was struggling with negative sentences, particularly negations.
The above graph is a scatter plot of the two algorithms’ scores, VADER against TextBlob, annotated with their Pearson correlation coefficient. In it we can see that sentences considered negative by VADER were mostly identified as positive by TextBlob. In the 1st and 3rd quadrants both algorithms agree. But in the 2nd and 4th quadrants there is a mismatch, especially in the 4th quadrant, which holds the most contradictory data: sentences that are positive per TextBlob and negative per VADER.
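The quadrant reading of such a plot can be made concrete with a small standard-library sketch; the score pairs below are hypothetical stand-ins for (TextBlob polarity, VADER compound) on the same sentence, not my actual corpus:

```python
from collections import Counter

# Hypothetical (textblob_polarity, vader_compound) pairs for illustration.
pairs = [(0.5, 0.6), (0.15, -0.7), (-0.3, -0.4), (-0.1, 0.2), (0.4, -0.8)]

def quadrant(tb: float, vd: float) -> str:
    """Classify a score pair by which quadrant of the scatter plot it falls in."""
    if tb >= 0 and vd >= 0:
        return "agree-positive"                     # 1st quadrant
    if tb < 0 and vd < 0:
        return "agree-negative"                     # 3rd quadrant
    if tb >= 0 > vd:
        return "textblob-positive/vader-negative"   # the problematic 4th quadrant
    return "textblob-negative/vader-positive"       # 2nd quadrant

print(Counter(quadrant(tb, vd) for tb, vd in pairs))
```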
To get rid of the bias I had toward TextBlob, I needed more proof to be convinced that VADER was doing the job better than TextBlob in my project. To get that proof, I needed more experiments.
As Richard Feynman said: “It doesn’t matter how beautiful your theory is, it doesn’t matter how smart you are. If it doesn’t agree with experiment, it’s wrong.”
The best way was to compare the two algorithms I had, but the major problem was: compare against what? I wanted a real reference, but who decides the correct sentiment? It’s us, humans. Initially I thought I would mark all the correct sentiments myself, but after a little research I came across the “wisdom of the crowd”. In the book “The Wisdom of Crowds”, James Surowiecki writes that the “collective knowledge of a group of people as expressed through their aggregated opinions can be trusted as an alternative to an expert’s knowledge”. I decided to go with the wisdom of the crowd to get the correct sentiments.
I selected 20 people for this task, of whom 10 had expertise in the mobile domain while the rest did not. I gave them 150 random sentences to mark as Positive, Negative, or Neutral. Then I aggregated the 20 individual labels for each sentence (effectively a majority vote) into a final correct sentiment. That was the gold standard against which TextBlob and VADER could be compared. To measure each algorithm’s accuracy against the human-labeled sentences, I created confusion matrices of each algorithm versus the crowdsourced data.
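The comparison itself is straightforward to sketch with the standard library; the labels below are made up for illustration, not my actual crowd data:

```python
from collections import Counter

LABELS = ["Negative", "Neutral", "Positive"]

def confusion_matrix(gold, pred):
    """Rows = gold (crowd) label, columns = predicted (analyzer) label."""
    counts = Counter(zip(gold, pred))
    return [[counts[(g, p)] for p in LABELS] for g in LABELS]

def accuracy(gold, pred):
    """Fraction of sentences where the analyzer matches the crowd label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Hypothetical crowd labels vs. analyzer output.
gold = ["Negative", "Positive", "Neutral", "Negative", "Positive"]
pred = ["Positive", "Positive", "Neutral", "Negative", "Positive"]

print(confusion_matrix(gold, pred))
print(f"accuracy: {accuracy(gold, pred):.1%}")
```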
The result is very convincing: VADER outperforms TextBlob when it comes to negative polarity detection. In the above-mentioned confusion matrices, VADER gets an overall accuracy of 63.3% while TextBlob gets 41.3%.
Can we say VADER is better than TextBlob for Sentiment Analysis?
It depends on the user’s requirements. My answer is no, VADER is not better than TextBlob in all areas. However, I can say that VADER works better when it comes to negative sentiment classification.
In the above-mentioned table, the F1 score of VADER is 0.80 for negative polarity detection, while TextBlob’s comes out as 0.56. From this we can conclude that VADER does better sentiment analysis when it comes to negative polarity detection.
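For reference, the per-class F1 score behind such a table can be computed from the confusion-matrix counts like this (the counts here are hypothetical, not the ones from my experiment):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for one class: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for the "Negative" class:
print(f1_score(tp=40, fp=10, fn=10))  # precision = recall = 0.8 -> F1 = 0.8
```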