Sentiment Analysis Using VADER

Juveriya Mahreen 12 Oct, 2022 • 5 min read

This article was published as a part of the Data Science Blogathon.

Introduction

A business or brand's success depends heavily on customer satisfaction. If customers do not like a product, you may have to rework it. To find out how they feel, you need to analyze the sentiment of their opinions. Sentiment analysis is the process of identifying and categorizing the opinions expressed in a piece of text in order to determine whether they are positive, negative, or neutral.

In this article, we will perform sentiment analysis using VADER. Sentiment analysis attaches meaning to text by estimating the opinion it expresses, and semantics helps us interpret words and their relation to each other. Before that, let us briefly look at what NLP is and at the NLTK library.

What is NLP?

NLP (Natural Language Processing) is the automated processing and manipulation of human language. We use NLP to extract meaningful information from text. Its applications include sentiment analysis, chatbots, speech recognition, machine translation, spell checking, information extraction, keyword search, and advertisement matching. Real-world examples include Google Assistant and Google Translate.

NLTK

Natural Language Toolkit (NLTK) is one of the most widely used NLP libraries. It contains packages that help machines process human language and respond to it appropriately. NLTK has many built-in packages for processing textual data at different stages of a workflow, such as data cleaning, visualization, and vectorization.

Sentiment Analysis

Sentiment analysis is used to find the polarity of a text, that is, whether it is positive, negative, or neutral. It is an active research area in natural language processing and is widely used in data mining and text mining. It helps collect and analyze opinions about a brand or a product by processing blog posts, comments, reviews, tweets, etc.

In sentiment analysis, we classify the polarity of a given text at the document, sentence, or feature level. It tells us about the opinion, whether it is positive, negative, or neutral.

Applications of Sentiment Analysis

 

  • Social media monitoring: Social media is taking over the world. More than 55% of customers share reviews of their purchases on social networking sites. Analyzing these reviews manually is nearly impossible. Sentiment analysis lets us analyze them and derive meaning from them.

  • Brand monitoring: Brand owners use sentiment analysis tools to keep track of bad reviews about their brand. They can also use machine learning algorithms to predict outcomes based on the results of the sentiment analysis.

  • Voice of customer: Sentiment analysis algorithms let us analyze the voice of the customers, such as which products they need most and which products they rate highly. Brand owners can create a personalized customer experience based on these evaluations.

  • Customer service: Chatbots are a widespread way of delivering good customer service. Using sentiment analysis, you can transfer a chat to a customer service associate whenever needed. You can also automate tasks such as booking a ticket or a salon appointment.

  • Market research: Using sentiment analysis, you can research how well your competitors are growing and what positive feedback they receive from customers. You can also analyze the way they deal with their customers. You can, in turn, work on the issues behind your own product's failures.

  • Product analysis: You can do keyword research to identify the products in demand and the highly rated products. You can also determine which features of a particular product customers or end users appreciate most.

NLTK’s VADER module

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analyzer that ships with NLTK and provides sentiment scores based on the words used. Each term in its lexicon is rated by semantic orientation, positive or negative, and by how strong that sentiment is.
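To make the idea of a lexicon plus rules concrete, here is a minimal toy sketch. It is not VADER itself: the words, valence values, booster weights, and the two rules below are made up for illustration, and the real lexicon and rule set are far richer. The `normalize` function follows, to the best of our knowledge, the same form of squashing (with `alpha = 15`) that the open-source VADER implementation uses to keep the compound score between -1 and 1.

```python
import math

# Toy lexicon: words and valence values are invented for illustration only;
# they are NOT taken from VADER's actual lexicon.
TOY_LEXICON = {"amazing": 3.0, "good": 2.0, "bad": -2.5, "terrible": -3.0}
# Hypothetical intensifiers and how much they strengthen the next word.
TOY_BOOSTERS = {"very": 0.3, "really": 0.3}
# Hypothetical negation words.
TOY_NEGATIONS = {"not", "never"}


def normalize(score, alpha=15.0):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def toy_compound(text):
    """Sum rule-adjusted word valences and normalize the total."""
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token)
        if valence is None:
            continue  # word carries no sentiment in the toy lexicon
        # Rule 1: a booster right before the word pushes it away from zero.
        if i > 0 and tokens[i - 1] in TOY_BOOSTERS:
            valence += math.copysign(TOY_BOOSTERS[tokens[i - 1]], valence)
        # Rule 2: a negation among the two preceding tokens flips the sign.
        if any(t in TOY_NEGATIONS for t in tokens[max(0, i - 2):i]):
            valence = -valence
        total += valence
    return normalize(total)


print(toy_compound("the food was good"))
print(toy_compound("the food was not good"))
```

The two printed values have opposite signs, which is the kind of behavior a plain word count would miss and a rule-based analyzer is designed to capture.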

First, we will create a SentimentIntensityAnalyzer object. Then, we use its polarity_scores method to determine the sentiment of each review in our dataset.

Practical Exercise

In this exercise, I will use a CSV file containing reviews for different products. The file is available at:

https://drive.google.com/file/d/1NYdZoMJvBWuCejMX28pVRVfMyOe1GhnZ/view?usp=sharing

import io
import numpy as np
import pandas as pd
import nltk
# download the VADER lexicon from nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# create a sentiment intensity analyzer object
sia = SentimentIntensityAnalyzer()
# upload the csv file (Google Colab only)
from google.colab import files
uploaded = files.upload()
# read the uploaded bytes into a DataFrame
df = pd.read_csv(io.BytesIO(uploaded['reviews.csv']))
df.head()

polarity_scores: This method returns a dictionary of sentiment strengths for the given text: the neg, neu, and pos scores plus an overall compound score.

For example:

text= "Bobby is an amazing guy"
sia.polarity_scores(text)

{'compound': 0.5859, 'neg': 0.0, 'neu': 0.513, 'pos': 0.487}

The compound score is well above zero, so the above statement is positive, even though about half of it is rated neutral.

text= "The food delivered was really very bad"
sia.polarity_scores(text)

{'compound': -0.6214, 'neg': 0.404, 'neu': 0.596, 'pos': 0.0}

This example statement is a negative one.
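A short sketch may help with reading these dictionaries. It uses only the two outputs shown above and assumes the usual interpretation of VADER's output: neg, neu, and pos are shares of the text that should add up to roughly 1, while compound is a separate overall score between -1 and 1. The helper names `proportions_total` and `largest_share` are hypothetical, introduced here only for illustration.

```python
# Score dictionaries copied from the two examples above.
amazing_guy = {"compound": 0.5859, "neg": 0.0, "neu": 0.513, "pos": 0.487}
bad_food = {"compound": -0.6214, "neg": 0.404, "neu": 0.596, "pos": 0.0}


def proportions_total(scores):
    """Add up the neg, neu, and pos shares, rounded to hide float noise."""
    return round(scores["neg"] + scores["neu"] + scores["pos"], 3)


def largest_share(scores):
    """Name of the biggest of the three shares (not the overall verdict)."""
    return max(("neg", "neu", "pos"), key=lambda name: scores[name])


for scores in (amazing_guy, bad_food):
    print(proportions_total(scores), largest_share(scores), scores["compound"])
```

In both examples the largest share is neu, because most words in a sentence carry no sentiment. The sign of the compound score is what tells the two sentences apart, which is why the later steps in this exercise rely on it.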

Let us now create a new column in our CSV file that stores the polarity scores of each review.

#creating new column scores using polarity scores function
df['scores']=df['body'].apply(lambda body: sia.polarity_scores(str(body)))
df.head()

Similarly, we create three separate columns for the compound, positive, and negative scores.

df['compound']=df['scores'].apply(lambda score_dict:score_dict['compound'])
df.head()
df['pos']=df['scores'].apply(lambda pos_dict:pos_dict['pos'])
df.head()
df['neg']=df['scores'].apply(lambda neg_dict:neg_dict['neg'])
df.head()

We then create a new column named type, which indicates whether the review is pos, neg, or neutral.

df['type']=''
df.loc[df.compound>0,'type']='POS'
df.loc[df.compound==0,'type']='NEUTRAL'
df.loc[df.compound<0,'type']='NEG'
df.head()
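The same labeling rule can be written as a small standalone function, which makes the cut-off explicit and easy to change. This is a sketch, not part of the original exercise: `label_from_compound` and its `threshold` parameter are hypothetical names. With the default threshold of 0 it reproduces the rule used above. A threshold of 0.05 follows the convention commonly suggested for VADER, where compound scores between -0.05 and 0.05 are treated as neutral.

```python
def label_from_compound(compound, threshold=0.0):
    """Map a compound score to POS, NEG, or NEUTRAL.

    Scores whose magnitude is zero or below `threshold` count as neutral.
    """
    magnitude = abs(compound)
    if magnitude == 0 or magnitude < threshold:
        return "NEUTRAL"
    return "POS" if compound > 0 else "NEG"


# Compound scores from the two example sentences earlier in the article.
print(label_from_compound(0.5859))
print(label_from_compound(-0.6214))
# A weakly positive score changes label once a neutral band is used.
print(label_from_compound(0.03), label_from_compound(0.03, threshold=0.05))
```

With a DataFrame like the one in this exercise, such a function could be applied as `df['compound'].apply(label_from_compound)`. Note that widening the neutral band would change the counts reported below.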

Finally, we loop through the rows and count the total number of positive, negative, and neutral reviews.

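# number of rows and columns in the DataFrame
rows, cols = df.shape
pos = 0
neg = 0
neutral = 0
for i in range(rows):
    # look the label up by column name rather than by position
    review_type = df['type'].iloc[i]
    if review_type == "POS":
        pos = pos + 1
    elif review_type == "NEG":
        neg = neg + 1
    elif review_type == "NEUTRAL":
        neutral = neutral + 1
print("Positive :"+str(pos) + "  Negative :" + str(neg) + "   Neutral :"+ str(neutral))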

Positive :46060 Negative :13670 Neutral :8256

Therefore, using the VADER module, we concluded that our data has 46060 positive reviews, 13670 negative reviews, and 8256 neutral reviews.
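The explicit loop can also be replaced by a one-step tally. The sketch below uses `collections.Counter` on a short hypothetical list of labels that stands in for the values of the type column; the five labels are made up and are not taken from the review data.

```python
from collections import Counter

# Hypothetical labels standing in for the values of the `type` column.
labels = ["POS", "NEG", "POS", "NEUTRAL", "POS"]

# Counter tallies each distinct label in a single pass.
counts = Counter(labels)

summary = "Positive :{} Negative :{} Neutral :{}".format(
    counts["POS"], counts["NEG"], counts["NEUTRAL"]
)
print(summary)
```

On the DataFrame itself, `Counter(df['type'])` or the pandas method `df['type'].value_counts()` should give the same tally without indexing rows one by one.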

Conclusion

As noted earlier, social media is taking over the world, and more than 55% of customers share their opinions or reviews about their purchases. Analyzing the sentiment of such reviews should have given you a glimpse of how sentiment analysis is done using the concepts of NLP. As discussed in this article, sentiment analysis has many other applications beyond this.

In this article:

  • We briefly introduced sentiment analysis and its applications in the real world.
  • We then learned what Natural Language Processing is.
  • Finally, we used the NLTK module and the VADER analyzer to conduct sentiment analysis on Amazon reviews.
  • In short, NLTK is an open-source toolkit for processing and classifying text, whereas VADER is a lexicon and rule-based tool within NLTK that is used to conduct sentiment analysis.

I hope this information helped you understand what sentiment analysis is and how it is done practically.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
