Amazon Reviews Analysis Using Vader, RoBERTa, and NLTK

Amrutha K 12 Apr, 2023


Do you know what a computer's favorite beat is? An algo-rhythm 🎶🎵🎶🎵. That's the joke Google Assistant told me when I asked it for one. Have you ever wondered how these assistants work and how they come up with such replies? They are built on models trained with natural language processing (NLP), and transformers are among the most powerful NLP models available today. In fact, Google Search uses transformer models such as BERT to better understand queries. Shopping on Amazon based on other customers' reviews has also become common. In this project, we analyze Amazon reviews using VADER, a lexicon-based sentiment model available in NLTK, and RoBERTa, a transformer model.

Learning Objectives

In this article, we will learn:

  • About NLP transformers and their architecture
  • How to perform sentiment analysis on the Amazon Fine Food Reviews dataset
  • How to use NLTK for common text-processing tasks
  • About tokenization, frequency distribution, POS tagging, and stopwords
  • How to use the VADER and RoBERTa models
  • How to analyze misclassified texts

This article was published as a part of the Data Science Blogathon.


What Is a Transformer?

Before transformers, we used Recurrent Neural Networks (RNNs) to process sequence data such as text. RNNs suffer from the vanishing gradient problem, which makes it hard for them to remember long sequences: given a very long sequence of words, an RNN tends to forget the beginning by the time it reaches the end of the sentence. LSTMs improved on this and can retain information over longer spans than plain RNNs, but they do not solve the problem completely. LSTMs are also slow to train, because they process tokens one after another and cannot be parallelized across a sequence.

This is where transformers come in. They rely entirely on the attention mechanism instead of recurrence, so all positions in a sequence can be processed in parallel, which makes training much faster. They can also capture relationships between elements that are located far apart in a sequence. To define transformers,

It is a deep learning model that applies the attention mechanism, giving each element of the input data its own weight according to its relevance. Transformers are mostly applied to natural language processing (NLP) tasks and increasingly to computer vision, where in many cases they are replacing CNNs and RNNs.

Source: NVIDIA

Attention Mechanism

In the transformer architecture, attention takes the place of recurrence. Each query is compared with every key, and the resulting scores decide how much of each value (the information) is passed on. When predicting the output sequence, attention lets the model concentrate on the most relevant portions of the input sequence. Different variants of attention exist, depending on the task and the model in which they are used. Automating computer vision and NLP applications with attention has had a significant and lasting impact in many sectors, including financial services, legal advice, education, and logistics.

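To make the query/key/value description concrete, here is a minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)·V, the form used in the original Transformer. The matrix sizes and random inputs are arbitrary toy assumptions; real models also apply learned projections and run several attention heads in parallel.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for queries Q, keys K, and values V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    """
    d_k = Q.shape[-1]
    # Compare every query with every key; scaling keeps the scores in a stable range
    scores = Q @ K.T / np.sqrt(d_k)
    # Each row of weights sums to 1: how strongly one query attends to each key
    weights = softmax(scores, axis=-1)
    # Each output row is a weighted average of the value vectors
    return weights @ V, weights

# Toy inputs: 3 tokens, d_k = 4, d_v = 2 (arbitrary sizes)
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 2))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)
print(w.sum(axis=1))
```

Because every row of the weight matrix is a probability distribution, each output is a blend of the value vectors, weighted by how well the query matches each key.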
Transformer Architecture

A transformer consists of an encoder and a decoder. In the original architecture, the encoder is a stack of 6 identical encoder layers and the decoder is a stack of 6 decoder layers; in the architecture diagram, the encoder is on the left and the decoder is on the right. Each encoder layer has a self-attention sublayer and a feed-forward sublayer. Each decoder layer has two attention sublayers (masked self-attention, and encoder-decoder attention over the encoder's output) plus a feed-forward sublayer. The input is first fed to the encoder, where every word passes through the self-attention layer and is compared with all the other words; the result is then passed through the feed-forward neural network.

All inputs to the encoder and decoder are first embedded, meaning each word is represented as a vector. A positional encoding is then added to tell the model where each word appears in the sentence. At the end, a linear layer followed by a softmax layer produces the output. You can also see an Add & Norm step after every sublayer: it adds the sublayer's input to its output (a residual connection) and then applies layer normalization. The self-attention layers come in two kinds: multi-head attention and masked multi-head attention. In multi-head attention, every word is compared with all the other words, whereas in masked multi-head attention, each word can only attend to the words before it, which have already been processed.

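The positional encoding step can be sketched with the sinusoidal scheme from the original Transformer paper, where even embedding dimensions use the sine and odd dimensions use the cosine of position-dependent angles. This is a minimal NumPy sketch that assumes an even embedding size; note that many models, including RoBERTa, learn their position embeddings instead of using fixed sinusoids.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sinusoidal position encodings of shape (seq_len, d_model)."""
    assert d_model % 2 == 0, "this sketch assumes an even embedding size"
    positions = np.arange(seq_len)[:, np.newaxis]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]          # even indices 2i
    angles = positions / np.power(10000.0, dims / d_model)  # (seq_len, d_model // 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions use cosine
    return pe

# Toy word embeddings: 5 tokens, d_model = 8 (random stand-in values)
embeddings = np.random.default_rng(0).normal(size=(5, 8))
# The encoding is added element-wise, so each token's vector now also reflects its position
x = embeddings + sinusoidal_positional_encoding(5, 8)
print(x.shape)
```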
BERT, GPT, and RoBERTa are some examples of transformer models. In this project, we will use the RoBERTa model to analyze the texts in the data frame. We will also use VADER, a lexicon- and rule-based sentiment model that ships with NLTK (it is not a transformer).

Sentiment Analysis

Sentiment analysis is a technique for examining text data to determine the sentiment it expresses. The aim is to automatically recognize and categorize the opinions stated in a text in order to gauge its overall emotion, typically as positive, neutral, or negative. Algorithms do this using machine learning and text analytics. Companies use sentiment analysis to analyze customer reviews of their products and use the results to improve them.

Challenges of Sentiment Analysis

  • Sarcasm is the foremost challenge of sentiment analysis. People sometimes phrase a negative review in positive words, which makes it sarcastic.
  • A text may contain negative words even though its intention is not negative, which can confuse a model into misclassifying it.
  • Reviews that mix different languages are difficult to analyze. Companies have customers across the world, so multilingual reviews are common.
  • Word ambiguity is another challenge: the same word can carry different sentiment depending on context, which makes some texts hard to classify correctly.


Let's understand these ideas in detail with a project.

Project Description

This project applies sentiment analysis to reviews that Amazon customers wrote about fine foods, to uncover the emotion behind each text. Specifically, we analyze these texts and calculate sentiment scores for each of them, using the VADER and RoBERTa models to compute polarity scores.

Problem Statement

This project’s primary objective is to conduct sentiment analysis, determine the polarity scores of the texts, and understand the sentiment hidden in each one.

Prerequisites

  • Jupyter or Colab Notebook to run the code
  • Basic understanding of Python language
  • Basics of Natural Language Processing(NLP)


Dataset Description

The dataset we are using is the Amazon Fine Food Reviews dataset. You can download the dataset from here. Each review includes information about the product and the user, the rating given by the customer, and the plain-text review. This dataset contains:

  • A total of 568,454 reviews
  • Reviews from Oct 1999 – Oct 2012
  • Reviews from 256,059 users
  • There are 260 users with more than 50 reviews
  • On 74,258 products

Let’s get started with implementation.

First things first, we have to import all basic libraries that are necessary. Here we will import pandas, numpy, matplotlib, seaborn, and nltk.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use("ggplot")
import nltk

Next, import the dataset with pandas by passing its path to the read_csv method, and view the first five rows. The head() method returns the first five rows of the data frame; similarly, the tail() method returns the last five rows.

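A minimal sketch of the loading and inspection steps. Because the real file path is not given here, the example reads a tiny hypothetical CSV from an in-memory string (via io.StringIO) with only a few of the dataset's columns; with the real data you would pass the file path to pd.read_csv instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV: three made-up reviews, six of the columns
csv_text = """Id,ProductId,UserId,Score,Summary,Text
1,P1,U1,5,Great,Loved these snacks.
2,P2,U2,1,Not as advertised,Product arrived damaged.
3,P1,U3,4,Good,Tasty but a bit pricey.
"""
df = pd.read_csv(io.StringIO(csv_text))  # with the real data, pass the file path instead

print(df.head())              # first five rows (here, all three)
print(df.tail(2))             # last two rows
print(df.shape)               # (number of rows, number of columns)
print(df["Text"].values[1])   # full text of the second review
```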

Let’s view the full text of the second comment.

View the shape of the data frame.

df.shape
# output: (568454, 10)
# This data frame has 568,454 rows and 10 columns.

Exploratory Data Analysis

Before processing further in any project, you must understand the dataset thoroughly, clean it, and convert it into the required format. This is done with Exploratory Data Analysis (EDA), a procedure in data analytics for understanding the data and discovering its characteristics, usually through visualizations.

The Score column holds the rating each customer gave the product, ranging from 1 to 5. Let's count how many reviews there are for each rating.

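Counting reviews per rating takes one line in pandas. This sketch uses a short hypothetical series in place of df['Score']: value_counts() counts each rating, and sort_index() orders the ratings from 1 to 5, which is also a convenient order for a bar plot.

```python
import pandas as pd

# Hypothetical ratings standing in for df['Score']
scores = pd.Series([5, 4, 5, 1, 3, 5, 2, 4, 5, 1], name="Score")

# Number of reviews for each star rating, ordered from 1 to 5
counts = scores.value_counts().sort_index()
print(counts)
```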
Draw a bar plot of these rating counts.

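# Plot a bar graph of the number of reviews per rating
ax = df['Score'].value_counts().sort_index().plot(kind='bar',
                                                title="Reviews by customers",
                                                color="indigo")
# Add X and Y labels
ax.set_xlabel("Score")
ax.set_ylabel("Number of reviews")
plt.show()

# Plot a pie chart of the same counts
df['Score'].value_counts().plot(kind='pie',
                                title="Reviews by customers")
plt.show()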

Some users have written multiple reviews. To see how many reviews each user gave, we count the reviews per user. The most active customer, 'A30XHLG6DIBRW8', wrote 448 reviews.

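Per-user counts work the same way on the UserId column. The sketch below uses hypothetical IDs; value_counts() sorts in descending order by default, so the first entry is the most active reviewer.

```python
import pandas as pd

# Hypothetical user IDs standing in for df['UserId']
users = pd.Series(["A", "B", "A", "C", "A", "B"], name="UserId")

# Reviews per user, most prolific first
reviews_per_user = users.value_counts()
print(reviews_per_user.head(3))

top_user = reviews_per_user.index[0]
print(top_user, reviews_per_user.iloc[0])
```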
Let’s check if there are any null values in our dataset.

The ProfileName and Summary columns contain null values: 16 missing profile names and 27 missing summaries. Some projects need to deal with such null values, but our analysis only uses the Id, Score, and Text columns, so we can move on without any changes.

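Null values can be counted per column with isnull().sum(). This sketch uses a small hypothetical data frame rather than the real reviews.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a couple of missing values
df = pd.DataFrame({
    "ProfileName": ["amy", None, "raj"],
    "Summary": ["Great", "Bad", np.nan],
    "Text": ["Loved it.", "Too salty.", "Fine."],
})

# isnull() marks missing cells as True; sum() counts them per column
print(df.isnull().sum())
```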
Word Cloud

A word cloud is a collection of words displayed in different sizes: the more frequently a word occurs in a text, the larger and bolder it appears, indicating its importance.

Let’s plot the word cloud for a text in our dataset.

from wordcloud import WordCloud

text = df.Text[0]

# Create a word cloud image:
wordcloud = WordCloud().generate(text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

text = df.Text[1]

# Create a word cloud image:
wordcloud = WordCloud().generate(text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

The dataset contains about half a million reviews. We could use the entire dataset for the analysis, but for simplicity, I take only the first thousand rows and continue with those.

df = df.head(1000)
# Now the data frame has only the first 1000 rows.

Natural Language Toolkit (NLTK)

The goal of natural language processing (NLP) is to enable computer programs to understand and use natural human language. NLTK, the Natural Language Toolkit, is a Python package for NLP. With NLTK, you can carry out a range of tasks, including tokenization, stemming, and part-of-speech tagging. NLTK helps computers analyze, preprocess, and understand text.

Let’s take a text sentence from the data.

sentence = df['Text'][1]
sentence

Tokenization

Tokenization is the first step in NLTK text analytics. It breaks a paragraph down into simpler components called tokens; a token is a single unit, such as a word, that forms part of a sentence or paragraph. NLTK provides two tokenizers: a sentence tokenizer and a word tokenizer. We will use the word tokenizer to analyze our text.

tokens = nltk.word_tokenize(sentence)
# To view tokens
tokens

Frequency Distribution

Counting how often each word appears in a text is a common task in text processing. NLTK's FreqDist class computes this frequency distribution.

# Print the frequency distribution
from nltk.probability import FreqDist
freq_dist = FreqDist(tokens)
print(freq_dist)
# The three most common tokens
freq_dist.most_common(3)

For the input sentence, FreqDist reports 31 samples (distinct tokens) and 37 outcomes (tokens in total). The three most frequent tokens are 'the' with 3 occurrences, 'as' with 2, and 'Jumbo' with 2.

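NLTK's FreqDist is a subclass of Python's collections.Counter, so the same idea can be shown with the standard library alone. In FreqDist terms, "samples" are the distinct tokens (FreqDist.B()) and "outcomes" are the total number of tokens (FreqDist.N()). The token list below is a hypothetical example, not a review from the dataset.

```python
from collections import Counter

# Hypothetical tokens (not taken from the dataset)
tokens = ["the", "nuts", "were", "small", "and", "the", "bag", "was", "small", "too", "."]

freq = Counter(tokens)
print(len(freq), "samples (distinct tokens)")
print(sum(freq.values()), "outcomes (total tokens)")
# Ties keep first-seen order, so 'the' comes before 'small'
print(freq.most_common(2))
```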
Stopwords

Stopwords are very common words that are usually regarded as noise in text, such as "is," "am," "are," "this," "a," "an," and "the." To remove stopwords in NLTK, you build a list of stopwords and filter your tokens against it. Some projects require removing stopwords before further processing, but it is not necessary for ours, so we keep them. Run the code below to get all the English stopwords.

from nltk.corpus import stopwords
print(stopwords.words("english"))

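Filtering stopwords is a simple membership test. This sketch uses a small hand-picked stopword set as an assumption so that it runs without downloading NLTK data; in practice you would build the set from stopwords.words("english").

```python
# Hand-picked stopword set for illustration only; in practice use
# set(stopwords.words("english")) from NLTK
STOPWORDS = {"is", "am", "are", "this", "a", "an", "the", "was", "and"}

tokens = ["This", "is", "a", "great", "snack", "and", "the", "price", "was", "fair"]

# Keep tokens whose lowercase form is not a stopword
filtered = [t for t in tokens if t.lower() not in STOPWORDS]
print(filtered)
```

Lowercasing before the lookup matters because NLTK's stopword list is lowercase, while tokens keep their original capitalization.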

POS Tagging

The main goal of part-of-speech (POS) tagging is to determine the grammatical group a given word belongs to: noun, pronoun, verb, adverb, and so on. Because the group depends on context, POS tagging looks at the relationships between words in the phrase to tag each one. Refer to this to know more about these abbreviations.

nltk.pos_tag(tokens)

Vader Sentiment Analysis

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a model for text sentiment analysis that is sensitive to both the polarity (positive/negative) and the intensity (strength) of emotion. It uses a lexicon that assigns a valence score to each word and sums these valences, adjusting them with rules for negation, intensifiers such as "very," capitalization, and punctuation, to estimate the sentiment of a text. These rules let VADER handle many positive and negative sentences well based on the words they contain.

VADER's compound score sums the valence scores of all the words in a text and normalizes the sum to lie between -1 (most extreme negative) and +1 (most extreme positive).

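The summing-and-normalizing idea can be sketched in plain Python. VADER normalizes a raw sum x with x/√(x² + α), using α = 15, which maps any sum into the open interval (-1, 1). The tiny lexicon and its valence values below are hypothetical, and the real analyzer also applies its negation, intensifier, capitalization, and punctuation rules, which this sketch omits.

```python
import math

# Hypothetical valences on VADER's -4..+4 scale (NOT the real lexicon values)
TOY_LEXICON = {"love": 3.0, "great": 3.0, "good": 2.0, "bad": -2.5, "small": -0.5}

def normalize(score, alpha=15):
    """Map an unbounded valence sum into (-1, 1), as VADER does for the compound score."""
    return score / math.sqrt(score * score + alpha)

def toy_compound(text):
    # Sum the valence of every known word; unknown words count as 0
    total = sum(TOY_LEXICON.get(word, 0.0) for word in text.lower().split())
    return normalize(total)

print(round(toy_compound("I love this great snack"), 3))
print(round(toy_compound("bad and small"), 3))
```

Because the normalization saturates, piling on more positive words pushes the score toward +1 without ever exceeding it.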
from nltk.sentiment import SentimentIntensityAnalyzer
from tqdm.notebook import tqdm

# Create the VADER analyzer (requires the "vader_lexicon" NLTK resource)
SIA = SentimentIntensityAnalyzer()

VADER's SentimentIntensityAnalyzer analyzes a string and returns a dictionary with four scores: neg, neu, and pos, the proportions of the text that are negative, neutral, and positive, and compound, the normalized overall score described above.

In the first example, the sentence is positive, and the compound score is positive. The second example is a purely negative sentence, and its compound score is negative too. Finally, in the third example, we pass the review sentence we used earlier.


Here the customer complains that the product was labeled as large ("Jumbo"), but the actual product was small. This is a negative review, and as expected, VADER returns a negative compound score.

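res = {}  # empty dictionary to store results, keyed by review Id
for i, row in tqdm(df.iterrows(), total=len(df)):
    res[row['Id']] = SIA.polarity_scores(row['Text'])
Vaders = pd.DataFrame(res).T
Vaders = Vaders.reset_index().rename(columns={'index': 'Id'})
Vaders = Vaders.merge(df, how='left')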

We created polarity scores for all the texts in the data frame and got negative, neutral, positive, and compound scores for each. Then we merged this data into the original data frame and created the Vader data frame.

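Why the transpose? A dictionary of per-review score dictionaries, keyed by review Id, becomes a data frame with one column per review, so .T turns it into one row per review before the merge. This sketch uses hypothetical scores and a three-column stand-in for the reviews data frame.

```python
import pandas as pd

# Hypothetical per-review VADER-style scores keyed by review Id
res = {
    1: {"neg": 0.0, "neu": 0.6, "pos": 0.4, "compound": 0.8},
    2: {"neg": 0.3, "neu": 0.7, "pos": 0.0, "compound": -0.5},
}
# Hypothetical stand-in for the reviews data frame
df = pd.DataFrame({"Id": [1, 2], "Score": [5, 1], "Text": ["Loved it.", "Too small."]})

# pd.DataFrame(res) puts Ids in the columns, so transpose to get one row per review
scores = pd.DataFrame(res).T
# Move the Id index into a regular column so it can serve as the join key
scores = scores.reset_index().rename(columns={"index": "Id"})
# A left merge on the shared "Id" column keeps every scored review
merged = scores.merge(df, how="left")
print(merged)
```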
RoBERTa Model

RoBERTa is a transformer model pretrained on a large corpus of English text in a self-supervised way: it was trained only on raw text, without any human labeling, using an automatic procedure that generates inputs and labels from the text itself. RoBERTa differs from BERT mainly in how it was trained: it used much more data, about 160GB of text (roughly ten times the data used to train BERT), and an improved training procedure. The checkpoint we use, cardiffnlp/twitter-roberta-base-sentiment, is a RoBERTa-base model trained on tweets and fine-tuned for sentiment analysis; it produces scores for three classes: negative, neutral, and positive.

First, install transformers, then import AutoTokenizer, AutoModelForSequenceClassification, and SciPy's softmax function. Next, load the pretrained model and create its tokenizer.

pip install transformers
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from scipy.special import softmax

#Load the pre-trained model
MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

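Just as we computed VADER polarity scores, we now compute RoBERTa polarity scores for each text and store both sets of scores in one data frame. Very long reviews can exceed the model's maximum input length and raise an error; the loop catches it, prints the review Id, and skips that review.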

def roberta_polarity_scores(sentence):
    encoded_text = tokenizer(sentence, return_tensors='pt')
    output = model(**encoded_text)
    scores = output[0][0].detach().numpy()
    scores = softmax(scores)
    scores_dict = {
        'roberta_neg' : scores[0],
        'roberta_neu' : scores[1],
        'roberta_pos' : scores[2]
    }
    return scores_dict

res = {}
for i, row in tqdm(df.iterrows(), total=len(df)):
    try:
        text = row['Text']
        myid = row['Id']
        vader_result = SIA.polarity_scores(text)
        vader_result_rename = {}
        for key, value in vader_result.items():
            vader_result_rename[f"vader_{key}"] = value
        roberta_result = roberta_polarity_scores(text)
        both = {**vader_result_rename, **roberta_result}
        res[myid] = both
    except RuntimeError:
        print(f'Broke at id {myid}')

results_df = pd.DataFrame(res).T
results_df = results_df.reset_index().rename(columns={'index': 'Id'})
results_df = results_df.merge(df, how='left')

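Inside roberta_polarity_scores, the model returns raw logits, and softmax turns them into probabilities that sum to 1. This sketch reproduces that step in NumPy (equivalent to scipy.special.softmax for a 1-D array) on hypothetical logits, assuming the label order negative, neutral, positive, as the function above does.

```python
import numpy as np

def softmax(x):
    # Subtracting the max avoids overflow and does not change the result
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical logits for one review, in label order (negative, neutral, positive)
logits = np.array([-1.2, 0.3, 2.1])
probs = softmax(logits)

scores_dict = {
    "roberta_neg": probs[0],
    "roberta_neu": probs[1],
    "roberta_pos": probs[2],
}
print(scores_dict)
print("Most likely:", max(scores_dict, key=scores_dict.get))
```

Softmax preserves the ordering of the logits, so the largest logit always becomes the most probable class.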
Let's view the final data frame and its columns.

results_df.head()

# To view columns
results_df.columns

Columns output: Index(['Id', 'vader_neg', 'vader_neu', 'vader_pos', 'vader_compound', 'roberta_neg', 'roberta_neu', 'roberta_pos', 'ProductId', 'UserId', 'ProfileName', 'HelpfulnessNumerator', 'HelpfulnessDenominator', 'Score', 'Time', 'Summary', 'Text'], dtype='object')

Analyzing Wrongly Classified Texts

Even when most texts are classified sensibly, some ambiguous sentences will still be misclassified: some sound positive but are actually negative, and some positive sentences sound negative. Let's look at a few texts that our models misclassify.

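One way to surface such examples is to filter by star rating and sort by a model score. This sketch assumes a results_df with Score, roberta_pos, vader_compound, and Text columns, as in the final data frame; the helper name most_extreme_text and all rows are hypothetical.

```python
import pandas as pd

def most_extreme_text(df, star, score_col, ascending):
    """Return the Text of the review with rating `star` that ranks first by `score_col`.

    ascending=False picks the highest score (most positive);
    ascending=True picks the lowest score (most negative).
    """
    subset = df[df["Score"] == star]
    return subset.sort_values(score_col, ascending=ascending)["Text"].values[0]

# Hypothetical stand-in for results_df (made-up texts and scores)
results_df = pd.DataFrame({
    "Score": [1, 1, 5, 5],
    "roberta_pos": [0.91, 0.05, 0.95, 0.02],
    "vader_compound": [0.10, 0.85, 0.90, -0.70],
    "Text": [
        "Looked lovely, but tasted awful.",
        "Great, another stale box. Thanks a lot!",
        "Delicious, will buy again.",
        "So good I can't stop eating and gained five pounds.",
    ],
})

# 1-star review that RoBERTa finds most positive
print(most_extreme_text(results_df, 1, "roberta_pos", ascending=False))
# 1-star review that VADER finds most positive
print(most_extreme_text(results_df, 1, "vader_compound", ascending=False))
# 5-star review that VADER finds most negative
print(most_extreme_text(results_df, 5, "vader_compound", ascending=True))
```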

In the first example, we take a review that RoBERTa classified as positive even though the customer rated it 1.


It sounds like a positive review, with positive words like LOVE. But read closely, it is a negative review complaining about plastic found in the food, so RoBERTa misclassified it.

In the second example, we take a review that VADER classified as positive even though the customer rated it 1.

This review is negative, but the customer may have written it sarcastically, in positive terms, so the model classified it as positive.

The last two examples return the same review: both VADER and RoBERTa classified it as negative, yet the customer rated it 5.


Here the customer actually loved the food but wrote the review complaining about weight gain, so both models classified it as negative.

Conclusion

Transformers made it possible to represent each word as a vector that depends on its context, capturing how it links to the other words around it. Words can be described along many dimensions that reflect how closely they are related to other words in meaning and usage, and transformers made modeling these links between words simpler than ever. Transformers are used in virtual assistants, marketing, medical record analysis, and many other applications.

  • Sentiment analysis is used to analyze large amounts of unstructured review text, extract people's opinions, and categorize them into sentiment classes.
  • Sentiment analysis is the result of the fusion of machine learning technology and human emotional interpretation.
  • Transformers have been used to create joke-telling chatbots, realistic news stories, and better Google Search results.
  • With transformer models like these, you could build your own assistant that tells jokes or sings songs.
  • This guide covered transformers, NLP, NLTK, sentiment analysis, and the VADER and RoBERTa models.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
