Amazon Product Review Sentiment Analysis using BERT

Yash Inaniya 19 Apr, 2023 • 5 min read

This article was published as a part of the Data Science Blogathon


Natural language processing (NLP), a sub-field of machine learning, has gained immense popularity over the last five years in both research and industry, driven by advances in deep learning and improvements in the computational power of hardware. It draws on computational linguistics and computer science to let computers work with human language. In recent years, NLP has been used in a range of applications that involve understanding and interpreting text, audio, and video.

Sentiment Analysis

One of the key areas where NLP is widely used is sentiment analysis. Understanding how customers feel about a company’s products and services is vital for organizations. Generally, the feedback a customer provides on a product can be categorized as positive, negative, or neutral. Interpreting customer feedback through product reviews helps companies evaluate how satisfied customers are with their products and services.

[Figure: Sentiment Analysis using BERT. Source: MonkeyLearn]

BERT and TensorFlow

BERT (Bidirectional Encoder Representations from Transformers) is a language model developed by Google and built on the Transformer architecture. In our sentiment analysis application, we fine-tune a pre-trained BERT-style model (DistilBERT, a smaller distilled version of BERT). BERT models have largely replaced conventional recurrent networks such as LSTMs, which suffer from information loss over long text sequences. Because BERT is bidirectional, it interprets each word using both the words before it and the words after it in the sentence. Advances in hardware have made it practical to train models of this size.

TensorFlow is a popular and widely used machine learning framework for developing deep learning applications. Its APIs make it relatively easy to build and train a deep neural network.

In this article, we are going to see how BERT can be used for developing a sentiment analysis application.


The complete implementation of the model is done in a Google Colab notebook. If you would like to directly head over to the code, then go here.

The implementation of the model can be divided into a few phases:

Installing Libraries and dependencies

We will first install all the necessary libraries and packages. This involves installing Transformers if you haven’t done so already. Our model requires a GPU, so it is recommended to change the Colab runtime type to GPU before running the notebook. To confirm that a GPU is available, we can use the TensorFlow ‘tf.config’ API.

import tensorflow as tf
# Confirm that the runtime exposes at least one GPU
num_gpus_available = len(tf.config.list_physical_devices('GPU'))
print("Num GPUs Available: ", num_gpus_available)
assert num_gpus_available > 0
# Install Hugging Face Transformers (notebook shell command)
!pip install transformers
from transformers import DistilBertTokenizerFast
from transformers import TFDistilBertForSequenceClassification
import pandas as pd
import numpy as np

Loading Dataset and Pre-processing

The dataset we are using is the Amazon mobile electronics product reviews, a subset of the larger Amazon product review dataset. It is available in the TensorFlow Datasets catalog and can be loaded directly using the ‘tfds’ API. Once loaded, it has to be converted into a pandas data frame using the ‘tfds.as_dataframe’ API.

import tensorflow_datasets as tfds
ds = tfds.load('amazon_us_reviews/Mobile_Electronics_v1_00', split='train', shuffle_files=True)
assert isinstance(ds, tf.data.Dataset)
# Convert the dataset into a pandas dataframe
df = tfds.as_dataframe(ds)

The dataset consists of several columns, ranging from the product ID to the review text, heading, and star rating provided by the customer. As we are only interested in the reviews and the corresponding ratings, we are going to drop the other feature columns.

The rating provided by the customer is on a scale of 1 to 5 (5 being the highest). As we are going to implement a binary classification model, we need to convert these ratings into two categories, i.e. 1 and 0. Ratings of 3 and above will be labeled positive (1), and ratings below 3 will be labeled negative (0). The following code implements these steps.

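# Label ratings of 3 and above as positive, the rest as negative
df["Sentiment"] = df["data/star_rating"].apply(lambda score: "positive" if score >= 3 else "negative")
df['Sentiment'] = df['Sentiment'].map({'positive':1, 'negative':0})
# Review text is stored as bytes, so decode it to str
df['short_review'] = df['data/review_body'].str.decode("utf-8")
# Keep only the review text and its label
df = df[["short_review", "Sentiment"]]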
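
To see what the thresholding does, the sketch below applies the same rule to a handful of made-up star ratings (not taken from the dataset) and counts the resulting classes. The helper name rating_to_label is hypothetical and only used for illustration.

```python
from collections import Counter

def rating_to_label(score):
    """Map a 1-5 star rating to a binary label: 1 (positive) for 3 and above, else 0."""
    return 1 if score >= 3 else 0

# Made-up ratings for illustration only
sample_ratings = [5, 4, 1, 3, 2, 5, 5, 1]
sample_labels = [rating_to_label(r) for r in sample_ratings]
label_counts = Counter(sample_labels)

print(sample_labels)
print(label_counts)
```

Because ratings of 3, 4, and 5 all map to the positive class, the two classes may well be unbalanced. On the real data frame, df['Sentiment'].value_counts() would give the same kind of count.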

Even so, our dataset still contains a large corpus of customer reviews. Since BERT requires high computational power and takes a long time to train, we will drop some rows from the data frame to reduce the training time.

# Dropping last n rows using drop
n = 54975
df.drop(df.tail(n).index,
        inplace = True)

Great, our dataset is now in perfect shape for building the model. We can look at a few of the elements in our data frame using df.head() to show the first few rows.

   short_review                                        Sentiment
0  Does not work                                       0
1  This is a great wiring kit i used it to set up..    1
2  It works great so much faster than USB charger…     1
3  This product was purchased to hold a monitor o…     1

Tokenization of text and conversion into tokens

Before we start building our model, we need to convert the review column into numerical values, because machine learning models operate on numerical features. There are several techniques for vectorizing text; popular ones include bag of words, TF-IDF, the Keras Tokenizer, and word embeddings. In our application, we are going to use the tokenizer class that ships with pre-trained DistilBERT.

The tokenizer splits each sentence into word and sub-word (WordPiece) tokens and maps every token to an integer ID from DistilBERT’s fixed, pre-trained vocabulary.
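
As a rough illustration of how a WordPiece-style tokenizer turns text into IDs, here is a toy sketch that assumes a tiny made-up vocabulary. The function names wordpiece and encode and every ID below are hypothetical; the real DistilBERT vocabulary is far larger, and the real tokenizer also handles punctuation and normalization, which this sketch ignores. For simplicity it always pads to a fixed length.

```python
# Toy vocabulary (hypothetical); "##" marks a piece that continues a word
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "does": 4, "not": 5, "work": 6, "##ing": 7,
         "great": 8, "charge": 9, "##r": 10}

def wordpiece(word, vocab):
    """Greedy longest-match-first split of one lowercase word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched, so the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

def encode(sentence, vocab, max_length=8):
    """Return (input_ids, attention_mask), truncated and padded to max_length."""
    tokens = ["[CLS]"]
    for word in sentence.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens = tokens[:max_length - 1] + ["[SEP]"]  # truncate, then close the sequence
    ids = [vocab[t] for t in tokens]
    mask = [1] * len(ids)
    pad = max_length - len(ids)
    return ids + [vocab["[PAD]"]] * pad, mask + [0] * pad

ids, mask = encode("Does not working", VOCAB)
print(ids)
print(mask)
```

The attention mask marks real tokens with 1 and padding with 0, which is the same pair of outputs (input IDs plus attention mask) that the DistilBERT tokenizer returns.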

We will convert our feature column and labels into plain Python lists, as that is the format the tokenizer expects. We also split the data into training and validation sets before tokenizing. The tokenizer is already pre-trained, so it is not fit on our data; the split simply keeps the validation reviews out of training.

reviews = df['short_review'].values.tolist()
labels = df['Sentiment'].tolist()

To split the data into training and validation sets, we will use the ‘train_test_split’ function from scikit-learn.

from sklearn.model_selection import train_test_split
training_sentences, validation_sentences, training_labels, validation_labels = train_test_split(reviews, labels, test_size=.2)

# Load the pre-trained DistilBERT tokenizer
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
# Tokenize a single training sentence to inspect the output
tokenizer([training_sentences[0]], truncation=True,
          padding=True, max_length=128)

The TensorFlow API provides an easy way to build data pipelines. Using ‘tf.data.Dataset.from_tensor_slices’, we can combine our token features and labels into a dataset.

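# Assumed: same truncation/padding settings as the single-sentence call above
train_encodings = tokenizer(training_sentences, truncation=True, padding=True, max_length=128)
val_encodings = tokenizer(validation_sentences, truncation=True, padding=True, max_length=128)
# Pair the token features with their labels
train_dataset = tf.data.Dataset.from_tensor_slices((dict(train_encodings), training_labels))
val_dataset = tf.data.Dataset.from_tensor_slices((dict(val_encodings), validation_labels))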

Model Training and optimization

In our application, we are going to use TFDistilBertForSequenceClassification for the sentiment analysis and set the ‘num_labels’ parameter to 2, as we are doing binary classification.

model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased',num_labels=2)

We don’t need to add any layers of our own: the Hugging Face Transformers model already includes a classification head. We can now train our model with the following configuration:

  • Epochs: 2
  • Batch size: 16
  • Learning rate (Adam): 5e-5 (0.00005)

The number of epochs can be increased; however, this may lead to overfitting and will take more time to train. The complete model trains in around 2 hours, which is why the number of epochs is kept low; the small batch size keeps GPU memory use manageable.
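
To get a feel for how this configuration translates into training work, the sketch below counts optimizer steps. The article does not state how many rows remain after the drop, so 50,000 is assumed here purely for illustration; the 20% validation split, batch size of 16, and 2 epochs come from the text. The helper name training_steps is hypothetical.

```python
import math

def training_steps(num_rows, test_size, batch_size, epochs):
    """Return (training rows, steps per epoch, total optimizer steps)."""
    n_val = math.ceil(num_rows * test_size)            # validation share, rounded up
    n_train = num_rows - n_val
    steps_per_epoch = math.ceil(n_train / batch_size)  # a final partial batch still counts
    return n_train, steps_per_epoch, steps_per_epoch * epochs

# 50,000 rows is an assumption; the other values come from the configuration above
n_train, steps_per_epoch, total_steps = training_steps(50_000, 0.2, 16, 2)
print(n_train, steps_per_epoch, total_steps)
```

Each extra epoch adds another full pass of steps_per_epoch updates, so training time grows roughly linearly with the number of epochs.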

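optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5, epsilon=1e-08)
model.compile(optimizer=optimizer, loss=model.compute_loss, metrics=['accuracy'])
# Train for 2 epochs with a batch size of 16, validating after each epoch
model.fit(train_dataset.batch(16), epochs=2, validation_data=val_dataset.batch(16))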

In just 2 epochs, our model reaches 94.73% and 92% accuracy on the training and validation sets respectively. Once the model is trained, we can save it and load it later to evaluate it on unseen data. This can be done using the ‘save_pretrained’ API.

model.save_pretrained("./sentiment")
loaded_model = TFDistilBertForSequenceClassification.from_pretrained("./sentiment")


To evaluate our model on unseen data, we can load the saved model and test it on new sentences to see whether the sentiment is predicted correctly.

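test_sentence = "This is a really good product. I love it"
# Assumed arguments: truncate/pad as before and return TensorFlow tensors
predict_input = tokenizer.encode(test_sentence, truncation=True, padding=True, return_tensors="tf")
# The first element of the prediction output holds the logits
tf_output = loaded_model.predict(predict_input)[0]
tf_prediction = tf.nn.softmax(tf_output, axis=1)
labels = ['Negative','Positive']
label = tf.argmax(tf_prediction, axis=1)
label = label.numpy()
print(labels[label[0]])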

This returns ‘Positive’. I tried several other kinds of sentences, and the model predicts their sentiment fairly well.
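
The softmax and argmax calls in the prediction code can be traced by hand. The sketch below does so in plain Python with made-up logits (they are not real model output), so it only illustrates the arithmetic that turns two raw scores into a label.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

labels = ['Negative', 'Positive']
example_logits = [-2.0, 3.0]  # made-up values for illustration
probabilities = softmax(example_logits)
# argmax: pick the index of the largest probability
predicted = labels[probabilities.index(max(probabilities))]
print(probabilities, predicted)
```

Softmax does not change which score is largest, so the argmax of the logits and the argmax of the probabilities give the same label; the probabilities are mainly useful as a confidence indication.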


BERT models perform well in comparison to RNN models. However, they require high computational power and a long time to train. Thus, unless the dataset is complex and the application requires high accuracy, simpler models are also worth considering: they are faster to train, need less computational power, and give reasonably good results.

The media shown in this article on Sentiment Analysis using BERT are not owned by Analytics Vidhya and are used at the Author’s discretion.