Predicting the Toxicity of Comments Using Text Classification

Sanket Barhate 12 Jul, 2022 • 6 min read

This article was published as a part of the Data Science Blogathon.


We all check our email every day, possibly more than once. The majority of email service providers have the useful feature of automatically separating spam emails from other emails. This is an example of a common NLP problem called text classification.

Text is unstructured and getting insights from it can be difficult and time-consuming. But categorizing text data is becoming simpler as a result of developments in machine learning and natural language processing.

In this article, we will look more closely at text classification and build a model that classifies comments as toxic or non-toxic.

What is Text Classification?

Classification in machine learning refers to the problem of assigning a data item to one or more predefined classes. The data item can come in a variety of formats, including text, numbers, audio or images. Text classification is the specific case in which the input is text and the objective is to assign that text to one or more predefined classes.

Depending on how labels are assigned, supervised classification can be divided into three types: binary, multiclass, and multilabel. Binary classification applies when there are exactly two classes. Multiclass classification applies when there are more than two classes and each item belongs to exactly one of them. In multilabel classification, a document may have one or more labels associated with it.

In the email spam identifier, there are two categories: spam and non-spam. Every email is pre-processed and passed through a classifier, which labels it as either spam or non-spam.
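To make the three settings concrete, here is a minimal sketch of what the labels look like in each case. The toy labels are made up for illustration, and MultiLabelBinarizer is used here only to show the usual 0/1 encoding of multilabel targets; it is not part of the model built later in this article.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Binary: one label per item, two possible classes (toy data)
binary_labels = np.array(["spam", "non-spam", "spam"])

# Multiclass: one label per item, more than two possible classes
multiclass_labels = np.array(["sports", "politics", "tech"])

# Multilabel: each item can carry zero, one or several labels
multilabel_tags = [["toxic", "insult"], [], ["toxic"]]

# Multilabel targets are usually encoded as a 0/1 indicator matrix,
# with one column per class (here: toxic, insult)
mlb = MultiLabelBinarizer(classes=["toxic", "insult"])
indicator = mlb.fit_transform(multilabel_tags)
print(indicator)
```

Each row of the indicator matrix is one item and each column is one class, which is the same layout as the six label columns of the dataset used below.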

Applications of Text Classification

  • Content organization, search engines and recommendation systems all rely on classifying textual data.
  • Fake news detection is another example of text classification.
  • Customer support: brands often need to respond to messages received as tweets or emails. A customer may be voicing a complaint or expressing a desire to buy a product, so it is necessary to identify the intent of these messages.
  • Sentiment analysis of reviews on e-commerce websites, to understand customers’ perception of a product based on their comments.

How to do Text Classification?

On social media, people are free to express themselves. There are numerous online forums where users actively participate and post comments. When someone uses abusive language, however, it becomes essential to filter those comments out. To address this issue, we will develop a model that determines whether or not a comment is toxic.

We will use the toxic comments dataset, which comprises a large number of Wikipedia comments labelled for toxicity by human raters. There are six types of toxicity in this data: toxic, severe_toxic, obscene, threat, insult and identity_hate. A comment may fall under more than one category, which makes this a multilabel classification problem.

Import the necessary libraries

import pandas as pd
import numpy as np
from nltk.corpus import stopwords
import re
import string
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score, confusion_matrix
from nltk.stem import WordNetLemmatizer, PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB, GaussianNB
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (15, 10)

Load the dataset

df = pd.read_csv("train.csv")

We can find the number of comments in each category by summing each of the six label columns.
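The original snippet for this step is not reproduced here, so the following is only a sketch of one way to get these counts. It assumes the six label columns hold 0/1 values; the tiny DataFrame is a made-up stand-in for train.csv.

```python
import pandas as pd

label_cols = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

# Toy stand-in for the real data: three comments with 0/1 labels
toy = pd.DataFrame(
    [[1, 0, 1, 0, 1, 0],
     [0, 0, 0, 0, 0, 0],
     [1, 1, 0, 0, 0, 0]],
    columns=label_cols,
)

# Summing a 0/1 column counts the comments that carry that label
counts = toy[label_cols].sum()
print(counts)

# Comments with no label at all are the non-toxic ones
n_clean = int((toy[label_cols].sum(axis=1) == 0).sum())
print("comments without any label:", n_clean)
```

On the real data, the same two lines applied to df would show how unevenly the categories are represented.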

Data Cleaning

We will clean the text by expanding contractions into their full forms and removing punctuation.

def clean_text(text):
    # Lowercase, expand common contractions, then strip punctuation
    text = text.lower()
    text = re.sub(r"i'm", "i am", text)
    text = re.sub(r"\r", "", text)
    text = re.sub(r"he's", "he is", text)
    text = re.sub(r"she's", "she is", text)
    text = re.sub(r"it's", "it is", text)
    text = re.sub(r"that's", "that is", text)
    text = re.sub(r"what's", "what is", text)
    text = re.sub(r"where's", "where is", text)
    text = re.sub(r"how's", "how is", text)
    text = re.sub(r"'ll", " will", text)
    text = re.sub(r"'ve", " have", text)
    text = re.sub(r"'re", " are", text)
    text = re.sub(r"'d", " would", text)
    # Handle "won't" and "can't" before the generic "n't" rule
    text = re.sub(r"won't", "will not", text)
    text = re.sub(r"can't", "cannot", text)
    text = re.sub(r"n't", " not", text)
    text = re.sub(r"n'", "ng", text)
    text = re.sub(r"'bout", "about", text)
    text = re.sub(r"'til", "until", text)
    text = re.sub(r"[-()\"#/@;:{}`+=~|.!?,]", "", text)
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Replace any remaining non-word character with a space
    text = re.sub(r"\W", " ", text)
    # Drop tokens that contain a digit
    text = re.sub(r"\S*\d\S*\s*", "", text)
    return text

df["text"] = df['comment_text'].apply(clean_text)
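The order of the substitutions matters: specific contractions such as "won't" have to be expanded before the generic "n't" rule, otherwise "won't" would become "wo not". The sketch below uses a trimmed-down, hypothetical helper called mini_clean (not the article's full function) to show the effect on one made-up sentence.

```python
import re
import string

def mini_clean(text):
    # Trimmed-down illustration of the same idea: lowercase, expand a few
    # contractions, drop punctuation, then drop tokens that contain digits
    text = text.lower()
    text = re.sub(r"won't", "will not", text)   # specific case first
    text = re.sub(r"n't", " not", text)         # generic rule afterwards
    text = re.sub(r"i'm", "i am", text)
    text = text.translate(str.maketrans('', '', string.punctuation))
    text = re.sub(r"\S*\d\S*\s*", "", text)
    return text.strip()

cleaned = mini_clean("I'm sure it won't work, don't try v2 again!")
print(cleaned)
```

The token "v2" disappears because it contains a digit, and all apostrophes are gone by the time the punctuation step runs.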

Comparing the original comments with the cleaned text for the first two rows:

   comment_text                                          clean_text
0  Explanation\nWhy the edits made under my usern..     explanation why the edits made under my userna…
1  D’aww! He matches this background colour I’m s…      daww he matches this background colour i am se…

Train Test Split

cols_target = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
X_train, X_test, y_train, y_test = train_test_split(df['text'], df[cols_target], test_size=0.3)

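We can check the dimensions of the label sets using:

print(y_train.shape)
print(y_test.shape)

(95742, 6)
(63829, 6)

Note that these shapes correspond to a 60/40 split of the 159,571 comments, i.e. test_size=0.4; with test_size=0.3 the training and test sets have roughly 111,700 and 47,900 rows instead.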

Vectorise the text

We will use TfidfVectorizer to create vectors of the textual data.

vect = TfidfVectorizer(ngram_range=(1, 3))
X_train = vect.fit_transform(X_train)
X_test = vect.transform(X_test)
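To see what the vectoriser produces, here is a small sketch on a made-up corpus. It uses ngram_range=(1, 2) rather than (1, 3) only to keep the vocabulary short; the names toy_vect, train_docs and test_docs are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ["you are great", "you are awful", "great work"]
test_docs = ["you are great work"]

toy_vect = TfidfVectorizer(ngram_range=(1, 2))

# fit_transform learns the vocabulary and IDF weights from the training text
train_matrix = toy_vect.fit_transform(train_docs)

# transform reuses that vocabulary, so test rows have the same columns
test_matrix = toy_vect.transform(test_docs)

# The vocabulary contains single words as well as two-word phrases
print(sorted(toy_vect.vocabulary_))
print(train_matrix.shape, test_matrix.shape)
```

Fitting only on the training text and then calling transform on the test text keeps information from the test set out of the vocabulary and IDF weights.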

Use the OneVsRestClassifier

This is a multilabel classification problem: each comment may belong to one or more categories. A multilabel problem can be turned into a set of binary problems by building a separate model for each category, for example by training six models in a loop over the categories. A simpler way to do this is OneVsRestClassifier, which automatically fits one classifier per class. To build the model, we will wrap a Multinomial Naive Bayes classifier in OneVsRestClassifier.

from sklearn.multiclass import OneVsRestClassifier

model = OneVsRestClassifier(MultinomialNB())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
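The equivalence between the per-category loop and OneVsRestClassifier can be sketched on toy data. The small count matrix below is a made-up stand-in for TF-IDF features, and only two label columns are used to keep the example short.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB

# Toy non-negative features (stand-in for TF-IDF) and two label columns
X = np.array([[3, 0, 1], [2, 0, 0], [0, 3, 1], [0, 2, 0], [1, 1, 3], [0, 0, 2]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]])

# Manual version: fit one binary model per label column
manual_pred = np.column_stack(
    [MultinomialNB().fit(X, Y[:, j]).predict(X) for j in range(Y.shape[1])]
)

# OneVsRestClassifier does the same bookkeeping internally
ovr = OneVsRestClassifier(MultinomialNB()).fit(X, Y)
ovr_pred = ovr.predict(X)

print(np.array_equal(manual_pred, ovr_pred))

# predict_proba returns one probability column per label
proba = ovr.predict_proba(X)
print(proba.shape)
```

Because each label gets its own independent binary model, the probabilities in one row of predict_proba do not have to add up to 1.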

Evaluate the model

To evaluate the model we will create a confusion matrix for each category.

cfs = []
for i in range(6):
    # One 2x2 confusion matrix per category
    cf = np.asarray(confusion_matrix(y_test[cols_target[i]], y_pred[:, i]))
    cfs.append(cf)

def print_confusion_matrix(confusion_matrix, axes, class_label, class_names, c, fontsize=14):
    # Draw one confusion matrix as an annotated heatmap on the given axes
    df_cm = pd.DataFrame(confusion_matrix, index=class_names, columns=class_names)
    heatmap = sns.heatmap(df_cm, annot=True, cmap=c, fmt="d", cbar=False, ax=axes)
    heatmap.yaxis.set_ticklabels(heatmap.yaxis.get_ticklabels(), rotation=0, ha='right', fontsize=fontsize)
    heatmap.xaxis.set_ticklabels(heatmap.xaxis.get_ticklabels(), rotation=45, ha='right', fontsize=fontsize)
    axes.set_ylabel('True label')
    axes.set_xlabel('Predicted label')
    axes.set_title("Confusion Matrix for the class - " + class_label)

fig, ax = plt.subplots(3, 2, figsize=(12, 7))
cmaps = ['Accent', 'Greens', 'Pastel1', 'Wistia', 'Pastel2', 'plasma']
for axes, cfs_matrix, label, i in zip(ax.flatten(), cfs, cols_target, range(6)):
    c = cmaps[i]
    print_confusion_matrix(cfs_matrix, axes, label, ["N", "Y"], c, 14)
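As a side note, scikit-learn also provides multilabel_confusion_matrix, which builds the per-category matrices in a single call. The sketch below, on made-up labels and predictions, checks that it agrees with computing confusion_matrix column by column; it is an alternative to the loop, not what this article uses.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, multilabel_confusion_matrix

# Toy ground truth and predictions for two labels (made-up values)
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_hat = np.array([[1, 0], [0, 0], [1, 1], [1, 0]])

# One 2x2 matrix per label, computed column by column
per_label = [
    confusion_matrix(y_true[:, j], y_hat[:, j], labels=[0, 1]) for j in range(2)
]

# All matrices at once, stacked along the first axis
stacked = multilabel_confusion_matrix(y_true, y_hat)

print(stacked.shape)
print(np.array_equal(np.stack(per_label), stacked))
```

In both cases each 2x2 matrix is laid out as [[true negatives, false positives], [false negatives, true positives]].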


A classification report can be obtained using:

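cr = pd.DataFrame(classification_report(y_test, y_pred, target_names=cols_target, output_dict=True)).T
# Show support as whole numbers and shade the table for readability
cr['support'] = cr['support'].astype(int)
cr.style.background_gradient(cmap='Pastel1')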

Use the model

Now that we have built and evaluated our model, which performs reasonably well without any tuning, we will use it to predict the toxicity of a user-defined comment.

def make_test_predictions(df, classifier):
    # Clean the comment, vectorise it and plot the predicted probability per category
    df.comment_text = df.comment_text.apply(clean_text)
    X_test = df.comment_text
    X_test_transformed = vect.transform(X_test)
    y_test_pred = classifier.predict_proba(X_test_transformed)
    a = np.array(y_test_pred[0])
    sns.barplot(x=cols_target, y=a * 100)
    # The comment is reported as toxic when the probabilities add up to at least 1
    result = sum(y_test_pred[0])
    if result >= 1:
        plt.title('The comment is Toxic')
    else:
        plt.title('The comment is Non Toxic')
    plt.show()

# Enter the comment
comment_text = "how can you say that stupid"
comment = {'id': [1], 'comment_text': [comment_text]}
comment = pd.DataFrame(comment)
make_test_predictions(comment, model)

comment_text = "You are a good musician"
comment = {'id': [1], 'comment_text': [comment_text]}
comment = pd.DataFrame(comment)
make_test_predictions(comment, model)


The graphs show, as a percentage, the predicted probability that the comment belongs to each category. If the probabilities add up to less than 1, the comment is considered non-toxic because it does not fit clearly into any of the categories; otherwise, it is considered toxic.
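The decision rule can be isolated in a few lines. The helper name is_toxic_by_sum and the probability values below are made up for illustration; they are not outputs of the trained model.

```python
import numpy as np

def is_toxic_by_sum(probabilities):
    # Rule used in this article: flag the comment when the per-category
    # probabilities add up to at least 1
    return float(np.sum(probabilities)) >= 1.0

# Made-up probabilities for the six categories
strong = [0.70, 0.05, 0.20, 0.01, 0.10, 0.02]       # adds up to 1.08
weak = [0.10, 0.00, 0.05, 0.00, 0.03, 0.01]         # adds up to 0.19
single_label = [0.90, 0.00, 0.00, 0.00, 0.00, 0.00]  # adds up to 0.90

print(is_toxic_by_sum(strong))
print(is_toxic_by_sum(weak))
print(is_toxic_by_sum(single_label))
```

One consequence of this rule is that a comment with a single category at 0.9 and the rest near zero is still reported as non-toxic; a per-category threshold would be one possible alternative if that behaviour is not wanted.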


In this article, we discussed text classification and saw how it can help extract meaningful insights from text data. We built a multilabel classifier and used the trained model to predict whether a comment is toxic.

I hope that after reading this, you are prepared to solve text classification problems for your own use case, understand how to build on the solution shown here, and create your own classifiers.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
