Shivani Sharma — July 30, 2021

This article was published as a part of the Data Science Blogathon

Every big tech company wants its customers to be safe and secure. By detecting spam in emails and messages, these companies protect their networks and strengthen the trust of their customers. Apple’s official messaging app and Google’s Gmail are well-known examples of applications where spam detection and filtering work well to protect users. So, if you’re looking to build a spam detection system, this article is for you.


What is Spam?

Electronic messages are a crucial means of communication for people worldwide. But some people and companies misuse this facility to distribute unsolicited bulk messages, commonly called spam SMS. Spam SMS may include advertisements for medicine, software, adult content, or insurance, as well as other fraudulent promotions. Various spam filters are used as a protective mechanism: systems designed to recognize spam.

Spam Detection

After you submit personal details such as a mobile number or email address on a platform, it may start advertising its products by constantly pinging you. Advertisers send repeated emails, use your contact details to send you messages, and nowadays increasingly reach out over WhatsApp as well. The result is a flood of spam alerts and notifications in your inbox. This is where spam detection comes in.

Spam detection means identifying spam messages or emails by analyzing their text content, so that you only receive notifications about the messages or emails that matter to you. When spam is found, it is automatically moved to a spam folder and you are not notified about it. This improves the user experience, since a stream of spam alerts bothers many users.

What is Spam filtering?

Can you guess when you become a target for hackers? If you are thinking of spam, you are on the right track. Whenever spam reaches your email or message inbox, you are a potential target. When it comes to technology, humans tend to be the weakest link in most IT security situations. Attackers constantly try to trick users into clicking on things they shouldn’t, through a variety of methods. These tricks often arrive by email, because email can reach a very large number of people and makes for a very “budget-friendly” attack. By clicking something malicious in a spam email, you can expose your important personal data to attackers. This is where spam filtering comes in, since email is so often used to exploit users and their most sensitive data. Organizations should use a spam filter to reduce the risk of users clicking on something they shouldn’t, which in turn keeps their internal data protected from a cyber attack.

Why is it important?

Implementing spam filtering is especially important for every organization. The major role of spam filtering is to keep garbage out of mailboxes. You can also think of spam filtering as a buddy who keeps your life running smoothly by showing you only safe and wanted mail. Spam filtering also acts as an anti-malware tool, because a common trick of attackers is to send malicious attachments by email or ask for your credentials. Another aspect that should not be neglected is the removal of graymail. Graymail is email that a user previously opted in to receive but doesn’t actually want or need in their inbox. Graymail isn’t considered spam, as these emails aren’t used to infiltrate an organization. What counts as graymail is determined by the user’s actions over time, and spam filtering platforms pick up on those actions to work out what is or isn’t wanted in an inbox.

Spam Detection using Python

So far you have learned what spam detection is and why it matters. Now it is time for the implementation. In this part, we train a machine learning model to detect spam messages with Python. I’ll start by importing the required Python libraries. The dataset is a labeled collection of SMS messages, which the code below reads from a tab-separated file named 'Spam SMS Collection'.

Step1:-Import Dependencies

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
nltk.download('stopwords')
import re
import sklearn
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

Step2:-Get SMS Dataset

sms = pd.read_csv('Spam SMS Collection', sep='\t', names=['label','message'])
sms.head()
sms.drop_duplicates(inplace=True)
sms.reset_index(drop=True, inplace=True)
plt.figure(figsize=(8,5))
sns.countplot(x='label', data=sms)
plt.xlabel('SMS Classification')
plt.ylabel('Count')
plt.show()
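The count plot is a visual check of class balance. As a rough sketch of the same check in numbers, the snippet below counts labels with `collections.Counter`. The five labels are made-up stand-ins for `sms['label']`, not values from the real dataset.

```python
from collections import Counter

# Hypothetical labels standing in for sms['label']; not real dataset values.
labels = ['ham', 'ham', 'spam', 'ham', 'ham']

counts = Counter(labels)
total = sum(counts.values())

# Share of each class; a heavily skewed split means accuracy alone can mislead.
shares = {label: count / total for label, count in counts.items()}
print(counts)
print(shares)
```

On the real DataFrame, `sms['label'].value_counts()` gives the same kind of per-class count directly.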

Step3:-Cleaning the messages

corpus = []
ps = PorterStemmer()
stop_words = set(stopwords.words('english'))  # Build the stop-word set once, not once per word
for i in range(0, sms.shape[0]):
    # Replace every non-letter character with a space
    message = re.sub(pattern='[^a-zA-Z]', repl=' ', string=sms.message[i])
    message = message.lower()  # Convert the entire message to lower case
    words = message.split()  # Tokenize the message into words
    words = [word for word in words if word not in stop_words]  # Remove the stop words
    words = [ps.stem(word) for word in words]  # Stem the words
    message = ' '.join(words)  # Join the stemmed words
    corpus.append(message)  # Add the cleaned message to the corpus
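To see what the cleaning loop does to a single message, here is a minimal standard-library sketch. It makes two simplifying assumptions: `STOP_WORDS` is a tiny hand-picked set rather than NLTK's English list, and stemming is left out. The helper name `clean_message` and the sample text are hypothetical.

```python
import re

# Tiny hand-picked stop-word set for illustration only; the article uses
# NLTK's full English list instead.
STOP_WORDS = {'a', 'to', 'the', 'you', 'is', 'for'}

def clean_message(text):
    """Keep letters only, lowercase, split on whitespace, drop stop words."""
    letters_only = re.sub(r'[^a-zA-Z]', ' ', text)
    words = letters_only.lower().split()
    return ' '.join(word for word in words if word not in STOP_WORDS)

sample = 'WINNER!! You have won a FREE ticket to the USA. Call 0800-123'
print(clean_message(sample))
```

Digits and punctuation disappear because the regular expression replaces every non-letter with a space before the text is split into words.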

Step4:-Creating the Bag of Words model

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=2500)
X = cv.fit_transform(corpus).toarray()
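A Bag of Words model turns each message into a vector of word counts over a shared vocabulary. The sketch below shows the idea with plain Python on two made-up messages. It illustrates the concept only and is not a reimplementation of `CountVectorizer`, which also handles tokenization details and, with `max_features=2500`, keeps only the most frequent terms.

```python
from collections import Counter

# Two already-cleaned toy messages (hypothetical, not from the dataset).
toy_corpus = ['free ticket free call', 'call me later']

# Vocabulary: every distinct word, sorted so each word gets a fixed column.
vocabulary = sorted({word for message in toy_corpus for word in message.split()})

def to_count_vector(message):
    """Return one count per vocabulary word, in vocabulary order."""
    counts = Counter(message.split())
    return [counts[word] for word in vocabulary]

matrix = [to_count_vector(message) for message in toy_corpus]
print(vocabulary)
print(matrix)
```

Each row of `matrix` corresponds to one message and each column to one vocabulary word, which is the same layout as `X` above.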

Step5:-Extracting dependent variable from the dataset

y = pd.get_dummies(sms['label'])
y = y.iloc[:, 1].values
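The two lines above first build one indicator column per label and then keep the second column. The plain-Python sketch below shows why that leaves 1 for spam and 0 for ham, assuming the columns come out in alphabetical order (ham, then spam). The labels are hypothetical stand-ins for `sms['label']`.

```python
# Hypothetical labels standing in for sms['label'].
labels = ['ham', 'spam', 'ham', 'spam', 'ham']

# One indicator column per distinct label, in sorted order: ['ham', 'spam'].
columns = sorted(set(labels))
one_hot = [[int(label == column) for column in columns] for label in labels]

# Column 1 is the 'spam' indicator, so y_toy is 1 for spam and 0 for ham.
y_toy = [row[1] for row in one_hot]
print(columns)
print(y_toy)
```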

Step6:-train_test_split

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
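Conceptually, the split shuffles the rows with a fixed seed and sets aside 20% of them for testing. The sketch below illustrates that idea with a hypothetical helper, `simple_split`. It is not scikit-learn's implementation and will not reproduce the same partition as `train_test_split`.

```python
import random

def simple_split(items, test_size=0.20, seed=0):
    """Shuffle a copy of items with a fixed seed, then cut off the test share."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]

train, test = simple_split(range(10))
print(len(train), len(test))
```

Fixing the seed plays the same role as `random_state=0` above: rerunning the code yields the same split, so accuracy numbers stay comparable between runs.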

Step7:-Tuning alpha for the Naive Bayes Classifier

best_accuracy = 0.0
alpha_val = 0.0
# Try alpha from 0.0 to 1.0; alpha=0.0 means no smoothing, and scikit-learn may warn about it
for i in np.arange(0.0,1.1,0.1):
    temp_classifier = MultinomialNB(alpha=i)
    temp_classifier.fit(X_train, y_train)
    temp_y_pred = temp_classifier.predict(X_test)
    score = accuracy_score(y_test, temp_y_pred)
    print("Accuracy score for alpha={} is: {}%".format(round(i,1), round(score*100,2)))
    if score>best_accuracy:
        best_accuracy = score
        alpha_val = i
print('--------------------------------------------')
print('The best accuracy is {}% with alpha value as {}'.format(round(best_accuracy*100, 2), round(alpha_val,1)))
# Refit with the best alpha; predict_spam in Step8 uses this classifier
classifier = MultinomialNB(alpha=alpha_val)
classifier.fit(X_train, y_train)
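The `alpha` parameter being tuned is the additive (Laplace/Lidstone) smoothing constant of Multinomial Naive Bayes. The sketch below works through the smoothed estimate of P(word | class) for a word that never appeared in a class, so you can see how `alpha` keeps that probability from collapsing to zero. The function name and all counts are hypothetical.

```python
def smoothed_probability(word_count, total_count, vocab_size, alpha):
    """Lidstone-smoothed estimate of P(word | class)."""
    return (word_count + alpha) / (total_count + alpha * vocab_size)

# Hypothetical numbers: a word never seen in spam, 100 spam tokens,
# and a 50-word vocabulary.
for alpha in (0.0, 0.1, 1.0):
    p = smoothed_probability(word_count=0, total_count=100, vocab_size=50, alpha=alpha)
    print(alpha, p)
```

With `alpha=0.0` the unseen word gets probability zero, which would zero out the whole product of word probabilities for that class. Larger values of `alpha` move the estimates toward a uniform distribution over the vocabulary.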

 

Step8:-Prediction

def predict_spam(sample_message):
    sample_message = re.sub(pattern='[^a-zA-Z]',repl=' ', string = sample_message)
    sample_message = sample_message.lower()
    sample_message_words = sample_message.split()
    sample_message_words = [word for word in sample_message_words if not word in set(stopwords.words('english'))]
    ps = PorterStemmer()
    final_message = [ps.stem(word) for word in sample_message_words]
    final_message = ' '.join(final_message)
    temp = cv.transform([final_message]).toarray()
    return classifier.predict(temp)
result = ['Wait a minute, this is a SPAM!','Ohhh, this is a normal message.']
msg = "Hi! You are pre-qualified for Premium SBI Credit Card. Also get Rs.500 worth Amazon Gift Card*, 10X Rewards Point* & more. Click "
if predict_spam(msg):
    print(result[0])
else:
    print(result[1])

OUTPUT

Wait a minute, this is a SPAM!

msg = "[Update] Congratulations Shivani, Your account is activated for investment in Stocks. Click to invest now: "
if predict_spam(msg):
    print(result[0])
else:
    print(result[1])

OUTPUT

Wait a minute, this is a SPAM!

msg = "Your Stockbroker FALANA BROKING LIMITED reported your fund balance Rs.1500.5 & securities balance 0.0 as of the end of MAY-20. Balances do not cover your bank, DP & PMS balance with the broking entity. Check details at [email protected] If the email Id is not correct, kindly update with your broker."
if predict_spam(msg):
    print(result[0])
else:
    print(result[1])

OUTPUT

Ohhh, this is a normal message.
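Spot checks like the three messages above are useful, but spam datasets usually contain far more normal messages than spam, so accuracy alone can hide missed spam. As a hedged sketch, the snippet below computes precision and recall by hand. The helper `confusion_counts` and both label lists are hypothetical and are not results from the model trained above.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating 1 as spam."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

# Hypothetical labels and predictions, not output of the classifier above.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # of messages flagged as spam, how many were spam
recall = tp / (tp + fn)     # of actual spam, how many were caught
print(tp, fp, fn, tn, precision, recall)
```

For the real model, `sklearn.metrics` provides `confusion_matrix`, `precision_score`, and `recall_score`, which can be called on `y_test` and the predictions for `X_test`.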

Summary

So this is how you can train a machine learning model, here a Multinomial Naive Bayes classifier, to detect whether an email or a message is spam. A spam detector identifies spam messages or emails by analyzing their text content, so that you only receive notifications about the messages or emails that matter to you. I hope this article helps you get started with spam detection. Today, we can’t afford to give up our security so easily. Let’s start a campaign together with Analytics Vidhya to reduce cybercrime. Feel free to ask your questions in the comments section below.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
