Naive Bayes is a machine learning algorithm that is used by data scientists for classification. The naive Bayes algorithm works based on the Bayes theorem. Before explaining Naive Bayes, first, we should discuss Bayes Theorem. Bayes theorem is used to find the probability of a hypothesis with given evidence. This beginner-level article intends to introduce you to the Naive Bayes algorithm and explain its underlying concept and implementation.
In this equation, using Bayes theorem, we can find the probability of A, given that B occurred. A is the hypothesis, and B is the evidence.
P(B|A) is the probability of B given that A is True.
P(A) and P(B) are the independent probabilities of A and B.
Learning Objectives
This article was published as a part of the Data Science Blogathon.
The Naive Bayes classifier algorithm is a machine learning technique used for classification tasks. It is based on Bayes’ theorem and assumes that features are conditionally independent of each other given the class label. The algorithm calculates the probability of a data point belonging to each class and assigns it to the class with the highest probability.
Naive Bayes is known for its simplicity, efficiency, and effectiveness in handling high-dimensional data. It is commonly used in various applications, including text classification, spam detection, and sentiment analysis.
Let’s understand the concept of the Naive Bayes Theorem and how it works through an example. We are taking a case study in which we have the dataset of employees in a company, our aim is to create a model to find whether a person is going to the office by driving or walking using the salary and age of the person.
In the above image, we can see 30 data points in which red points belong to those who are walking and green belong to those who are driving. Now let’s add a new data point to it. Our aim is to find the category that the new point belongs to
Note that we are taking age on the X-axis and Salary on the Y-axis. We are using the Naive Bayes algorithm to find the category of the new data point. For this, we have to find the posterior probability of walking and driving for this data point. After comparing, the point belongs to the category having a higher probability.
In the above image, we can see 30 data points in which red points belong to those who are walking and green belong to those who are driving. Now let’s add a new data point to it. Our aim is to find the category that the new point belongs to
The posterior probability of walking for the new data point is:
and that for the driving is:
Step 1: We have to find all the probabilities required for the Bayes theorem for the calculation of posterior probability.
P(Walks) is simply the probability of those who walk among all.
In order to find the marginal likelihood, P(X), we have to consider a circle around the new data point of any radii, including some red and green points.
P(X|Walks) can be found by:
Now we can find the posterior probability using the Bayes theorem,
Step 2: Similarly, we can find the posterior probability of Driving, and it is 0.25
Step 3: Compare both posterior probabilities. When comparing the posterior probability, we can find that P(walks|X) has greater values, and the new point belongs to the walking category.
Source: Unsplash
Now let’s implement Naive Bayes step by step using the python programming language
We are using the Social network ad dataset. The dataset contains the details of users on a social networking site to find whether a user buys a product by clicking the ad on the site based on their salary, age, and gender.
Step 1: Importing the libraries
Let’s start the programming by importing the essential libraries required.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn
Step 2: Importing the dataset
Python Code:
Since our dataset contains character variables, we have to encode it using LabelEncoder.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:,0] = le.fit_transform(X[:,0])
Step 3: Train test splitting
We are splitting our data into train and test datasets using the scikit-learn library. We are providing the test size as 0.20, which means our training data contains 320 training sets, and the test sample contains 80 test sets.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
Step 4: Feature scaling
Next, we are doing feature scaling to the training and test set of independent variables.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)
Let’s predict the test results
y_pred = classifier.predict(X_test)
Predicted and actual value
y_pred
y_test
For the first 8 values, both are the same. We can evaluate our matrix using the confusion matrix and accuracy score by comparing the predicted and actual test values
from sklearn.metrics import confusion_matrix,accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test,y_pred)
confusion matrix
ac – 0.9125
Accuracy is good. Note that you can achieve better results for this problem using different algorithms.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Training the Naive Bayes model on the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
ac = accuracy_score(y_test,y_pred)
cm = confusion_matrix(y_test, y_pred)
There are several variants of Naive Bayes, such as Gaussian Naive Bayes, Multinomial Naive Bayes, and Bernoulli Naive Bayes. Each variant has its own assumptions and is suited for different types of data. Here are some assumptions that the Naive Bayers algorithm makes:
The naive Bayes algorithm is a powerful and widely-used machine learning algorithm that is particularly useful for classification tasks. This article explains the basic math behind the Naive Bayes algorithm and how it works for binary classification problems. Its simplicity and efficiency make it a popular choice for many data science applications. we have covered most concepts of the algorithm and how to implement it in Python. Hope you liked the article, and do not forget to practice algorithms.
Key Takeaways
A. The naive Bayes classifier is a good choice when you want to solve a binary or multi-class classification problem when the dataset is relatively small and the features are conditionally independent. It is a fast and efficient algorithm that can often perform well, even when the assumptions of conditional independence do not strictly hold. Due to its high speed, it is well-suited for real-time applications. However, it may not be the best choice when the features are highly correlated or when the data is highly imbalanced.
A. Bayes theorem provides a way to calculate the conditional probability of an event based on prior knowledge of related conditions. The naive Bayes algorithm, on the other hand, is a machine learning algorithm that is based on Bayes’ theorem, which is used for classification problems.
It is not a regression technique, although one of the three types of Naive Bayes, called Gaussian Naive Bayes, can be used for regression problems.
The media shown in this article are not owned by Analytics Vidhya and is used at the Author’s discretion.
Awesome explanation ☺ This will clear all the doubts and its very helful for newbies. Keep up the good work 👍
Hi, Great post;) I would like to ask when estimating the marginal likelihood P(X), we need to draw a circle around the new data, how should we choose the radius in order to increase accuracy of the estimation? And how is the radius or metric used going too affect the accuracy? Is there any book you can recommend for this topic? Thank you so much.
Thanks Surbhi! Easy to understand.
Hi thanks for the explanation, and have a question why you are applying feature scaling? Thanks