This article was published as a part of the Data Science Blogathon.

We, as data science and machine learning enthusiasts, have learned about various algorithms like Logistic Regression, Linear Regression, Decision Trees, Naive Bayes, etc. **But at the same time, are we preparing for the interviews?** As we know, the end goal is to land our dream job at the companies we are aiming for. Hence, knowing how interviewers turn and twist their questions is very important for answering them in the most efficient manner. That is why I’m starting a series on **the Top 10 most frequently asked interview questions on various machine learning algorithms**.

In this article, we will be covering the **top 10 interview questions on the Naive Bayes classifier,** but we are not going to jump straight into those tricky questions; instead, let’s first build a high-level understanding of the algorithm so that the concepts behind the answers are clear.

Naive Bayes is considered a top choice for **classification problems,** and it is rooted in the concept of probability. Specifically, this algorithm is a by-product of **Bayes’ Theorem**. But you must be wondering: if it is based on Bayes’ theorem, why does it carry the prefix **“Naive”**, which means **“Dumb”**? So is this algorithm dumb or useful?

The answer is simple and pretty straightforward: this algorithm is not dumb at all. In fact, it is often quite useful, and it is simple compared to more complex algorithms. It is called naive Bayes because of the assumption it makes, which takes us to our very first interview question: what is the basic assumption of Naive Bayes?

If one wants to give the short answer, they can simply say: **“Features are independent.”** But this will not be sufficient, so we need to explain the answer briefly. Naive Bayes assumes beforehand that all the features are independent of each other given the class, and it treats each of them **separately,** so each feature makes an **equal and independent contribution** to the final result. This is known as **the conditional independence assumption** (it is sometimes loosely called the I.I.D. assumption, but I.I.D. properly describes the training samples, not the features).
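The independence assumption can be written down compactly. As a sketch, using $y$ for the class and $x_1, \dots, x_n$ for the feature values (notation introduced here for illustration, not taken from the article), the posterior factors into one term per feature:

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

Each factor $P(x_i \mid y)$ is estimated from one feature alone, which is where the "naive" part comes from.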

The main advantages of Naive Bayes are:

- As it works independently with each feature, it scales well and can be used with large datasets to build **generalized models**.
- It is **less sensitive** to other features, i.e., it is not much affected by irrelevant features because of its **Naive** nature.
- It tends to work efficiently with both **continuous and discrete types** of datasets and is well suited to handling **categorical features** in data.
- When we have a dataset with very **little training data,** we can call on **the Naive Bayes** classifier; in this scenario it often outperforms other models.

It also has some disadvantages:

- As we say that there are always two sides to a coin, the advantage of naive Bayes can also be a disadvantage at some stages. As it treats all the **predictors independently,** we are not able to use it in all **real-world cases**, where features often depend on one another.
- This algorithm faces a major problem named **the “Zero Frequency problem,”** in which it assigns **zero probability** to any category of a **categorical variable** that was not present in the training dataset. Because the per-feature probabilities are multiplied together, a single zero wipes out the whole estimate, which introduces a lot of **bias** in the model.
- When the features are **highly correlated,** it affects the model performance negatively, because the correlated evidence is effectively counted more than once.
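A common remedy for the Zero Frequency problem is add-alpha (Laplace) smoothing, which the article does not cover. The sketch below is a minimal illustration under assumptions: the function name `category_likelihoods`, the toy "weather" data, and the choice `alpha=1.0` are all hypothetical.

```python
from collections import Counter

def category_likelihoods(values, categories, alpha=1.0):
    """Estimate P(category | class) with add-alpha (Laplace) smoothing.

    values: categories observed for one feature within one class.
    categories: every category the feature can take (assumed known up front).
    alpha: smoothing strength; alpha=0 gives the raw relative frequencies.
    """
    counts = Counter(values)
    total = len(values) + alpha * len(categories)
    return {c: (counts[c] + alpha) / total for c in categories}

# Hypothetical toy data: the "weather" feature for the class "play = yes".
weather_given_yes = ["sunny", "sunny", "rainy", "sunny"]
all_weather = ["sunny", "rainy", "overcast"]

raw = category_likelihoods(weather_given_yes, all_weather, alpha=0.0)
smoothed = category_likelihoods(weather_given_yes, all_weather, alpha=1.0)

print(raw["overcast"])       # 0.0: one unseen category zeroes the whole product
print(smoothed["overcast"])  # small but non-zero
```

With smoothing, an unseen category lowers the class score instead of eliminating the class outright, and the smoothed probabilities still sum to 1.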

A sure-shot answer should be: the Naive Bayes classifier is **not dependent on distance but on** **probability**, and for that reason feature scaling is not required. As a rule of thumb, an algorithm that does not depend on the distance between data points is much less likely to need **feature scaling**.
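To see why scaling does not matter here, the sketch below fits a tiny Gaussian Naive Bayes by hand and compares predictions before and after rescaling the features. It is an illustration under assumptions: the helper names `fit` and `predict`, the toy height and weight data, and the scaling factors are all hypothetical.

```python
import math
from statistics import mean, pstdev

def fit(X, y):
    """Per class: the prior plus a (mean, std) pair for every feature."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
        model[label] = (len(rows) / len(X), stats)
    return model

def log_gaussian(v, mu, sigma):
    """Log density of v under a normal distribution N(mu, sigma^2)."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (v - mu) ** 2 / (2 * sigma ** 2)

def predict(model, x):
    """Class with the highest log prior plus summed log likelihoods."""
    def score(label):
        prior, stats = model[label]
        return math.log(prior) + sum(log_gaussian(v, mu, s) for v, (mu, s) in zip(x, stats))
    return max(model, key=score)

# Hypothetical data: [height in cm, weight in kg].
X = [[150, 50], [160, 55], [155, 52], [180, 80], [175, 78], [185, 85]]
y = ["a", "a", "a", "b", "b", "b"]
queries = [[158, 54], [178, 79], [168, 65]]

def rescale(row):
    """Convert cm to m and kg to g: very different scales per feature."""
    return [row[0] * 0.01, row[1] * 1000]

original = [predict(fit(X, y), q) for q in queries]
scaled_model = fit([rescale(r) for r in X], y)
scaled = [predict(scaled_model, rescale(q)) for q in queries]

print(original == scaled)  # True
```

Rescaling a feature shifts the log likelihood of every class by the same constant, so the ranking of the classes does not change.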

Naive Bayes is one of the algorithms that can handle missing data on its own. The reason is that in this algorithm all the attributes are handled separately during both **model construction** and **prediction time**. If a data point is missing the value of a certain feature, that feature can simply be ignored when the probability is calculated for **each class**, which lets the algorithm handle missing data at the **model building phase** itself.
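As a rough illustration of skipping a missing feature at prediction time, the sketch below uses a hand-written model. The priors, the likelihood tables, and the feature names `has_link` and `has_greeting` are hypothetical values chosen for the example, not learned from real data.

```python
import math

# Hypothetical model: class priors and per-feature likelihood tables.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"has_link": {True: 0.8, False: 0.2},
             "has_greeting": {True: 0.3, False: 0.7}},
    "ham": {"has_link": {True: 0.2, False: 0.8},
            "has_greeting": {True: 0.7, False: 0.3}},
}

def predict(sample):
    """Return the most probable class, skipping features whose value is None."""
    scores = {}
    for label, prior in priors.items():
        log_score = math.log(prior)
        for feature, value in sample.items():
            if value is None:  # missing value: this feature contributes nothing
                continue
            log_score += math.log(likelihoods[label][feature][value])
        scores[label] = log_score
    return max(scores, key=scores.get)

complete = predict({"has_link": True, "has_greeting": False})
partial = predict({"has_link": True, "has_greeting": None})
nothing = predict({"has_link": None, "has_greeting": None})

print(complete, partial, nothing)
```

Because each feature has its own factor, dropping one leaves the remaining factors valid. With every feature missing, the prediction falls back to the class with the largest prior.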

Naive Bayes is **highly impacted by outliers** and is not completely robust in this case (depending on the use case we are working on). The reason is that the NB classifier assigns **0 probability** to any feature value it has not seen in the **training set,** which creates an issue at **prediction time.** The same goes for outliers, since an outlier is also data that the classifier has not seen before.

Naive Bayes is a probabilistic machine learning algorithm, and it is widely used in many classification tasks:

- Sentiment Analysis
- Spam classification
- Twitter sentiment analysis
- Document categorization

**The straightforward answer is:** Naive Bayes is a generative type of classifier. But this information is not enough. We should also know what a generative type of classifier is.

**Generative:** This type of classifier learns a model of how the data is generated behind the scenes by estimating the **distribution of the data** for each class, and then uses that model to predict unseen data. The same goes for the NB classifier: it learns the class priors and the per-class feature distributions, and doesn’t directly learn **a decision boundary** to classify samples.
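One way to see what "generative" means: once the class priors and per-class distributions are estimated, the model can produce new synthetic samples. The sketch below assumes a single numeric feature modelled with a Gaussian per class; the toy data and the helper name `sample` are hypothetical.

```python
import random
from statistics import mean, pstdev

# Hypothetical one-feature training set: (value, label) pairs.
data = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (3.0, "b"), (3.3, "b")]

labels = sorted({label for _, label in data})

# Class prior P(y): the share of training rows in each class.
prior = {c: sum(1 for _, l in data if l == c) / len(data) for c in labels}

# Per-class feature distribution P(x | y): a fitted Gaussian (mean, std).
params = {}
for c in labels:
    values = [v for v, l in data if l == c]
    params[c] = (mean(values), pstdev(values))

def sample(rng):
    """Draw a label from the prior, then a feature value from that class."""
    label = rng.choices(labels, weights=[prior[c] for c in labels])[0]
    mu, sigma = params[label]
    return rng.gauss(mu, sigma), label

rng = random.Random(0)  # fixed seed so the run is repeatable
synthetic = [sample(rng) for _ in range(5)]
print(prior)
print(synthetic)
```

A discriminative model such as logistic regression estimates only the probability of the class given the features, so it has no comparable way to generate feature values.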

**Prior probability:** This can also be tagged as **an initial probability**. In Bayesian statistics, it is the probability assigned to an outcome before any data is collected. That’s why it is known as **“Prior”** probability. In other words, it is the probability of the outcome **before the experiment is performed**.

**Posterior probability:** In simple words, this is the probability that we get after **a few experiment trials.** It is derived from the prior probability by taking the observed data into account. For that reason, it is also known as **updated probability.**
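A small worked example may help connect the two terms. The numbers below are made up for illustration (a spam filter that sees the word "offer"); exact fractions are used so the arithmetic is easy to check by hand.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
prior_spam = Fraction(2, 10)         # P(spam) before reading the email
p_word_given_spam = Fraction(6, 10)  # P("offer" appears | spam)
p_word_given_ham = Fraction(1, 10)   # P("offer" appears | not spam)

# Total probability of seeing the word at all (the evidence).
evidence = p_word_given_spam * prior_spam + p_word_given_ham * (1 - prior_spam)

# Bayes' theorem: posterior = likelihood * prior / evidence.
posterior_spam = p_word_given_spam * prior_spam / evidence

print(prior_spam)      # 1/5
print(posterior_spam)  # 3/5
```

Observing the word moves the belief that the email is spam from the prior of 1/5 to the posterior of 3/5.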

Naive Bayes uses separate, dedicated distributions for categorical and numerical values. They are mentioned below:

**Categorical values:** In this case, we can get the probability for categorical variables by using a **Multinomial or Bernoulli distribution**.

**Numerical values:** In this situation, we can estimate the probability by using a **Normal (Gaussian)** distribution.
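The two cases can be sketched side by side. In this minimal illustration, the categorical likelihood is a relative frequency and the numerical likelihood is a Gaussian density fitted to the class. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def categorical_likelihood(value, observed):
    """P(value | class) as the relative frequency of value among observed."""
    return Counter(observed)[value] / len(observed)

def gaussian_likelihood(value, observed):
    """Density of value under a normal distribution fitted to observed."""
    mu, sigma = mean(observed), pstdev(observed)
    exponent = -((value - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical feature values observed within a single class.
colours_in_class = ["red", "red", "blue", "red"]
heights_in_class = [170.0, 172.0, 168.0, 174.0, 166.0]

p_red = categorical_likelihood("red", colours_in_class)
density = gaussian_likelihood(170.0, heights_in_class)

print(p_red)    # 0.75
print(density)  # density at the class mean
```

Note that the Gaussian value is a density, not a probability, so it can exceed 1 for tightly clustered data. That is fine for classification, because only the comparison between classes matters.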

So we are in the last section of this article, having worked through the top 10 interview questions on **the NB classifier**. Let’s briefly recap everything so we can list our learnings in a nutshell.

- Firstly, we started this small journey by introducing the concept behind the Naive Bayes algorithm, and straight after that, we discussed the **assumptions**, **advantages**, and **disadvantages** of the Naive Bayes classifier.
- Then we moved on to tricky questions like: will this algorithm be affected by **outliers** and **missing values**? Is **feature scaling** a required step when working with this classifier?
- At last, we covered some more questions based on the mathematical intuition behind this algorithm, like how naive Bayes treats **categorical and numerical** values, what **posterior and prior** probabilities are, and whether the NB classifier falls under the **generative or discriminative** category.

I hope you liked my article on the **Top 10 most frequently asked interview questions on the Naive Bayes classifier.** If you have any opinions or questions, then comment below.

Connect with me on LinkedIn for further discussion.

**The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.**
