*This article was published as a part of the Data Science Blogathon.*

Supervised Machine Learning deals with two main types of problems: Regression problems and Classification problems. **Regression techniques** or models are used when our dependent variable is **continuous in nature**, whereas **Classification techniques** are used when the dependent variable is **categorical**.

When a Machine Learning model is built, various evaluation metrics are used to check the quality or the performance of the model. For classification models, metrics such as Accuracy, Confusion Matrix, Classification report (i.e., Precision, Recall, F1 score), and the AUC-ROC curve are used.

In this article, we will take a deep dive into one of the most common and well-known evaluation tools, the **Confusion Matrix**, and understand all of its elements in detail. So keep on reading 🙂

Please jump to the 4th part of the article if you already know the Confusion Matrix.

A Confusion Matrix is a table that compares the Actual VS Predicted values. It is used to measure the performance of our Machine Learning classification model.

This is what the Confusion Matrix of a binary classification problem looks like:

It represents the different combinations of Actual VS Predicted values. Let’s define them one by one.

**TP: True Positive:** The values which were actually positive and were predicted positive.

**FP: False Positive:** The values which were actually negative but falsely predicted as positive. Also known as Type I Error.

**FN: False Negative:** The values which were actually positive but falsely predicted as negative. Also known as Type II Error.

**TN: True Negative:** The values which were actually negative and were predicted negative.
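The four definitions above can be sketched as a small counting routine. This is a minimal illustration in plain Python, not a library API: the function name `confusion_counts`, the `positive` parameter, and the toy labels are all assumptions made for this example.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) for two equal-length label sequences.

    `positive` is the label treated as the positive class (assumed to be 1).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # actually positive, predicted positive
        elif p == positive:
            fp += 1   # actually negative, predicted positive (Type I error)
        elif a == positive:
            fn += 1   # actually positive, predicted negative (Type II error)
        else:
            tn += 1   # actually negative, predicted negative
    return tp, fp, fn, tn


# Hypothetical toy labels, only for illustration
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)  # 3 1 1 3
```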

Let’s take the example of a **Stock Market Crash prediction** project. This is a binary classification problem where **1 means the stock market will crash** and **0 means the stock market will not crash**. Suppose we have **1000 records** in our dataset.

Let’s look at the confusion matrix for this example:

From this matrix, we can analyze the model as follows:

True Positive: 540 records of a stock market crash were **predicted correctly** by the model.

False Positive: 150 records of no market crash were **wrongly predicted** as a market crash.

False Negative: 110 records of a market crash were **wrongly predicted** as no market crash.

True Negative: 200 records of no market crash were **predicted correctly** by the model.
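To see the four counts side by side, here is a short sketch that lays them out as a 2x2 table. The layout (rows for actual values, columns for predicted values, positive class first) is an assumption chosen for readability.

```python
# Counts taken from the stock market crash example
tp, fp, fn, tn = 540, 150, 110, 200

# Assumed layout: rows = actual class, columns = predicted class
matrix = [
    [tp, fn],  # actual crash (1)
    [fp, tn],  # actual no crash (0)
]
row_labels = ["Actual: crash (1)", "Actual: no crash (0)"]
col_labels = ["Pred: crash (1)", "Pred: no crash (0)"]

# Print a simple aligned table
print(" " * 22 + "".join(f"{c:>20}" for c in col_labels))
for label, row in zip(row_labels, matrix):
    print(f"{label:<22}" + "".join(f"{v:>20}" for v in row))

# The four cells should add up to the size of the dataset
total = sum(sum(row) for row in matrix)
print("Total records:", total)  # Total records: 1000
```

Libraries may order the cells differently. For instance, scikit-learn's `confusion_matrix` sorts the labels, so with 0/1 labels its top-left cell is TN rather than TP.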

Accuracy is calculated by dividing the total number of correct predictions by the total number of predictions.
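In terms of the four cells of the confusion matrix, the correct predictions are TP and TN, so accuracy can be written as:

$$
\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}
$$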

Recall measures how many of the actual positive outcomes were correctly predicted as positive.
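The actual positives are the ones the model caught (TP) plus the ones it missed (FN), which gives:

$$
\text{Recall} = \frac{TP}{TP + FN}
$$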

Precision checks how many of the outcomes predicted as positive are actually positive.
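The positive predictions are the correct ones (TP) plus the false alarms (FP), which gives:

$$
\text{Precision} = \frac{TP}{TP + FP}
$$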

The F-beta score is the weighted harmonic mean of Precision and Recall, so it captures the contribution of both. How much each one contributes depends on the beta value.
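The standard form of the F-beta score, written with $\beta$ for the beta value, is:

$$
F_\beta = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}
$$

Setting $\beta = 1$ reduces this to $F_1 = \dfrac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$.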

The default beta value is 1, which gives us the F1 score, where Precision and Recall contribute equally. The higher the F1 score, the better the model.

A beta value < 1 gives more weight to Precision than Recall, and a beta value > 1 gives more weight to Recall.
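The effect of beta is easier to see with numbers. The sketch below uses a hypothetical model with high Precision (0.9) and low Recall (0.5). The helper `f_beta` is written for this illustration and is not a library function.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # avoid division by zero when both inputs are 0
    return (1 + b2) * precision * recall / denominator


# Hypothetical model: precise, but misses half of the positives
precision, recall = 0.9, 0.5

for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: F-beta={f_beta(precision, recall, beta):.3f}")
# beta=0.5: F-beta=0.776
# beta=1.0: F-beta=0.643
# beta=2.0: F-beta=0.549
```

With beta = 0.5 the score stays close to the high Precision, while with beta = 2 it is pulled down toward the low Recall.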

You can calculate the values of all the above-mentioned metrics using the Stock market crash example provided above.
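As a quick check of that exercise, here is a minimal sketch that plugs the four counts from the stock market crash example into the standard definitions of each metric. The variable names are chosen for this illustration.

```python
# Counts from the stock market crash example
tp, fp, fn, tn = 540, 150, 110, 200

accuracy = (tp + tn) / (tp + fp + fn + tn)          # correct / all predictions
precision = tp / (tp + fp)                          # correct among predicted positives
recall = tp / (tp + fn)                             # correct among actual positives
f1 = 2 * precision * recall / (precision + recall)  # F-beta with beta = 1

print(f"Accuracy : {accuracy:.3f}")   # Accuracy : 0.740
print(f"Precision: {precision:.3f}")  # Precision: 0.783
print(f"Recall   : {recall:.3f}")     # Recall   : 0.831
print(f"F1 score : {f1:.3f}")         # F1 score : 0.806
```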

Here comes the most important part of the above discussion, i.e., **when to use which metric**.

In other words, which measure should we use to evaluate our model: Accuracy, Recall, Precision, or a combination of them?

Confusing ???? :p

It’s not. I’ll explain this with some examples which will make the concepts even clearer. So let’s start.

**Accuracy** is the standard metric to go for when evaluating a classification machine learning model.

But

**We cannot rely on Accuracy all the time**, as in some cases it gives a misleading picture of the quality of the model, for example when our dataset is imbalanced.
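A small sketch shows why accuracy can mislead on imbalanced data. The dataset here is hypothetical: 990 negative records and 10 positive ones, with a "model" that simply predicts the majority class every time.

```python
# Hypothetical imbalanced dataset: 990 negatives (0) and 10 positives (1)
actual = [0] * 990 + [1] * 10

# A trivial "model" that always predicts the majority class
predicted = [0] * len(actual)

correct = sum(1 for a, p in zip(actual, predicted) if a == p)
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

accuracy = correct / len(actual)
recall = tp / (tp + fn)

print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.99
print(f"Recall  : {recall:.2f}")    # Recall  : 0.00
```

The accuracy looks excellent, yet the model never identifies a single positive record, which is exactly what Recall exposes.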

**Another case of not using Accuracy** is when we are dealing with a domain-specific project or when our company wants a particular result from the model. Let’s get into more detail with some examples.

__Example 1: Domain-Specific case__

Taking our previous example of **Stock Market Crash Prediction**, our main aim should be to reduce the cases where the model predicted no market crash when a crash actually happened.

Imagine a situation where our model wrongly predicted that the market would not crash and it crashed instead. People would suffer heavy losses in this case.

**The measure which takes this problem into account is FN, and therefore Recall. So we need to focus on reducing the value of FN and increasing the value of Recall.**

**In most medical cases, such as cancer prediction or any disease prediction we try to reduce the value of FN.**

__Example 2: Spam Detection__

In the case of Email Spam detection, if an email is predicted as spam but is not actually spam, then it can cause problems for the user, for example an important email ending up in the spam folder.

**In this case, we need to focus on reducing the value of FP (i.e., when the mail is falsely predicted as spam) and, as a result, increasing the value of Precision.**

**In some cases of imbalanced data problems, both Precision and Recall are important so we consider the F1 score as an evaluation metric.**

There is another concept, the AUC-ROC curve, for evaluating a classification model, and it is one of the most important metrics to learn. We will discuss that in another blog of mine.

After reading this article, I hope your indecisiveness about choosing the best evaluation metric for your model is resolved. Please let me know your inputs and suggestions, and connect with me on LinkedIn.

To score a confusion matrix, various metrics can be used to evaluate the performance of a classification model. These metrics include accuracy, precision, recall, F1-score, ROC curve with AUC, error rates, and additional metrics specific to the application. The choice of metrics depends on the context and objectives of the classification task.

A perfect confusion matrix is a hypothetical scenario in which a classification model correctly classifies all data points. This would result in a matrix with all true positives (TP) and true negatives (TN) along the diagonal, and zero false positives (FP) and false negatives (FN) in the off-diagonal entries.
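That description translates into a simple check: every off-diagonal cell must be zero. The sketch below assumes the matrix is given as a list of rows with actual classes on the rows and predicted classes on the columns. The helper name `is_perfect` and the example matrices are hypothetical.

```python
def is_perfect(matrix):
    """Return True if every off-diagonal entry of a square matrix is zero."""
    return all(
        value == 0
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if i != j
    )


# Hypothetical perfect model: no FP and no FN
perfect = [[650, 0],
           [0, 350]]

# Stock market crash example: rows = actual, columns = predicted
stock_example = [[540, 110],
                 [150, 200]]

print(is_perfect(perfect))        # True
print(is_perfect(stock_example))  # False
```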

I am Deepanshi Dhingra, currently working as a Data Science Researcher, with knowledge of Analytics, Exploratory Data Analysis, Machine Learning, and Deep Learning.

*The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.*
