*This article was published as a part of the Data Science Blogathon.*

After we have trained our model and have predicted the outcomes, we need to evaluate the model’s performance. And here comes our Confusion Matrix. But before diving into what is a confusion matrix and how it evaluates the model’s performance, let’s have a look into the picture below.

It is the 2-dimensional array representation of the above table.

The above 2-dimensional array is the Confusion Matrix which evaluates the performance of a classification model on a set of predicted values for which the true(or actual) values are known.

Let’s have a definition of the four important terms that define the confusion matrix-

Cases, where the model claims that something has happened and actually it has happened i.e patient,* has cancer and the model also predicts cancer.*

Cases, where the model claims that nothing has happened and actually nothing, has happened i.e *patient doesn’t have cancer and the model also doesn’t predict cancer.*

Cases, where the model claims that something has happened when actually it hasn’t i.e patient,* doesn’t have cancer but the model predicts cancer.*

Cases, where the model claims nothing when actually something has happened i.e *patient has cancer but the model doesn’t predict cancer.*

* *

Now, all we need to know is how this matrix works on the dataset. For this, I’ll be using the same above case but in the form of a dataset.

Feature 1 |
Feature 2 |
. . . . . |
Target( has cancer: 0,doesn’t have cancer: 1) |
Prediction( what our model predicted) |

1 | 1 | |||

0 | 1 | |||

1 | 1 | |||

1 | 0 | |||

0 | 0 | |||

0 | 0 | |||

1 | 1 | |||

1 | 1 | |||

0 | 0 |

In the above classification dataset, Feature 1, Feature 2, and up to Feature n are the independent variables. For Target (dependent variable) we have assigned **1 **to a **positive value** (i.e have cancer) and **0** to a **negative value** (i.e doesn’t have cancer). After we have trained our model and got our predictions and we want to evaluate the performance, this is how the confusion matrix would look like.

- TP = 4, There are four cases in the dataset where the model predicted 1 and the target was also 1
- FP = 1, There is only one case in the dataset where the model predicted 1 but the target was 0
- FN = 1, There is only one case in the dataset where the model predicted 0 but the target was 1
- TN = 3, There are three cases in the dataset where the model predicted 0 and the target was also 0

The below-given metrics of confusion matrix determine how well our model performs-

- Accuracy
- Precision (Positive Prediction Value)
- Recall (True Positive Rate or Sensitivity)
- F beta Score

** **

** **Accuracy is the number of correctly (True) predicted results out of the total.

* Accuracy = (TP + TN) / (TP + TN + FP + FN)*

*= (4 + 3) / 9 = 0.77*

Accuracy should be considered when TP and TN are more important and the dataset is balanced because in that case the model will not get baised based on the class distribution. But in real-life classification problem, imbalanced class distribution exists.

For example, we have an imbalanced test data with 900 records of positive class (1) and 100 records of negative class (0). And our model predicted all records as positive (1). In that scenario, TP will be 900 and TN will be 0. Thus, accuracy = (900 + 0) / 1000 = 90%

Therefore, we should consider Precision, Recall and F Score as a better metric to evaluate the model.

** **

**2. PRECISION:**

** **Out of the total predicted positive values, how many were actually positive** **

* Precision = TP / (TP + FP)* *= 4/5 = 0.8*

Taking a use case of Spam Detection, suppose the mail is not spam (0), but the model has predicted it as spam (1) which is FP. In this scenario, one can miss the important mail. So, here we should focus on reducing the FP and must consider precision in this case.

** **

Out of the total actual positive values, how many were correctly predicted as positive

* Recall= TP / (TP + FN) = 4/5 = 0.8*

In Cancer Detection, suppose if a person is having cancer (1), but it is not predicted (0) by the model which is FN. This could be a disaster. So, in this scenario, we should focus on reducing the FN and must consider recall in this case.

Based on the problem statement, whenever the FP is having a greater impact, go for Precision and whenever the FN is important, go for Recall

In some use cases, both precision and recall are important. Also, in some use cases even though precision plays an important role or recall plays is important, we should combine both to get the most accurate result.

.

__Selecting beta value__

**F-1 Score ****(beta =1 ) **

When FP and FN both are equally important. This allows the model to consider both precision and recall equally using a single score.

F-1 Score is the *Harmonic Mean* of precision and recall.

If the impact of FP is high. This will give more weight to precision than to recall.

** **

If the impact of FN is high. Thus, giving more weight to recall and less to precision.

Hope you found this useful and interesting.

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Become a full stack data scientist
##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

##

Understanding Cost Function
Understanding Gradient Descent
Math Behind Gradient Descent
Assumptions of Linear Regression
Implement Linear Regression from Scratch
Train Linear Regression in Python
Implementing Linear Regression in R
Diagnosing Residual Plots in Linear Regression Models
Generalized Linear Models
Introduction to Logistic Regression
Odds Ratio
Implementing Logistic Regression from Scratch
Introduction to Scikit-learn in Python
Train Logistic Regression in python
Multiclass using Logistic Regression
How to use Multinomial and Ordinal Logistic Regression in R ?
Challenges with Linear Regression
Introduction to Regularisation
Implementing Regularisation
Ridge Regression
Lasso Regression

Introduction to Stacking
Implementing Stacking
Variants of Stacking
Implementing Variants of Stacking
Introduction to Blending
Bootstrap Sampling
Introduction to Random Sampling
Hyper-parameters of Random Forest
Implementing Random Forest
Out-of-Bag (OOB) Score in the Random Forest
IPL Team Win Prediction Project Using Machine Learning
Introduction to Boosting
Gradient Boosting Algorithm
Math behind GBM
Implementing GBM in python
Regularized Greedy Forests
Extreme Gradient Boosting
Implementing XGBM in python
Tuning Hyperparameters of XGBoost in Python
Implement XGBM in R/H2O
Adaptive Boosting
Implementing Adaptive Boosing
LightGBM
Implementing LightGBM in Python
Catboost
Implementing Catboost in Python

Introduction to Clustering
Applications of Clustering
Evaluation Metrics for Clustering
Understanding K-Means
Implementation of K-Means in Python
Implementation of K-Means in R
Choosing Right Value for K
Profiling Market Segments using K-Means Clustering
Hierarchical Clustering
Implementation of Hierarchial Clustering
DBSCAN
Defining Similarity between clusters
Build Better and Accurate Clusters with Gaussian Mixture Models

Introduction to Machine Learning Interpretability
Framework and Interpretable Models
model Agnostic Methods for Interpretability
Implementing Interpretable Model
Understanding SHAP
Out-of-Core ML
Introduction to Interpretable Machine Learning Models
Model Agnostic Methods for Interpretability
Game Theory & Shapley Values

Deploying Machine Learning Model using Streamlit
Deploying ML Models in Docker
Deploy Using Streamlit
Deploy on Heroku
Deploy Using Netlify
Introduction to Amazon Sagemaker
Setting up Amazon SageMaker
Using SageMaker Endpoint to Generate Inference
Deploy on Microsoft Azure Cloud
Introduction to Flask for Model
Deploying ML model using Flask

Great article.clearly explained. There is a mistake in the recall paragraph. When recall should be considered? In Cancer Detection, suppose if a person is having cancer (1), but it is not ('not' should not be here)predicted (0) by the model which is FN. This could be a disaster. So, in this scenario, we should focus on reducing the FN and must consider recall in this case.

Great article. Very informative and great explanation.