*This article was published as part of the Data Science Blogathon.*

In this post, we will explain the bias-variance tradeoff in machine learning and how to make the most of it. We will follow up with some illustrative examples and a practical implementation at the end.

**First things first, let us understand what bias, variance, and trade-off actually mean in practice, without resorting to bookish definitions.**

Bias is the part of the generalization error that arises from wrong assumptions made during model development, e.g. assuming that the data is linear when it is actually quadratic.

A high-bias model is most likely to **underfit** our training data.

Variance comes into play whenever our model is extremely sensitive to small variations in the training data. A model with many degrees of freedom, such as a high-degree polynomial, is likely to have high variance.

A high-variance model is most likely to **overfit** our training data.

Keeping this in mind, we can conclude that increasing the complexity of our model will typically increase its variance and reduce its bias. Conversely, reducing its complexity increases its bias and reduces its variance.
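This effect is easy to see with a small experiment (my own sketch, not part of the original article): fit polynomials of increasing degree to noisy quadratic data and compare training error with test error. Training error keeps falling as the degree grows, while test error eventually rises.

```python
# Sketch: training error falls with model complexity, but test error
# eventually rises as variance takes over. Synthetic quadratic data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(120, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=1.0, size=120)  # quadratic + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.2f}  test MSE={test_mse:.2f}")
```

With this data, degree 1 underfits (high bias), degree 2 matches the true function, and degree 15 drives the training error down further while the test error suffers (high variance).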

Ideally, we would aim for both low bias and low variance, but in practice we have to trade one off against the other.

Therefore it is called a **trade-off.**

**Let’s go ahead and discuss this topic by imagining a dartboard.**

Imagine that the center of the target is a model that perfectly predicts the correct values. As we move away from the bulls-eye, our predictions get worse and worse. We can repeat the entire model-building process to get a number of separate hits on the target, where each hit represents an individual realization of our model, assuming there is variability in the training data we gather.

Sometimes we might be lucky enough to get a good distribution of training data so we predict well and we are close to bulls-eye, while sometimes that might not be the case when our training data is filled with outliers or some non-standard values resulting in poor predictions.

These different realizations result in scattering hits on the target.

Looking at the different realizations on the dartboard, we see four cases:

- Low bias and low variance – predictions land right on the bulls-eye.
- Low bias and high variance – predictions are centered on the bulls-eye but widely scattered around it.
- High bias and low variance – predictions are tightly clustered, but in the wrong place, away from the bulls-eye.
- High bias and high variance – the worst case: predictions are scattered all over the board.

**A common temptation for beginners is to keep adding complexity to a model until it fits the training set very well.**

As a beginner, you might decide to make the model ever more complex or flexible so that it passes through every training point. However, a model that hits all the training points will fail to predict new test points; in other words, it overfits the training data.

This is usually illustrated with a classic plot that has model complexity (low to high) on the X-axis and prediction error on the Y-axis.

As you move to the left, you get higher bias but lower variance; as you move to the right, towards a more complex model, you get lower bias but higher variance.

What we need to do is pick an optimal point (the sweet spot) where we are comfortable with the trade-off. To the left of it, the model underfits the data; to the right of it, it starts to overfit.
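One common way to find that sweet spot in practice is cross-validation. The sketch below (my illustration on synthetic quadratic data, not from the article) uses scikit-learn's `validation_curve` to score a polynomial pipeline over a range of degrees and pick the one with the lowest validation error.

```python
# Sketch: locate the sweet spot by cross-validating over model complexity.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import validation_curve

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(150, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=1.0, size=150)  # quadratic + noise

degrees = np.arange(1, 11)
train_scores, val_scores = validation_curve(
    make_pipeline(PolynomialFeatures(), LinearRegression()),
    X, y,
    param_name="polynomialfeatures__degree",  # step name assigned by make_pipeline
    param_range=degrees,
    cv=5,
    scoring="neg_mean_squared_error",
)
val_mse = -val_scores.mean(axis=1)
best = degrees[np.argmin(val_mse)]
print(f"best degree by cross-validation: {best}")
```

With this data, a low degree should win: the underfitting degree-1 model and the high-variance high-degree models both score worse on the held-out folds.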

**Now that the explanation is out of the way, let's get started with the practical implementation.**

Now comes the question: can we calculate the bias and variance of a particular model?

Strictly speaking, no. We cannot compute their actual values, because doing so would require knowing the true underlying function that the model is trying to learn.

In practice, we use bias, variance, the irreducible error (the noisiness of the data itself), and the trade-off between them as conceptual tools to help us select and configure models.

However, we can estimate bias and variance empirically in some cases.
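To see what such an estimate involves, here is a hand-rolled sketch (my illustration on synthetic data, where the true function is known, which is exactly what we lack with real data): train the model on many independently drawn training sets, then decompose the average test error into bias² and variance.

```python
# Sketch: Monte-Carlo estimate of bias^2 and variance for a linear model
# fit to quadratic data. The true function must be known, hence synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(1)

def true_f(x):
    return 0.5 * x ** 2

X_test = np.linspace(-3, 3, 50).reshape(-1, 1)
preds = []
for _ in range(200):  # 200 independent training sets
    X_tr = rng.uniform(-3, 3, size=(60, 1))
    y_tr = true_f(X_tr[:, 0]) + rng.normal(scale=1.0, size=60)
    model = LinearRegression().fit(X_tr, y_tr)
    preds.append(model.predict(X_test))

preds = np.array(preds)                 # shape (200, 50)
avg_pred = preds.mean(axis=0)           # average prediction per test point
bias_sq = np.mean((avg_pred - true_f(X_test[:, 0])) ** 2)
variance = np.mean(preds.var(axis=0))   # spread of predictions across runs
print(f"bias^2: {bias_sq:.3f}  variance: {variance:.3f}")
```

As expected for a linear model fit to quadratic data, the bias² term dominates and the variance is tiny. With real data we cannot call `true_f`, which is why libraries resort to bootstrap-based approximations instead.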

The mlxtend library by Sebastian Raschka provides a bias_variance_decomp() function that can estimate the bias and variance of a model.

First, install the mlxtend library:

```shell
pip install mlxtend
```

After that, we will use the Boston housing dataset as our data.

```python
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from mlxtend.evaluate import bias_variance_decomp

# Load the Boston housing dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
dataframe = read_csv(url, header=None)
data = dataframe.values
X, y = data[:, :-1], data[:, -1]

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=101)

# Estimate bias and variance for a linear regression model
model = LinearRegression()
mse, bias, var = bias_variance_decomp(model, X_train, y_train, X_test, y_test,
                                      loss='mse', num_rounds=200, random_seed=1)

print('MSE: %.3f' % mse)
print('Bias: %.3f' % bias)
print('Variance: %.3f' % var)
```

Running this gives the following output:

```
MSE: 34.904
Bias: 33.438
Variance: 1.466
```

Here we can see that the model has high bias and low variance, which is what we would expect from a plain linear regression model.

I hope this article helped you gain better insight into this concept. Leave a comment below if you have any follow-up questions and I will try to answer them.

Thank you,

Karan Amal Pradhan.

*The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.*
