This article was published as a part of the Data Science Blogathon.

When training a machine learning model, the model can easily be overfitted or underfitted. To avoid this, we use regularization to fit the model properly to the training data so that it also generalizes to the test set. Regularization techniques reduce the risk of overfitting and help us obtain an optimal model. In this article, ‘The Ultimate Guide to Regularization in Machine Learning’, you will learn everything you need to know about regularization.

To train our machine learning model, we provide it with data to learn from. The process of plotting a series of data points and drawing a line of best fit to understand the relationship between variables is called Data Fitting. A model fits best when it captures the meaningful patterns in our data while ignoring random fluctuations and irrelevant patterns, which we call noise.

If we allow our machine learning model to look at the training data too many times, it will find many patterns in it, including some that are unnecessary. It will fit the training dataset very well: it learns the important patterns, but it also learns the noise in our data, so it will not make good predictions on other datasets.

A scenario in which a machine learning model learns the details together with the noise in the data, fitting its curve through every data point, is called Overfitting.

In the figure below, we can see that the model fits every point in our data. If new data is provided, the model's curve may not match the patterns in the new data, and the model may not predict very well.

Conversely, if the model has not been allowed to look at our data enough times, it will not be able to find the patterns in our training data. It won’t fit the training data properly and won’t work on new data either.

A scenario in which a machine learning model can neither learn the relationship between variables in the training data nor correctly predict or classify new data points is called Underfitting.

The image below shows an underfitted model. We can see that it does not fit the given data correctly: it has not found the patterns in the data and ignores much of the dataset. It performs poorly on both seen and unseen data.

Bias arises when the algorithm has too little flexibility to learn from the dataset. Such models pay little attention to the training data and oversimplify the problem, so the validation (prediction) error and the training error follow similar trends. These models typically have high errors on both the training and the test data. High bias causes underfitting.

Variance describes how sensitive the algorithm is to the specific dataset it was trained on. A high-variance model pays too much attention to the training data and does not generalize, so the training error and the validation (prediction) error are far apart. Such models usually perform very well on the training data but have a high error rate on the test data. High variance causes overfitting.

An optimal model is sensitive to the patterns in our data but can also generalize to new data. This happens when bias and variance are balanced. Finding this balance is called the Bias-Variance Tradeoff, and regularization, with a well-chosen penalty strength, helps us move over- or under-fitted models toward it.

The above figure shows that when the bias is high, the error on both the training and the test sets is high. When the variance is high, the model performs well on the training set and gives a low error, but the error on the test set is very high. Between these extremes lies a region where bias and variance are balanced and both the training and test errors are low.
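The behaviour described above can be reproduced with a small experiment. The sketch below is illustrative only: it uses a synthetic noisy sine wave (not the article's dataset) and uses polynomial degree as a stand-in for model flexibility, assuming NumPy and scikit-learn are installed. A low degree should underfit (high error on both sets), while a very high degree drives the training error down and typically pushes the test error up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def make_data(n):
    """Hypothetical toy data: a noisy sine wave on [0, 1]."""
    X = np.sort(rng.uniform(0, 1, n)).reshape(-1, 1)
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, n)
    return X, y

X_train, y_train = make_data(30)
X_test, y_test = make_data(30)

results = {}
for degree in (1, 4, 15):
    # A higher polynomial degree gives a more flexible model
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Degree 1 plays the role of the high-bias model and degree 15 the high-variance one; the exact numbers depend on the random seed.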

Regularization refers to techniques that add a penalty term to a model's loss function. Minimizing this adjusted loss discourages overly complex models and helps avoid overfitting, while a well-chosen penalty strength keeps the model from underfitting.

**Ridge Regularization**

Also known as Ridge Regression or L2 regularization, it counters overfitting by adding a penalty equal to the sum of the squared magnitudes of the coefficients.

This means that the cost function we minimize to calculate the coefficients now also contains the squared coefficients, scaled by a tuning parameter (λ, called `alpha` in scikit-learn). Minimizing it trades off fitting the data against keeping the coefficients small, so Ridge Regression regularizes by shrinking the coefficients toward zero, although it rarely makes any of them exactly zero. Its cost function is the sum of squared errors plus λ times the sum of the squared coefficients.
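Written out, the ridge cost is $J(\beta) = \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2 + \alpha \sum_{j=1}^{p} \beta_j^2$, which is the objective scikit-learn documents for `Ridge`. The minimal sketch below, assuming synthetic data and no intercept so the algebra stays simple, checks that `Ridge` matches the closed-form minimizer $(X^\top X + \alpha I)^{-1} X^\top y$ and shows the coefficients shrinking as `alpha` grows.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
# Hypothetical data: 50 samples, 3 features, known true coefficients
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=50)

alpha = 10.0
# Closed-form ridge solution without intercept: (X^T X + alpha * I)^-1 X^T y
beta_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
print("closed form:", np.round(beta_closed, 4))
print("sklearn    :", np.round(ridge.coef_, 4))

# A larger alpha gives a smaller coefficient norm (shrinkage)
norms = [
    np.linalg.norm(Ridge(alpha=a, fit_intercept=False).fit(X, y).coef_)
    for a in (0.1, 1, 10, 100)
]
print("coefficient norms:", np.round(norms, 3))
```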

**Lasso Regularization**

Also known as Lasso Regression or L1 regularization, it counters overfitting by adding a penalty equal to the sum of the absolute values of the coefficients.

Lasso regression also shrinks the coefficients, but instead of squaring them it penalizes their absolute values. Because of this L1 penalty, lasso can shrink some coefficients exactly to zero, which effectively removes those features from the model. Its cost function is the sum of squared errors plus λ times the sum of the absolute values of the coefficients.
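scikit-learn's `Lasso` minimizes $\frac{1}{2n}\sum_{i=1}^{n}(y_i - x_i^\top\beta)^2 + \alpha\sum_{j=1}^{p}|\beta_j|$ (note the $\frac{1}{2n}$ scaling, which differs from `Ridge`). As a minimal sketch on hypothetical data where only 3 of 10 features matter, the code below compares the zero coefficients produced by lasso and ridge at the same, arbitrarily chosen `alpha`.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
# Hypothetical data: 10 features, but only the first 3 influence y
X = rng.normal(size=(100, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_coef + rng.normal(scale=0.5, size=100)

lasso = Lasso(alpha=0.2).fit(X, y)
ridge = Ridge(alpha=0.2).fit(X, y)

print("lasso:", np.round(lasso.coef_, 3))
print("ridge:", np.round(ridge.coef_, 3))
# Lasso sets irrelevant coefficients exactly to zero; ridge only shrinks them
print("exact zeros -> lasso:", np.sum(lasso.coef_ == 0), "ridge:", np.sum(ridge.coef_ == 0))
```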

**Elastic Net**

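Elastic Net combines the L1 and L2 penalties. In scikit-learn, the `alpha` parameter sets the overall strength of the penalty and `l1_ratio` sets the mix between the two: `l1_ratio=1` gives a pure L1 (lasso) penalty and `l1_ratio=0` a pure L2 (ridge-style) penalty.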
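scikit-learn documents the `ElasticNet` objective as $\frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\rho\lVert w\rVert_1 + \frac{\alpha(1-\rho)}{2}\lVert w\rVert_2^2$, where $\rho$ is `l1_ratio`. The sketch below, assuming synthetic data and no intercept, writes that objective out by hand (the helper `elastic_net_objective` is hypothetical, not a library function), checks that nudging the fitted coefficients makes it worse, and confirms that `l1_ratio=1` reproduces `Lasso`.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

def elastic_net_objective(X, y, w, alpha, l1_ratio):
    """Hypothetical helper: scikit-learn's documented ElasticNet objective (no intercept)."""
    n = X.shape[0]
    data_fit = np.sum((y - X @ w) ** 2) / (2 * n)
    l1_penalty = alpha * l1_ratio * np.sum(np.abs(w))
    l2_penalty = 0.5 * alpha * (1 - l1_ratio) * np.sum(w ** 2)
    return data_fit + l1_penalty + l2_penalty

rng = np.random.default_rng(0)
# Hypothetical data with two irrelevant features
X = rng.normal(size=(100, 5))
y = X @ np.array([4.0, -3.0, 0.0, 2.0, 0.0]) + rng.normal(scale=0.5, size=100)

alpha, l1_ratio = 0.5, 0.5
enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False).fit(X, y)
best = elastic_net_objective(X, y, enet.coef_, alpha, l1_ratio)
nudged = elastic_net_objective(X, y, enet.coef_ + 0.1, alpha, l1_ratio)
print(f"objective at fitted coef: {best:.4f}, after nudging: {nudged:.4f}")

# With l1_ratio=1 the L2 part vanishes and ElasticNet reduces to Lasso
enet_l1 = ElasticNet(alpha=alpha, l1_ratio=1.0, fit_intercept=False).fit(X, y)
lasso = Lasso(alpha=alpha, fit_intercept=False).fit(X, y)
print("same as Lasso:", np.allclose(enet_l1.coef_, lasso.coef_))
```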

Let’s see how regularization can be implemented in Python. We use the Advertising dataset and fit linear regression models to predict sales from the advertising spend.

We start by importing all the necessary modules.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
```

We then load the Advertising dataset from a CSV file.

```python
df = pd.read_csv("Advertising.csv")
```

Next, we split the dataset into training and testing sets with a train-test split.
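A minimal sketch of this step, assuming scikit-learn's `train_test_split`. Because the CSV file is not bundled here, the code builds a hypothetical stand-in DataFrame: the column names `TV`, `radio`, `newspaper` and `sales`, the made-up relationship between them, and the `test_size`/`random_state` values are all assumptions. With the real file, use the `pd.read_csv("Advertising.csv")` line above and check the actual column names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Advertising.csv (synthetic values, assumed column names)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "radio": rng.uniform(0, 50, n),
    "newspaper": rng.uniform(0, 100, n),
})
df["sales"] = 3 + 0.045 * df["TV"] + 0.19 * df["radio"] + rng.normal(0, 1.5, n)

# Features are every column except the target
X = df.drop(columns="sales")
y = df["sales"]

# Hold out 30% of the rows for testing (illustrative values)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)
```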

Now we can use these sets to train our linear regression model. We start by creating the model and fitting it to the training data. We then predict on the test set, measure the prediction error using mean_squared_error, and finally print the coefficients of our linear regression model.
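The steps above can be sketched as follows, assuming scikit-learn's `LinearRegression`. To keep the block self-contained it regenerates the same hypothetical stand-in data (synthetic values, assumed column names) instead of reading the CSV file, so the printed numbers will not match the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with assumed Advertising-style columns
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"TV": rng.uniform(0, 300, n), "radio": rng.uniform(0, 50, n),
                   "newspaper": rng.uniform(0, 100, n)})
df["sales"] = 3 + 0.045 * df["TV"] + 0.19 * df["radio"] + rng.normal(0, 1.5, n)
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="sales"), df["sales"], test_size=0.3, random_state=42)

# Create the model and fit it to the training data
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

# Predict on the test set and measure the error
test_predictions = lr_model.predict(X_test)
mse = mean_squared_error(y_test, test_predictions)
print("test MSE:", mse, "test RMSE:", np.sqrt(mse))

# One coefficient per feature, in column order
print(dict(zip(X_train.columns, lr_model.coef_)))
```

This unregularized model serves as the baseline against which the Ridge, Lasso and Elastic Net results below can be compared.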

**Ridge Regression:**

Modelling with default parameters:

```python
from sklearn.linear_model import Ridge

# Ridge() uses the default penalty strength, alpha=1.0
ridge_model = Ridge()
ridge_model.fit(X_train, y_train)
```

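Predictions and evaluation of Ridge Regression:

```python
from sklearn.metrics import mean_squared_error

test_predictions = ridge_model.predict(X_test)
train_predictions = ridge_model.predict(X_train)
print("test RMSE:", np.sqrt(mean_squared_error(y_test, test_predictions)))
```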

**Hyperparameter Tuning of Ridge:** Identifying the best alpha value for Ridge Regression:

```python
from sklearn.model_selection import GridSearchCV

# Try integer alpha values 1 to 10 with 5-fold cross-validation
estimator = Ridge()
param_grid = {"alpha": list(range(1, 11))}
model_hp = GridSearchCV(estimator, param_grid, cv=5)
model_hp.fit(X_train, y_train)
model_hp.best_params_
```
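scikit-learn also offers `RidgeCV`, which cross-validates a list of alphas internally. The sketch below is a hedged alternative on hypothetical synthetic data, not the article's method; the log-spaced candidate grid is an assumption chosen because the effect of the penalty tends to change by orders of magnitude rather than in unit steps.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(7)
# Hypothetical data: 120 samples, 5 features
X = rng.normal(size=(120, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.5, 0.0]) + rng.normal(scale=1.0, size=120)

# Candidate penalties spread over several orders of magnitude
alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X, y)

print("chosen alpha:", ridge_cv.alpha_)
print("coefficients:", np.round(ridge_cv.coef_, 3))
```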

**Lasso Regularization:**

Modeling of Lasso Regularization:

```python
from sklearn.linear_model import Lasso

# Lasso() uses the default penalty strength, alpha=1.0
lasso_model = Lasso()
lasso_model.fit(X_train, y_train)
```

Predictions and evaluation of Lasso Regression:

```python
from sklearn.metrics import mean_squared_error

test_predictions = lasso_model.predict(X_test)
train_predictions = lasso_model.predict(X_train)

# Compare each set's predictions against its own targets
train_rmse = np.sqrt(mean_squared_error(y_train, train_predictions))
test_rmse = np.sqrt(mean_squared_error(y_test, test_predictions))
print("train RMSE:", train_rmse)
print("test RMSE:", test_rmse)
```

**Hyperparameter Tuning of Lasso:**

Identifying the best alpha value for Lasso Regression:

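```python
# Reset the estimator to Lasso; otherwise the Ridge estimator from above is reused
estimator = Lasso()
param_grid = {"alpha": list(range(1, 11))}
model_hp = GridSearchCV(estimator, param_grid, cv=5)
model_hp.fit(X_train, y_train)
model_hp.best_estimator_
```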

**Elastic Net Regularization:**

Modeling of Elastic Net Regularization:

```python
from sklearn.linear_model import ElasticNet

# Note: l1_ratio=1 makes the penalty pure L1, i.e. equivalent to Lasso(alpha=2)
enr_model = ElasticNet(alpha=2, l1_ratio=1)
enr_model.fit(X_train, y_train)
```

Predictions and evaluation of Elastic Net:

```python
from sklearn.metrics import mean_squared_error

test_predictions = enr_model.predict(X_test)
train_predictions = enr_model.predict(X_train)

# Compare each set's predictions against its own targets
train_rmse = np.sqrt(mean_squared_error(y_train, train_predictions))
test_rmse = np.sqrt(mean_squared_error(y_test, test_predictions))
print("train RMSE:", train_rmse)
print("test RMSE:", test_rmse)
```

**Hyperparameter Tuning of Elastic Net:** Identifying the best alpha and l1_ratio values for Elastic Net:

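```python
from sklearn.model_selection import GridSearchCV

# Define the estimator and the grid before building the search
estimator = ElasticNet()
param_grid = {
    # alpha=0 means no penalty; scikit-learn warns that ElasticNet may not converge well then
    "alpha": [0, 0.1, 0.2, 1, 2, 3, 5, 10],
    "l1_ratio": [0.1, 0.5, 0.75, 0.9, 0.95, 1],
}
enr_hp = GridSearchCV(estimator, param_grid)
enr_hp.fit(X_train, y_train)
enr_hp.best_params_
```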

In this article, ‘The Ultimate Guide to Regularization in Machine Learning’, we learned how models, illustrated here with a linear model, can perform poorly by being underfitted or overfitted, and we looked at the roles of bias and variance. We then covered regularization techniques that help overcome these issues and, finally, saw a Python implementation.

- We learned about L1 and L2 regularization, which add a penalty term to the cost function.
- Each of these models has hyperparameters, such as `alpha` (and `l1_ratio` for Elastic Net), that control the strength and type of the penalty.
- We can use GridSearchCV, or scikit-learn's model-specific cross-validated estimators (`RidgeCV`, `LassoCV`, `ElasticNetCV`), to find the optimal hyperparameters.
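As a hedged sketch of the model-specific route, the code below runs `LassoCV` and `ElasticNetCV` on hypothetical synthetic data. `LassoCV` builds its own path of candidate alphas, while `ElasticNetCV` additionally searches the listed `l1_ratio` values; the list and the 5-fold setting are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, LassoCV

rng = np.random.default_rng(3)
# Hypothetical data: 8 features, 3 of them informative
X = rng.normal(size=(150, 8))
y = X @ np.array([3.0, 0, 0, -2.0, 0, 1.0, 0, 0]) + rng.normal(scale=0.5, size=150)

# LassoCV generates an alpha path and picks alpha by 5-fold cross-validation
lasso_cv = LassoCV(cv=5).fit(X, y)

# ElasticNetCV cross-validates alpha for each candidate l1_ratio
enet_cv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)

print("LassoCV alpha:", lasso_cv.alpha_)
print("ElasticNetCV alpha:", enet_cv.alpha_, "l1_ratio:", enet_cv.l1_ratio_)
```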

You can compare the performance of these algorithms on a dataset and check which one performs best using a metric such as root mean squared error (RMSE).
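A minimal sketch of such a comparison, assuming synthetic data in place of the article's dataset and untuned, illustrative hyperparameters. On real data you would plug in the tuned values from the grid searches above.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
# Hypothetical data standing in for the article's dataset
X = rng.normal(size=(200, 6))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0, 0.0]) + rng.normal(scale=1.0, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Illustrative, untuned hyperparameters
models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse[name] = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

# Lower test RMSE is better
for name, score in sorted(rmse.items(), key=lambda item: item[1]):
    print(f"{name:<10} test RMSE = {score:.3f}")
```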

Thank you for reading.

I hope you enjoyed this article and found it useful for your work in data science and machine learning.

Please feel free to contact me on LinkedIn.


**The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.**
