This article was published as a part of the Data Science Blogathon

Model overfitting is a serious problem and can cause a model to produce misleading information. One of the techniques to overcome overfitting is Regularization. Regularization, in general, penalizes the large coefficients that cause the model to overfit. There are two norms in regularization that can be used, depending on the scenario.

In this article, we will learn about Regularization, the two norms of Regularization, and the regression techniques based on them.

Image by Victor Freitas from Pexels

- Overfitting and Regularization
- L1 Regularization or LASSO
- L2 Regularization or Ridge
- LASSO Regression
- Ridge Regression
- ElasticNet Regression
- Conclusions

Overfitting occurs when a model learns the training data ‘too well’. This may sound like an advantage, but it is not: an overtrained model performs poorly on the test set or on any new data. Technically, the model learns not only the underlying patterns of the train data but also its noise, and since that noise does not carry over to new data, performance on new data suffers. This is when we say the performance of the model is not adequate. There are several ways of avoiding overfitting, such as K-fold cross-validation, resampling, and reducing the number of features. Another way is to apply Regularization to the model. Regularization is a better technique than reducing the number of features because it does not require discarding any features of the model.

Regularization is a technique that penalizes the coefficients. In an overfit model, the coefficients are generally inflated. Regularization therefore adds a penalty on the parameters to keep them from weighing too heavily. The penalty term is added to the cost function of the linear model, so if a coefficient inflates, the cost function increases, and the regression model, in minimizing the cost function, is pushed towards smaller coefficients.

Image by Nicoguaro

Practically, you can check whether a regression model is overfitting by comparing RMSE values. A good model has a similar RMSE on the train and test sets; if the difference is too large, we can say the model is overfitting to the training set. There are two kinds of penalty terms that can be added to the cost function: the L1 Norm or LASSO term, and the L2 Norm or Ridge term.
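As a minimal sketch of this check (the synthetic data and the deliberately flexible polynomial model below are assumptions made purely for illustration), a large gap between train and test RMSE signals overfitting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Small, noisy synthetic dataset (assumed for illustration)
rng = np.random.RandomState(0)
x = rng.uniform(-3, 3, size=(40, 1))
y = x.ravel() ** 2 + rng.normal(scale=2.0, size=40)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# A deliberately flexible model that is likely to overfit
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(x_train, y_train)

# A much lower train RMSE than test RMSE indicates overfitting
print("Train RMSE:", np.sqrt(mean_squared_error(y_train, model.predict(x_train))))
print("Test RMSE:", np.sqrt(mean_squared_error(y_test, model.predict(x_test))))
```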

The L1 Regularization technique is also known as **LASSO**, or Least Absolute Shrinkage and Selection Operator. Here, the penalty term added to the cost function is the summation of the absolute values of the coefficients. Since absolute values are used, a coefficient can be reduced exactly to 0, and such features may get completely discarded in LASSO. Thus, we can say LASSO helps in Regularization as well as Feature Selection.

Following is the equation of the cost function with the **L1 penalty** term:

$$\text{Cost} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \alpha \sum_{j=1}^{p}\left|\beta_j\right|$$

Here, **alpha** is the multiplier term that controls the strength of the penalty.
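To see the feature-selection effect in practice, here is a minimal sketch (the synthetic data, in which only the first two of five features actually matter, is an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: y depends only on the first 2 of 5 features
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

# The coefficients of the three irrelevant features typically shrink to exactly 0
print(lasso.coef_)
```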

The L2 Regularization technique is also known as **Ridge**. Here, the penalty term added to the cost function is the summation of the squared values of the coefficients. Unlike the LASSO term, the Ridge term uses squared coefficients and can shrink a coefficient close to 0 but never exactly to 0. Ridge distributes the coefficient values across all the features.

Following is the equation of the cost function with the **L2 penalty** term:

$$\text{Cost} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \alpha \sum_{j=1}^{p}\beta_j^2$$

Here, **alpha** is the multiplier term.
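Running Ridge on the same synthetic setup as above (again, assumed purely for illustration) shows the contrast with LASSO:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Same synthetic data: y depends only on the first 2 of 5 features
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0)
ridge.fit(X, y)

# All coefficients are shrunk towards 0, but none becomes exactly 0
print(ridge.coef_)
```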

LASSO Regression is a linear model built by applying the **L1 or LASSO penalty term**. Let’s see how to build a LASSO regression model in Python.

**Importing the Libraries**

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import Lasso
```

**Importing the Dataset**

```python
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')
```

The dataset has been taken from Kaggle.

**Dropping Missing Values, if Any**

```python
df_train = df_train.dropna()
df_test = df_test.dropna()
```

**Specifying x_train, x_test, y_train, y_test variables for Regression**
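```python
# Reshape the single feature and target columns into the 2-D arrays scikit-learn expects
x_train = df_train['x'].values.reshape(-1, 1)
y_train = df_train['y'].values.reshape(-1, 1)
x_test = df_test['x'].values.reshape(-1, 1)
y_test = df_test['y'].values.reshape(-1, 1)
```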

**Building LASSO Regression Model**

```python
lasso = Lasso()
```

**Fitting the Model on Train Set**

```python
lasso.fit(x_train, y_train)
```

**Calculating Train RMSE for Lasso Regression**

```python
print("Lasso Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, lasso.predict(x_train))), 5))
```

**Calculating Test RMSE for Lasso Regression**

```python
print("Lasso Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, lasso.predict(x_test))), 5))
```

**Putting it all together**

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import Lasso

# Load the train and test sets
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

# Drop rows with missing values
df_train = df_train.dropna()
df_test = df_test.dropna()

# Reshape the feature and target columns into 2-D arrays
x_train = df_train['x'].values.reshape(-1, 1)
y_train = df_train['y'].values.reshape(-1, 1)
x_test = df_test['x'].values.reshape(-1, 1)
y_test = df_test['y'].values.reshape(-1, 1)

# Fit the LASSO model and report train and test RMSE
lasso = Lasso()
lasso.fit(x_train, y_train)
print("Lasso Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, lasso.predict(x_train))), 5))
print("Lasso Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, lasso.predict(x_test))), 5))
```

On executing this code, the train and test RMSE values for the LASSO model are printed to the console.

We can tune the hyperparameters of the LASSO model to find the appropriate alpha value using **LassoCV** or **GridSearchCV**.
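For instance, a minimal LassoCV sketch (reusing the x_train and y_train arrays defined above; the alpha grid here is an assumption for illustration) could look like this:

```python
from sklearn.linear_model import LassoCV

# Cross-validate over a grid of candidate alpha values
lasso_cv = LassoCV(alphas=np.logspace(-4, 1, 50), cv=5)
lasso_cv.fit(x_train, y_train.ravel())  # LassoCV expects a 1-D target
print("Best alpha:", lasso_cv.alpha_)
```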

Ridge Regression is a linear model built by applying the **L2 or Ridge penalty term**. Let’s see how to build a Ridge regression model in Python.

**Importing the Libraries**

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import Ridge
```

**Importing the Dataset**

```python
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')
```

The dataset has been taken from Kaggle.

**Dropping Missing Values, if Any**

```python
df_train = df_train.dropna()
df_test = df_test.dropna()
```

**Specifying x_train, x_test, y_train, y_test variables for Regression**

```python
x_train = df_train['x'].values.reshape(-1, 1)
y_train = df_train['y'].values.reshape(-1, 1)
x_test = df_test['x'].values.reshape(-1, 1)
y_test = df_test['y'].values.reshape(-1, 1)
```

**Building Ridge Regression Model**

```python
ridge = Ridge()
```

**Fitting the Model on Train Set**

```python
ridge.fit(x_train, y_train)
```

**Calculating Train RMSE for Ridge Regression**

```python
print("Ridge Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, ridge.predict(x_train))), 5))
```

**Calculating Test RMSE for Ridge Regression**

```python
print("Ridge Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, ridge.predict(x_test))), 5))
```

**Putting it all together**

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import Ridge

# Load the train and test sets
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

# Drop rows with missing values
df_train = df_train.dropna()
df_test = df_test.dropna()

# Reshape the feature and target columns into 2-D arrays
x_train = df_train['x'].values.reshape(-1, 1)
y_train = df_train['y'].values.reshape(-1, 1)
x_test = df_test['x'].values.reshape(-1, 1)
y_test = df_test['y'].values.reshape(-1, 1)

# Fit the Ridge model and report train and test RMSE
ridge = Ridge()
ridge.fit(x_train, y_train)
print("Ridge Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, ridge.predict(x_train))), 5))
print("Ridge Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, ridge.predict(x_test))), 5))
```

On executing this code, the train and test RMSE values for the Ridge model are printed to the console.

We can tune the hyperparameters of the Ridge model to find the appropriate alpha value using **RidgeCV** or **GridSearchCV**.
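As a sketch (again reusing the arrays defined above; the alpha grid is an assumption for illustration):

```python
from sklearn.linear_model import RidgeCV

# RidgeCV picks the best alpha from the supplied grid via cross-validation
ridge_cv = RidgeCV(alphas=np.logspace(-4, 1, 50))
ridge_cv.fit(x_train, y_train)
print("Best alpha:", ridge_cv.alpha_)
```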

ElasticNet Regression is a linear model built by applying **both L1 and L2 penalty terms**. Let’s see how to build an ElasticNet regression model in Python.
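For reference, the ElasticNet cost function adds both penalty terms to the residual sum of squares; scikit-learn expresses the two multipliers below through its `alpha` and `l1_ratio` parameters:

$$\text{Cost} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \alpha_1 \sum_{j=1}^{p}\left|\beta_j\right| + \alpha_2 \sum_{j=1}^{p}\beta_j^2$$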

**Importing the Libraries**

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import ElasticNet
```

**Importing the Dataset**

```python
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')
```

The dataset has been taken from Kaggle.

**Dropping Missing Values, if Any**

```python
df_train = df_train.dropna()
df_test = df_test.dropna()
```

**Specifying x_train, x_test, y_train, y_test variables for Regression**

```python
x_train = df_train['x'].values.reshape(-1, 1)
y_train = df_train['y'].values.reshape(-1, 1)
x_test = df_test['x'].values.reshape(-1, 1)
y_test = df_test['y'].values.reshape(-1, 1)
```

**Building ElasticNet Regression Model**

```python
enet = ElasticNet()
```

**Fitting the Model on Train Set**

```python
enet.fit(x_train, y_train)
```

**Calculating Train RMSE for ElasticNet Regression**

```python
print("ElasticNet Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, enet.predict(x_train))), 5))
```

**Calculating Test RMSE for ElasticNet Regression**

```python
print("ElasticNet Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, enet.predict(x_test))), 5))
```

**Putting it all together**

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import ElasticNet

# Load the train and test sets
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

# Drop rows with missing values
df_train = df_train.dropna()
df_test = df_test.dropna()

# Reshape the feature and target columns into 2-D arrays
x_train = df_train['x'].values.reshape(-1, 1)
y_train = df_train['y'].values.reshape(-1, 1)
x_test = df_test['x'].values.reshape(-1, 1)
y_test = df_test['y'].values.reshape(-1, 1)

# Fit the ElasticNet model and report train and test RMSE
enet = ElasticNet()
enet.fit(x_train, y_train)
print("ElasticNet Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, enet.predict(x_train))), 5))
print("ElasticNet Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, enet.predict(x_test))), 5))
```

On executing this code, the train and test RMSE values for the ElasticNet model are printed to the console.

We can tune the hyperparameters of the ElasticNet model to find the appropriate alpha and l1_ratio values using **ElasticNetCV** or **GridSearchCV**.
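For instance, a minimal ElasticNetCV sketch (reusing the arrays defined above; the l1_ratio candidates are an assumption for illustration):

```python
from sklearn.linear_model import ElasticNetCV

# Cross-validate over both the penalty strength (alpha) and the L1/L2 mix (l1_ratio)
enet_cv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5)
enet_cv.fit(x_train, y_train.ravel())  # expects a 1-D target
print("Best alpha:", enet_cv.alpha_, "| Best l1_ratio:", enet_cv.l1_ratio_)
```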

In this article, we learned about overfitting in linear models and about Regularization as a way to avoid it. We learned about the L1 and L2 penalty terms that get added to the cost function, and we looked at three regression algorithms based on these Regularization techniques. We can specify several hyperparameters in each of these algorithms; to find the optimal values, we can use GridSearchCV or the dedicated hyperparameter-tuning estimator of the respective regression model. One can compare the performance of these algorithms on a dataset using a metric such as Root Mean Square Error (RMSE) to check which algorithm performs better.

Connect with me on **LinkedIn** Here.

Check out my other Articles Here and on Medium

You can provide your valuable feedback to me on LinkedIn.

Thanks for giving your time!

*The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.*
