Prevent Overfitting Using Regularization Techniques

Rahul Shah 04 Aug, 2022

This article was published as a part of the Data Science Blogathon

Introduction

Overfitting is a serious problem: an overfit model can produce misleading predictions. Regularization is one of the techniques used to overcome it. In general, Regularization penalizes the coefficients that cause the model to overfit. Two norms are commonly used for the penalty, and each suits different scenarios.

In this article, we will learn about Regularization, the two norms of Regularization, and the Regression techniques based on these Regularization techniques.

Image by Victor Freitas from Pexels

Overfitting and Regularization

A model overfits when it learns the train data ‘too well’. This may sound like an advantage, but it is not. A model that is overtrained on the training data performs worse on the test data or on any new data. Technically, the model learns the noise in the train data along with the underlying pattern. That noise does not carry over to new data, so the model's performance on unseen data suffers. There are several ways to reduce overfitting, such as K-fold cross-validation, resampling, and reducing the number of features. Another is to apply Regularization to the model. Compared with reducing the number of features, Regularization has the advantage that we do not have to discard features up front; it shrinks their coefficients instead.

Regularization is a technique that penalizes the coefficients. In an overfit model, the coefficients are generally inflated. Regularization adds a penalty on the parameters to keep them from being weighted too heavily. This penalty, which grows with the size of the coefficients, is added to the cost function of the linear model. Thus, if a coefficient inflates, the cost function increases, and the Linear regression model has to keep the coefficients small in order to minimize the cost function.

Image by Nicoguaro

Practically, you can check whether a regression model is overfitting by comparing its RMSE on the train and test sets. A good model has a similar RMSE on both. If the test RMSE is much larger than the train RMSE, we can say the model is overfitting to the training set. There are two kinds of penalty terms that can be added to the cost function: the L1 Norm, or LASSO term, and the L2 Norm, or Ridge term.
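The train-versus-test RMSE check can be sketched as follows. This is a minimal illustration, not the article's dataset: the noisy sine data, the degree-15 polynomial, and the helper name rmse are all assumptions chosen to make a model with far too much freedom for 20 training points.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(40, 1))
y = np.sin(2 * np.pi * x[:, 0]) + rng.normal(scale=0.3, size=40)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0
)


def rmse(y_true, y_pred):
    """Root mean squared error (hypothetical helper)."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# A degree-15 polynomial can chase the noise in only 20 training points
poly = PolynomialFeatures(degree=15)
model = LinearRegression().fit(poly.fit_transform(x_train), y_train)

train_rmse = rmse(y_train, model.predict(poly.transform(x_train)))
test_rmse = rmse(y_test, model.predict(poly.transform(x_test)))

print("Train RMSE:", round(train_rmse, 5))
print("Test RMSE:", round(test_rmse, 5))
```

A large gap between the two printed values is the warning sign described above.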

L1 Regularization or LASSO

The L1 Regularization technique is also known as LASSO, or Least Absolute Shrinkage and Selection Operator. Here, the penalty term added to the cost function is the sum of the absolute values of the coefficients. Because the penalty uses absolute values, it can shrink a coefficient all the way to 0, and the corresponding feature is then effectively discarded from the model. Thus, LASSO helps with Feature Selection as well as Regularization.

Following is the equation of Cost function with L1 penalty term:

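$$\text{Cost} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \alpha \sum_{j=1}^{p} \left|\beta_j\right|$$

Cost Function after adding L1 Penalty (Source – Personal Computer)

Here, y_i is the observed value, \hat{y}_i is the model's prediction, \beta_j are the coefficients, and alpha is the multiplier term that controls the strength of the penalty. A larger alpha shrinks the coefficients more, and alpha = 0 removes the penalty entirely.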
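As a small numeric sketch of this cost, the function below adds alpha times the sum of absolute coefficients to the sum of squared residuals. The function name l1_cost and the toy arrays are assumptions made for illustration. Note that scikit-learn's Lasso scales the squared-error term by 1/(2n), so its alpha is not directly comparable to the one used here.

```python
import numpy as np


def l1_cost(X, y, coef, alpha):
    """Sum of squared residuals plus alpha * sum(|coef|) (hypothetical helper)."""
    residuals = y - X @ coef
    return float(np.sum(residuals ** 2) + alpha * np.sum(np.abs(coef)))


# Toy data (assumption): y is reproduced exactly by the coefficients [1, 2]
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = np.array([5.0, 4.0, 9.0])

small = np.array([1.0, 2.0])    # exact fit, small coefficients
large = np.array([10.0, -7.0])  # inflated coefficients, poor fit

cost_small = l1_cost(X, y, small, alpha=1.0)
cost_large = l1_cost(X, y, large, alpha=1.0)

print(cost_small)  # 3.0   (0 squared error + penalty of 3)
print(cost_large)  # 179.0 (162 squared error + penalty of 17)
```

The inflated coefficients are penalized twice over: once through the worse fit and once through the larger penalty term.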

L2 Regularization or Ridge

The L2 Regularization technique is also known as Ridge. Here, the penalty term added to the cost function is the sum of the squared values of the coefficients. Unlike the LASSO term, the Ridge term uses squared values, so it can shrink a coefficient close to 0 but generally not exactly to 0. Ridge keeps all the features in the model and distributes the coefficient values across them.

Following is the equation of Cost function with L2 penalty term:

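$$\text{Cost} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \alpha \sum_{j=1}^{p} \beta_j^2$$

Cost Function after adding L2 Penalty (Source – Personal Computer)

Here, y_i is the observed value, \hat{y}_i is the model's prediction, \beta_j are the coefficients, and alpha is the multiplier term that controls the strength of the penalty. A larger alpha shrinks the coefficients more, and alpha = 0 removes the penalty entirely.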
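The difference between the two penalties can be seen on a small synthetic example. This is a sketch under stated assumptions: the data is generated so that only the first two of five features matter, and alpha=0.5 is an arbitrary choice, not a recommended value.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): only the first two features drive the target
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

# Count the coefficients that each model set exactly to zero
lasso_zeros = int(np.sum(lasso.coef_ == 0))
ridge_zeros = int(np.sum(ridge.coef_ == 0))

print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Exact zeros (Lasso):", lasso_zeros)
print("Exact zeros (Ridge):", ridge_zeros)
```

On data like this, Lasso typically zeroes out the irrelevant features, while Ridge leaves them with small non-zero coefficients.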

LASSO Regression

LASSO Regression is a linear model built by applying the L1 or LASSO penalty term. Let’s see how to build a LASSO regression model in Python.

Importing the Libraries

import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import Lasso

Importing the Dataset

df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

The dataset has been taken from Kaggle.

Drop Missing Values if Any

df_train = df_train.dropna()
df_test = df_test.dropna()
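Note that dropna() removes rows containing missing values; removing duplicate rows is a separate call, drop_duplicates(). The toy DataFrame below is an assumption made purely to show the difference between the two.

```python
import numpy as np
import pandas as pd

# Toy frame (assumption): one duplicated row and one row with a missing value
df = pd.DataFrame({
    "x": [1.0, 2.0, 2.0, np.nan],
    "y": [1.5, 2.5, 2.5, 4.0],
})

no_missing = df.dropna()                     # drops the row containing NaN
no_duplicates = df.drop_duplicates()         # drops the repeated (2.0, 2.5) row
cleaned = df.dropna().drop_duplicates()      # applies both steps

print(len(df), len(no_missing), len(no_duplicates), len(cleaned))  # 4 3 3 2
```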

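Specifying x_train, x_test, y_train, y_test variables for Regression

x_train = df_train['x']
x_train = x_train.values.reshape(-1,1)
y_train = df_train['y']
y_train = y_train.values.reshape(-1,1)

x_test = df_test['x']
x_test = x_test.values.reshape(-1,1)
y_test = df_test['y']
y_test = y_test.values.reshape(-1,1)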

Building LASSO Regression Model

lasso = Lasso()

Fitting the Model on Train Set

lasso.fit(x_train, y_train)

Calculating Train RMSE for Lasso Regression

print("Lasso Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, lasso.predict(x_train))), 5))

Calculating Test RMSE for Lasso Regression

print("Lasso Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, lasso.predict(x_test))), 5))

Putting it all together

import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import Lasso

# Load the train and test sets
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

# Drop rows with missing values
df_train = df_train.dropna()
df_test = df_test.dropna()

# scikit-learn expects a 2D feature array, hence reshape(-1,1)
x_train = df_train['x']
x_train = x_train.values.reshape(-1,1)
y_train = df_train['y']
y_train = y_train.values.reshape(-1,1)

x_test = df_test['x']
x_test = x_test.values.reshape(-1,1)
y_test = df_test['y']
y_test = y_test.values.reshape(-1,1)

# Build the model with default hyperparameters (alpha=1.0)
lasso = Lasso()

# Fit on the train set, then report RMSE on both sets
lasso.fit(x_train, y_train)
print("Lasso Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, lasso.predict(x_train))), 5))
print("Lasso Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, lasso.predict(x_test))), 5))

On executing this code, we get:

Source – Personal Computer

We can tune the hyperparameters of the LASSO model to find the appropriate alpha value using LassoCV or GridSearchCV.
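A minimal sketch of tuning alpha with LassoCV is shown below. The synthetic data and the candidate grid np.logspace(-3, 1, 30) are assumptions for illustration; a real grid should be chosen to suit the dataset.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data (assumption): two informative features out of five
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Candidate alpha values, evaluated with 5-fold cross-validation
alphas = np.logspace(-3, 1, 30)
lasso_cv = LassoCV(alphas=alphas, cv=5).fit(X, y)

print("Best alpha:", lasso_cv.alpha_)
print("Coefficients at best alpha:", np.round(lasso_cv.coef_, 3))
```

After fitting, lasso_cv is already refit on the full data with the selected alpha, so it can be used directly for prediction.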

Ridge Regression

Ridge Regression is a linear model built by applying the L2 or Ridge penalty term. Let’s see how to build a Ridge regression model in Python.

Importing the Libraries

import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import Ridge

Importing the Dataset

df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

The dataset has been taken from Kaggle.

Drop Missing Values if Any

df_train = df_train.dropna()
df_test = df_test.dropna()

Specifying x_train, x_test, y_train, y_test variables for Regression

x_train = df_train['x']
x_train = x_train.values.reshape(-1,1)
y_train = df_train['y']
y_train = y_train.values.reshape(-1,1)

x_test = df_test['x']
x_test = x_test.values.reshape(-1,1)
y_test = df_test['y']
y_test = y_test.values.reshape(-1,1)
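The reshape(-1,1) calls are needed because scikit-learn estimators expect the features as a 2D array of shape (n_samples, n_features), while a single DataFrame column yields a 1D array. The tiny frame below is an assumption used only to show the shapes involved.

```python
import pandas as pd
from sklearn.linear_model import Ridge

# Toy frame (assumption) with a single feature column
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.1, 3.9, 6.2, 7.8]})

x_1d = df["x"].values        # shape (4,): a flat array of values
x_2d = x_1d.reshape(-1, 1)   # shape (4, 1): 4 samples, 1 feature
print(x_1d.shape, x_2d.shape)

# Fitting on the 1D array is rejected; the 2D version is accepted
try:
    Ridge().fit(x_1d, df["y"])
    raised = False
except ValueError:
    raised = True
print("1D features rejected:", raised)

model = Ridge().fit(x_2d, df["y"])
```

The target y can stay one-dimensional; reshaping it into a column, as the article does, also works.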

Building Ridge Regression Model

ridge = Ridge()

Fitting the Model on Train Set

ridge.fit(x_train, y_train)

Calculating Train RMSE for Ridge Regression

print("Ridge Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, ridge.predict(x_train))), 5))

Calculating Test RMSE for Ridge Regression

print("Ridge Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, ridge.predict(x_test))), 5))

Putting it all together

import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import Ridge

# Load the train and test sets
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

# Drop rows with missing values
df_train = df_train.dropna()
df_test = df_test.dropna()

# scikit-learn expects a 2D feature array, hence reshape(-1,1)
x_train = df_train['x']
x_train = x_train.values.reshape(-1,1)
y_train = df_train['y']
y_train = y_train.values.reshape(-1,1)

x_test = df_test['x']
x_test = x_test.values.reshape(-1,1)
y_test = df_test['y']
y_test = y_test.values.reshape(-1,1)

# Build the model with default hyperparameters (alpha=1.0)
ridge = Ridge()

# Fit on the train set, then report RMSE on both sets
ridge.fit(x_train, y_train)
print("Ridge Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, ridge.predict(x_train))), 5))
print("Ridge Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, ridge.predict(x_test))), 5))

On executing this code, we get:

Source – Personal Computer

We can tune the hyperparameters of the Ridge model to find the appropriate alpha value using RidgeCV or GridSearchCV.
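A minimal sketch of tuning alpha with RidgeCV follows. The synthetic data and the candidate grid are assumptions for illustration. With the default cv=None, RidgeCV uses an efficient form of leave-one-out cross-validation.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data (assumption): two informative features out of five
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Candidate alpha values spanning several orders of magnitude
alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)

print("Best alpha:", ridge_cv.alpha_)
print("Coefficients at best alpha:", np.round(ridge_cv.coef_, 3))
```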

ElasticNet Regression

ElasticNet Regression is a linear model built by applying both L1 and L2 penalty terms. Let’s see how to build an ElasticNet regression model in Python.
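For reference, the objective that scikit-learn's ElasticNet minimizes can be written as below, where \rho stands for the l1_ratio parameter and \hat{y}_i for the model's prediction. The symbols are chosen here for illustration. Note the 1/(2n) scaling on the squared-error term, which means alpha values are not directly comparable with an unscaled sum-of-squares cost.

$$\text{Cost} = \frac{1}{2n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \alpha \rho \sum_{j=1}^{p} \left|\beta_j\right| + \frac{\alpha (1 - \rho)}{2} \sum_{j=1}^{p} \beta_j^2$$

Setting \rho = 1 leaves only the L1 term, which is the LASSO penalty, while values closer to 0 shift the weight toward the L2 term.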

Importing the Libraries

import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import ElasticNet

Importing the Dataset

df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

The dataset has been taken from Kaggle.

Drop Missing Values if Any

df_train = df_train.dropna()
df_test = df_test.dropna()

Specifying x_train, x_test, y_train, y_test variables for Regression

x_train = df_train['x']
x_train = x_train.values.reshape(-1,1)
y_train = df_train['y']
y_train = y_train.values.reshape(-1,1)

x_test = df_test['x']
x_test = x_test.values.reshape(-1,1)
y_test = df_test['y']
y_test = y_test.values.reshape(-1,1)

Building ElasticNet Regression Model

enet = ElasticNet()

Fitting the Model on Train Set

enet.fit(x_train, y_train)

Calculating Train RMSE for ElasticNet Regression

print("ElasticNet Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, enet.predict(x_train))), 5))

Calculating Test RMSE for ElasticNet Regression

print("ElasticNet Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, enet.predict(x_test))), 5))

Putting it all together

import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import ElasticNet

# Load the train and test sets
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

# Drop rows with missing values
df_train = df_train.dropna()
df_test = df_test.dropna()

# scikit-learn expects a 2D feature array, hence reshape(-1,1)
x_train = df_train['x']
x_train = x_train.values.reshape(-1,1)
y_train = df_train['y']
y_train = y_train.values.reshape(-1,1)

x_test = df_test['x']
x_test = x_test.values.reshape(-1,1)
y_test = df_test['y']
y_test = y_test.values.reshape(-1,1)

# Build the model with default hyperparameters (alpha=1.0, l1_ratio=0.5)
enet = ElasticNet()

# Fit on the train set, then report RMSE on both sets
enet.fit(x_train, y_train)

print("ElasticNet Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, enet.predict(x_train))), 5))
print("ElasticNet Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, enet.predict(x_test))), 5))

On executing this code, we get:

Source – Personal Computer

We can tune the hyperparameters of the ElasticNet model to find the appropriate alpha value (and l1_ratio) using ElasticNetCV or GridSearchCV.
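A minimal sketch of the GridSearchCV route for ElasticNet is shown below. The synthetic data and the small parameter grid are assumptions for illustration, and the scoring string is one reasonable choice for matching the RMSE metric used in this article.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic data (assumption): two informative features out of five
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Small illustrative grid over penalty strength and the L1/L2 mix
param_grid = {
    "alpha": [0.01, 0.1, 1.0],
    "l1_ratio": [0.2, 0.5, 0.8],
}

search = GridSearchCV(
    ElasticNet(max_iter=10000),
    param_grid,
    scoring="neg_root_mean_squared_error",  # higher is better, so RMSE is negated
    cv=5,
)
search.fit(X, y)

best_rmse = -search.best_score_
print("Best parameters:", search.best_params_)
print("Cross-validated RMSE:", round(best_rmse, 5))
```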

Conclusions

In this article, we learned about Overfitting in linear models and how Regularization helps avoid it. We learned about the L1 and L2 penalty terms that get added to the cost function. We looked at three regression algorithms based on L1 and L2 Regularization techniques. Each of these algorithms has several hyperparameters that we can specify. To find the optimal hyperparameters, we can use GridSearchCV or the dedicated cross-validation estimator of the respective regression model (LassoCV, RidgeCV, or ElasticNetCV). One can compare the performance of these algorithms on a dataset using a performance metric such as Root Mean Square Error, or RMSE, to check which algorithm performs better.
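Such a comparison might be sketched as follows. The synthetic data, the train/test split, and the alpha values are all assumptions for illustration; on a real dataset the hyperparameters would be tuned first.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): two informative features out of five
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Untuned, illustrative hyperparameters
models = {
    "Lasso": Lasso(alpha=0.1),
    "Ridge": Ridge(alpha=1.0),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

# Fit each model on the train set and record its test RMSE
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    results[name] = float(np.sqrt(mse))

for name, score in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name} Test RMSE: {score:.5f}")
```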