# Prevent Overfitting Using Regularization Techniques

Rahul Shah 04 Aug, 2022 • 7 min read

This article was published as a part of the Data Science Blogathon.

## Introduction

Model overfitting is a serious problem that can cause a model to produce misleading predictions on new data. One of the techniques to overcome overfitting is Regularization. In general, Regularization penalizes large coefficients, which are a typical symptom of an overfit model. Two norms are commonly used for this penalty, L1 and L2, and the right choice depends on the scenario.

In this article, we will learn about Regularization, its two norms (L1 and L2), and the regression techniques built on them.

Image by Victor Freitas from Pexels

1. Overfitting and Regularization
2. L1 Regularization or LASSO
3. L2 Regularization or Ridge
4. LASSO Regression
5. Ridge Regression
6. ElasticNet Regression
7. Conclusions

## Overfitting and Regularization

Overfitting occurs when a model learns the training data 'too well'. This may sound like an advantage, but it is not: a model that is overtrained on the training data performs poorly on the test data or on any new data. Technically, the model learns the details as well as the noise of the training data. Because that noise does not carry over to new data, what the model has learned hurts its performance on any new data it is given. This is when we say the performance of the model is not adequate. There are several ways of avoiding overfitting, such as K-fold cross-validation, resampling, and reducing the number of features. One of them is to apply Regularization to the model. Compared with reducing the number of features by hand, Regularization has the advantage that we do not have to discard features up front; the penalty itself controls how much each feature contributes.

Regularization is a technique that penalizes the coefficients. In an overfit model, the coefficients are generally inflated. Regularization therefore adds a penalty on the parameters so that they cannot weigh too heavily. The penalty, computed from the coefficients, is added to the cost function of the linear model, so if the coefficients inflate, the cost increases. The linear regression model then optimizes the coefficients to minimize this penalized cost, trading off fit to the training data against coefficient size.
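To make the idea concrete, here is a small sketch of a penalized cost: the sum of squared errors plus alpha times an L1 or L2 penalty. The function name `penalized_cost` is hypothetical, the intercept is omitted for brevity, and scikit-learn's estimators scale the error term differently. The two feature columns are identical, so the modest weights [1, 1] and the inflated weights [11, -9] fit the data equally well; only the penalty tells them apart.

```python
import numpy as np

def penalized_cost(X, y, w, alpha, penalty="l2"):
    """Sum of squared errors plus alpha * (L1 or L2 penalty on w). No intercept."""
    residuals = y - X @ w
    sse = float(residuals @ residuals)
    if penalty == "l1":
        reg = np.sum(np.abs(w))   # L1: sum of absolute coefficients
    elif penalty == "l2":
        reg = np.sum(w ** 2)      # L2: sum of squared coefficients
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return sse + alpha * float(reg)

# Two perfectly collinear features: any w with w[0] + w[1] == 2 fits exactly
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
w_small = np.array([1.0, 1.0])
w_inflated = np.array([11.0, -9.0])

for name, w in (("small", w_small), ("inflated", w_inflated)):
    print(name,
          "L1 cost:", penalized_cost(X, y, w, alpha=1.0, penalty="l1"),
          "L2 cost:", penalized_cost(X, y, w, alpha=1.0, penalty="l2"))
```

Both weight vectors have zero squared error, yet the inflated one costs 20.0 (L1) and 202.0 (L2) against 2.0 for the small one, so minimizing the penalized cost favors the smaller coefficients.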

Image by Nicoguaro

In practice, you can check whether a regression model is overfitting by comparing its RMSE (Root Mean Square Error) on the train and test sets. A good model has a similar RMSE on both. If the test RMSE is much larger than the train RMSE, the model is likely overfitting the training set. There are two kinds of penalty terms that can be added to the cost function: the L1 norm, or LASSO term, and the L2 norm, or Ridge term.
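The train/test RMSE check can be sketched with NumPy alone. The example below is a swapped-in illustration (noisy sine data and polynomial fits via `np.polyfit`, not the article's dataset or models): a degree-9 polynomial has enough coefficients to pass through all 10 training points, so its train RMSE is essentially 0 while its test RMSE is not, which is the overfitting signature described above.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical data: a noisy sine curve sampled at 10 train and 10 test points
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, x_train.size)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0.0, 0.2, x_test.size)

results = {}
for degree in (3, 9):
    # Degree 9 means 10 coefficients: enough to interpolate all 10 train points
    coeffs = np.polyfit(x_train, y_train, degree)
    train_rmse = rmse(y_train, np.polyval(coeffs, x_train))
    test_rmse = rmse(y_test, np.polyval(coeffs, x_test))
    results[degree] = (train_rmse, test_rmse)
    print(f"degree={degree}: train RMSE={train_rmse:.4f}, test RMSE={test_rmse:.4f}")
```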

## L1 Regularization or LASSO

The L1 Regularization technique is also known as LASSO, or Least Absolute Shrinkage and Selection Operator. Here, the penalty term added to the cost function is the sum of the absolute values of the coefficients. Unlike a squared penalty, whose pull toward 0 weakens as a coefficient gets small, the absolute-value penalty keeps a constant pull, so LASSO can shrink some coefficients exactly to 0 and the corresponding features are discarded from the model completely. Thus, LASSO performs Regularization and Feature Selection at the same time.
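To see this feature-selection effect, the sketch below fits scikit-learn's `Lasso` with its default `alpha=1.0` on synthetic data from `make_regression`, where only 3 of the 10 features are informative. The data is an assumption for illustration, not the article's Kaggle dataset; on data like this, the uninformative features typically end up with coefficients of exactly 0.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (illustrative assumption): 10 features, only 3 informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)  # alpha=1.0 is scikit-learn's default
lasso.fit(X, y)

# Count coefficients that the L1 penalty drove exactly to zero
n_zero = int(np.sum(lasso.coef_ == 0))
print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Coefficients set exactly to zero:", n_zero)
```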

The following is the cost function with the L1 penalty term:

Cost Function after adding L1 Penalty (Source – Personal Computer)

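Here, alpha is the multiplier that controls the strength of the penalty: alpha = 0 gives ordinary least squares, and larger values shrink the coefficients more.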
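For reference, a standard textbook form of this cost function is given below. The symbols are assumptions for this sketch: n samples, p features, coefficients β_j, and an intercept β_0 that is not penalized. Note that scikit-learn's `Lasso` additionally scales the squared-error term by 1/(2n).

$$
J(\beta) = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert
$$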

## L2 Regularization or Ridge

The L2 Regularization technique is also known as Ridge. Here, the penalty term added to the cost function is the sum of the squared values of the coefficients. Unlike the LASSO term, the Ridge term shrinks coefficients toward 0 but, in general, not exactly to 0. Instead of selecting features, Ridge spreads the weight across all the features.
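A minimal sketch of this shrinkage behavior, again assuming synthetic `make_regression` data rather than the article's dataset: as `alpha` grows, the overall size (L2 norm) of the Ridge coefficients decreases, yet none of them becomes exactly 0.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data (illustrative assumption): 10 features, only 3 informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

norms, zero_counts = [], []
for alpha in (0.1, 10.0, 1000.0):
    ridge = Ridge(alpha=alpha).fit(X, y)
    norm = float(np.linalg.norm(ridge.coef_))   # overall coefficient size
    n_zero = int(np.sum(ridge.coef_ == 0))      # exact zeros (expected: none)
    norms.append(norm)
    zero_counts.append(n_zero)
    print(f"alpha={alpha}: ||coef||_2={norm:.2f}, exact zeros={n_zero}")
```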

The following is the cost function with the L2 penalty term:

Cost Function after adding L2 Penalty (Source – Personal Computer)

Here, alpha is the multiplier that controls the strength of the penalty; larger values shrink the coefficients more strongly toward 0.
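For reference, a standard textbook form of the Ridge cost function, using the same assumed symbols as for LASSO (n samples, p features, coefficients β_j, unpenalized intercept β_0). scikit-learn's `Ridge` minimizes this form without the 1/(2n) scaling that its `Lasso` uses.

$$
J(\beta) = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \alpha \sum_{j=1}^{p} \beta_j^2
$$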

## LASSO Regression

LASSO Regression is a linear model built by applying the L1 or LASSO penalty term. Let’s see how to build a LASSO regression model in Python.

Importing the Libraries

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import Lasso
```

Importing the Dataset

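```python
# Load the train and test splits (test.csv is assumed to sit next to train.csv)
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')
```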

The dataset has been taken from Kaggle.

Drop Rows with Missing Values, if Any

```python
# dropna() removes rows that contain missing (NaN) values
df_train = df_train.dropna()
df_test = df_test.dropna()
```

Specifying x_train, x_test, y_train, y_test variables for Regression
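This step mirrors the reshaping used in the "Putting it all together" listing. As a sketch, with two small hypothetical DataFrames standing in for the Kaggle files (single feature column `x`, target column `y`): scikit-learn expects the features as a 2-D array of shape (n_samples, n_features), so the single column is reshaped from (n,) to (n, 1).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the Kaggle train/test files
df_train = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.1, 3.9, 6.2, 7.8]})
df_test = pd.DataFrame({"x": [5.0, 6.0], "y": [10.1, 11.8]})

# Reshape each column from shape (n,) to (n, 1)
x_train = df_train["x"].values.reshape(-1, 1)
y_train = df_train["y"].values.reshape(-1, 1)
x_test = df_test["x"].values.reshape(-1, 1)
y_test = df_test["y"].values.reshape(-1, 1)

print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

The target does not strictly need reshaping for `Lasso`, which also accepts a 1-D `y`.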

Building LASSO Regression Model

`lasso = Lasso()`

Fitting the Model on Train Set

`lasso.fit(x_train, y_train)`

Calculating Train RMSE for Lasso Regression

`print("Lasso Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, lasso.predict(x_train))), 5))`

Calculating Test RMSE for Lasso Regression

`print("Lasso Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, lasso.predict(x_test))), 5))`

Putting it all together

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import Lasso

# Load the data (test.csv is assumed to sit next to train.csv)
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

# Drop rows with missing values
df_train = df_train.dropna()
df_test = df_test.dropna()

# scikit-learn expects a 2-D feature array: reshape the single column to (n, 1)
x_train = df_train['x'].values.reshape(-1, 1)
y_train = df_train['y'].values.reshape(-1, 1)
x_test = df_test['x'].values.reshape(-1, 1)
y_test = df_test['y'].values.reshape(-1, 1)

lasso = Lasso()  # alpha=1.0 by default
lasso.fit(x_train, y_train)

# Compare train and test RMSE to check for overfitting
print("Lasso Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, lasso.predict(x_train))), 5))
print("Lasso Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, lasso.predict(x_test))), 5))
```

On executing this code, we get:

Source – Personal Computer

We can tune the hyperparameters of the LASSO model to find the appropriate alpha value using LassoCV or GridSearchCV.
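A minimal `LassoCV` sketch, assuming synthetic `make_regression` data in place of the article's dataset and an illustrative grid of 30 candidate alphas; `LassoCV` picks the alpha with the best cross-validated error and stores it in `alpha_`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic stand-in data (assumption)
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Illustrative candidate grid from 0.001 to 100
alphas = np.logspace(-3, 2, 30)
lasso_cv = LassoCV(alphas=alphas, cv=5)  # 5-fold cross-validation
lasso_cv.fit(X, y)
print("Best alpha:", lasso_cv.alpha_)
```

With the article's data, it is simplest to pass `y_train.ravel()` so that the target is 1-D.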

## Ridge Regression

Ridge Regression is a linear model built by applying the L2 or Ridge penalty term. Let’s see how to build a Ridge regression model in Python.

Importing the Libraries

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import Ridge
```

Importing the Dataset

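```python
# Load the train and test splits (test.csv is assumed to sit next to train.csv)
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')
```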

The dataset has been taken from Kaggle.

Drop Rows with Missing Values, if Any

```python
# dropna() removes rows that contain missing (NaN) values
df_train = df_train.dropna()
df_test = df_test.dropna()
```

Specifying x_train, x_test, y_train, y_test variables for Regression

```python
x_train = df_train['x']
x_train = x_train.values.reshape(-1,1)
y_train = df_train['y']
y_train = y_train.values.reshape(-1,1)

x_test = df_test['x']
x_test = x_test.values.reshape(-1,1)
y_test = df_test['y']
y_test = y_test.values.reshape(-1,1)
```

Building Ridge Regression Model

`ridge = Ridge()`

Fitting the Model on Train Set

`ridge.fit(x_train, y_train)`

Calculating Train RMSE for Ridge Regression

`print("Ridge Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, ridge.predict(x_train))), 5))`

Calculating Test RMSE for Ridge Regression

`print("Ridge Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, ridge.predict(x_test))), 5))`

Putting it all together

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import Ridge

# Load the data (test.csv is assumed to sit next to train.csv)
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

# Drop rows with missing values
df_train = df_train.dropna()
df_test = df_test.dropna()

# scikit-learn expects a 2-D feature array: reshape the single column to (n, 1)
x_train = df_train['x'].values.reshape(-1, 1)
y_train = df_train['y'].values.reshape(-1, 1)
x_test = df_test['x'].values.reshape(-1, 1)
y_test = df_test['y'].values.reshape(-1, 1)

ridge = Ridge()  # alpha=1.0 by default
ridge.fit(x_train, y_train)

# Compare train and test RMSE to check for overfitting
print("Ridge Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, ridge.predict(x_train))), 5))
print("Ridge Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, ridge.predict(x_test))), 5))
```

On executing this code, we get:

Source – Personal Computer

We can tune the hyperparameters of the Ridge model to find the appropriate alpha value using RidgeCV or GridSearchCV.
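A minimal `RidgeCV` sketch, again assuming synthetic `make_regression` data and an illustrative alpha grid. With its default `cv=None`, `RidgeCV` uses an efficient form of leave-one-out cross-validation, and the selected value is stored in `alpha_`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Synthetic stand-in data (assumption)
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Illustrative candidate grid from 0.001 to 1000
alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=alphas)  # cv=None: efficient leave-one-out CV
ridge_cv.fit(X, y)
print("Best alpha:", ridge_cv.alpha_)
```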

## ElasticNet Regression

ElasticNet Regression is a linear model built by applying both L1 and L2 penalty terms. Let’s see how to build an ElasticNet regression model in Python.
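For reference, scikit-learn's `ElasticNet` documents its objective in the form below, where w are the coefficients (the intercept is fitted separately and not penalized), n is the number of samples, and ρ corresponds to the `l1_ratio` parameter: ρ = 1 gives a pure LASSO penalty and ρ = 0 a pure Ridge-style penalty.

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
$$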

Importing the Libraries

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import ElasticNet
```

Importing the Dataset

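```python
# Load the train and test splits (test.csv is assumed to sit next to train.csv)
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')
```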

The dataset has been taken from Kaggle.

Drop Rows with Missing Values, if Any

```python
# dropna() removes rows that contain missing (NaN) values
df_train = df_train.dropna()
df_test = df_test.dropna()
```

Specifying x_train, x_test, y_train, y_test variables for Regression

```python
x_train = df_train['x']
x_train = x_train.values.reshape(-1,1)
y_train = df_train['y']
y_train = y_train.values.reshape(-1,1)

x_test = df_test['x']
x_test = x_test.values.reshape(-1,1)
y_test = df_test['y']
y_test = y_test.values.reshape(-1,1)
```

Building ElasticNet Regression Model

`enet = ElasticNet()`

Fitting the Model on Train Set

`enet.fit(x_train, y_train)`

Calculating Train RMSE for ElasticNet Regression

`print("ElasticNet Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, enet.predict(x_train))), 5))`

Calculating Test RMSE for ElasticNet Regression

`print("ElasticNet Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, enet.predict(x_test))), 5))`

Putting it all together

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import ElasticNet

# Load the data (test.csv is assumed to sit next to train.csv)
df_train = pd.read_csv('train.csv')
df_test = pd.read_csv('test.csv')

# Drop rows with missing values
df_train = df_train.dropna()
df_test = df_test.dropna()

# scikit-learn expects a 2-D feature array: reshape the single column to (n, 1)
x_train = df_train['x'].values.reshape(-1, 1)
y_train = df_train['y'].values.reshape(-1, 1)
x_test = df_test['x'].values.reshape(-1, 1)
y_test = df_test['y'].values.reshape(-1, 1)

enet = ElasticNet()  # alpha=1.0, l1_ratio=0.5 by default
enet.fit(x_train, y_train)

# Compare train and test RMSE to check for overfitting
print("ElasticNet Train RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_train, enet.predict(x_train))), 5))
print("ElasticNet Test RMSE:", np.round(np.sqrt(metrics.mean_squared_error(y_test, enet.predict(x_test))), 5))
```

On executing this code, we get:

Source – Personal Computer

We can tune the hyperparameters of the ElasticNet model to find appropriate values for alpha and l1_ratio (the mix between the L1 and L2 penalties) using ElasticNetCV or GridSearchCV.
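A minimal `ElasticNetCV` sketch, assuming synthetic `make_regression` data and an illustrative list of `l1_ratio` candidates. For each ratio, `ElasticNetCV` builds its own alpha grid, cross-validates, and stores the best pair in `alpha_` and `l1_ratio_`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

# Synthetic stand-in data (assumption)
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Illustrative l1_ratio candidates; 1.0 corresponds to a pure LASSO penalty
ratios = [0.1, 0.5, 0.9, 1.0]
enet_cv = ElasticNetCV(l1_ratio=ratios, cv=5)
enet_cv.fit(X, y)
print("Best alpha:", enet_cv.alpha_)
print("Best l1_ratio:", enet_cv.l1_ratio_)
```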

## Conclusions

In this article, we learned about overfitting in linear models and about Regularization as a way to avoid this problem. We learned about the L1 and L2 penalty terms that get added to the cost function, and we looked at three regression algorithms based on them: LASSO, Ridge, and ElasticNet. Each of these algorithms has several hyperparameters we can specify. To find the optimal hyperparameters, we can use GridSearchCV or the tuning class of the respective regression model (LassoCV, RidgeCV, ElasticNetCV). One can then compare the performance of these algorithms on a dataset using a performance metric such as Root Mean Square Error (RMSE) to check which one performs better.
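A minimal sketch of such a comparison, assuming synthetic data from `make_regression` split with `train_test_split` in place of the article's dataset, and the default hyperparameters (`alpha=1.0`, plus `l1_ratio=0.5` for ElasticNet). Which model wins depends on the data, so no ranking is implied here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption)
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = {
    "Lasso": Lasso(alpha=1.0),
    "Ridge": Ridge(alpha=1.0),
    "ElasticNet": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_rmse = float(np.sqrt(mean_squared_error(y_train, model.predict(X_train))))
    test_rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
    results[name] = (train_rmse, test_rmse)
    print(f"{name}: train RMSE={train_rmse:.3f}, test RMSE={test_rmse:.3f}")
```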
