Dropout Regularization in Deep Learning

Parthiban Marimuthu 21 Jun, 2024

5 min read

Introduction

When shopping for a shirt, one avoids overly tight fits that might become uncomfortable post-meal, or excessively loose ones that resemble hanging cloth. In machine learning projects, overfitting and underfitting are common issues. Regularization techniques address these problems by adjusting model complexity, such as using dropout or adjusting hyperparameters, ensuring the model fits the data appropriately without memorizing noise or being too simplistic.

In the below image, we are applying a dropout regularization in deep learning on the second hidden layer of a neuron network.

This article was published as a part of the Data Science Blogathon.

What’s Dropout?

In machine learning, “dropout” refers to the practice of disregarding certain nodes in a layer at random during training. A dropout regularization in deep learning is a regularization approach that prevents overfitting by ensuring that no units are codependent with one another.

Dropout Regularization

When you have training data, if you try to train your model too much, it might overfit, and when you get the actual test data for making predictions, it will not probably perform well. Dropout regularization is one technique used to tackle overfitting problems in deep learning.

That’s what we are going to look into in this blog, and we’ll go over some theories first, and then we’ll write python code using TensorFlow, and we’ll see how adding a dropout layer increases the performance of your neural network.

Training with Drop-Out Layers

Dropout is a regularization method approximating concurrent training of many neural networks with various designs. During training, the network randomly ignores or drops some layer outputs. This changes the layer’s appearance and connectivity compared to the preceding layer. In practice, each training update gives the layer a different perspective. Dropout makes the training process noisy, requiring nodes within a layer to take on more or less responsible for the inputs on a probabilistic basis.

According to this conception, Dropout in machine learning may break apart circumstances in which network tiers co-adapt to fix mistakes committed by prior layers, making the model more robust. Dropout is implemented per layer in a neural network. It works with the vast majority of layers, including dense, fully connected, convolutional, and recurrent layers such as the long short-term memory network layer. Dropout can occur on any or all of the network’s hidden layers as well as the visible or input layer. It is not used on the output layer.

Dropout Implementation

Using the torch. nn, you can easily add a Dropout in machine learning to your PyTorch models. The dropout class takes the dropout rate (the likelihood of deactivating a neuron) as a parameter.

self.dropout = nn.Dropout(0.25)

Dropout can be used after any non-output layer.

To investigate the impact of dropout, train an image classification model. I’ll start with an unregularized network and then use Dropout in machine learning to train a regularised network. The Cifar-10 dataset is used to train the models over 15 epochs.

A complete example of introducing dropout to a PyTorch model is provided.

class Net(nn.Module):
  def __init__(self, input_shape=(3,32,32)):
    super(Net, self).__init__()
    self.conv1 = nn.Conv2d(3, 32, 3)
    self.conv2 = nn.Conv2d(32, 64, 3)
    self.conv3 = nn.Conv2d(64, 128, 3)
    self.pool = nn.MaxPool2d(2,2)
    n_size = self._get_conv_output(input_shape)
    self.fc1 = nn.Linear(n_size, 512)
    self.fc2 = nn.Linear(512, 10)
    self.dropout = nn.Dropout(0.25)      
  def forward(self, x):
    x = self._forward_features(x)
    x = x.view(x.size(0), -1)
    x = self.dropout(x)
    x = F.relu(self.fc1(x))
    # Apply dropout
    x = self.dropout(x)
    x = self.fc2(x)
    return x

An unregularized network overfits instantly on the training dataset. Take note of how the validation loss for the no-dropout regularization in deep learning run diverges dramatically after only a few epochs. This explains why the generalization error has grown.
Overfitting is avoided by training with two dropout in deep learning layers and a dropout probability of 25%. However, this affects training accuracy, necessitating the training of a regularised network over a longer period.
Leaving improves model generalisation. Although the training accuracy is lower than that of the unregularized network, the total validation accuracy has improved. This explains why the generalization error has decreased.

Why will dropout help with overfitting?

It can’t rely on one input as it might be randomly dropped out.
Neurons will not learn redundant details of inputs

Other Popular Regularization Techniques

When combating overfitting, dropping out is far from the only choice. Regularization techniques commonly used include:

Early stopping: automatically terminates training when a performance measure (e.g., validation loss, accuracy) ceases to improve.
Weight decay: add a penalty to the loss function to motivate the network to utilize lesser weights.
Noise: Allow some random variations in the data through augmentation to create noise (which makes the network robust to a larger distribution of inputs and hence improves generalization).
Model Combination: the outputs of separately trained neural networks are averaged (which requires a lot of computational power, data, and time).

Dropout Regularization Hyperparameters

In deep learning regularization, researchers have found that using a high momentum and a large decaying learning rate are effective hyperparameter values with dropout. Limiting our weight vectors using dropout allows us to employ a high learning rate without fear of the weights blowing up. Dropout noise, along with our big decaying learning rate, allows us to explore alternative areas of our loss function and, hopefully, reach a better minimum.

The Drawbacks of Dropout

Although dropout is a potent tool, it has certain downsides. A dropout network may take 2-3 times longer to train than a normal network. Finding a regularize virtually comparable to a dropout layer is one method to reap the benefits of dropout in deep learning without slowing down training. This regularize is a modified variant of L2 regularization for linear regression. An analogous regularize for more complex models has yet to be discovered until that time when doubt drops out.

Conclusion

Computer vision systems usually never have enough training data; dropout is extremely common in computer vision applications. Convolutional neural networks are computer vision’s most widely used deep learning models. Dropout, on the other hand, is not particularly useful on convolutional layers. This is because dropout tries to increase robustness by making neurons redundant. Without relying on single neurons, a model should learn parameters. This is very helpful if your layer has a lot of parameters.

Key Takeaways :

As a result, convolutional neural networks often place dropout layers after fully connected layers but not after convolutional layers.
Other regularising techniques, such as batch normalization in convolutional networks, have largely overtaken dropout in recent years.
Because convolutional layers have fewer parameters, they necessitate less regularisation.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Frequently Asked Questions

Q1. What is dropout regularization?

A. In neural networks, dropout regularization prevents overfitting by randomly dropping a proportion of neurons during each training iteration, forcing the network to learn redundant representations.

Q2. What does 0.25 dropout mean?

A. A 0.25 dropout means randomly setting 25% of the neuron units to zero during training, effectively dropping them out of the network for that iteration.

Q3. What is the dropout layer used for?

A. In neural networks, the dropout layer improves generalization and prevents overfitting by randomly disabling a proportion of neurons during training, encouraging the network to learn more robust features.

Q4. How does dropout prevent overfitting?

A. Dropout prevents overfitting by reducing co-dependency among neurons, forcing the network to learn more robust features that are generalizable to unseen data. It acts as a form of ensemble learning within the network, enhancing performance on test data.