Build ResNet from Scratch With Python !

This article was published as a part of the Data Science Blogathon


Deep learning and computer vision have seen a series of breakthroughs. In particular, very deep convolutional neural networks have achieved state-of-the-art results on problems such as image recognition and image classification.

Over the years, deep learning architectures have therefore grown deeper (more layers) to handle increasingly complex tasks, which has improved both the accuracy and the robustness of classification and recognition models.

But as we keep adding layers, the network becomes harder to train: its accuracy first saturates and then degrades. ResNet was introduced to address exactly this problem.

What is ResNet?

Residual Network (ResNet) is a well-known deep learning model introduced by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun in their 2015 paper “Deep Residual Learning for Image Recognition” [1]. It remains one of the most popular and successful deep learning models.

Residual Blocks

Residual blocks ease the problem of training very deep networks, and the ResNet model is built by stacking these blocks.


[Figure: a residual block]
Source: “Deep Residual Learning for Image Recognition” paper

In the figure above, the first thing to notice is a direct connection that skips some layers of the model. This connection is called a ‘skip connection’ and is the heart of the residual block. Because of it, the output of the block changes. Without the skip connection, the input ‘x’ is multiplied by the weights of the layer and a bias term is added.

Then comes the activation function f(), which gives the output H(x):

H(x) = f(wx + b), or simply H(x) = f(x)

With the skip connection, the output H(x) becomes

H(x) = f(x) + x

But the dimensions of the input may differ from those of the output, which can happen when the block contains a convolutional or pooling layer. This mismatch can be handled in two ways:

· The skip connection is padded with zeros to increase its dimensions.

· A 1×1 convolutional layer is applied to the input to match the dimensions. In that case, the output is:

H(x) = f(x) + w1·x

Here an additional parameter w1 is introduced, whereas the first approach adds no parameters.
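The two ways of matching dimensions can be sketched in PyTorch as follows. This is only an illustration under assumptions: the 3×3 convolution standing in for f, the tensor sizes, and the variable names are chosen for the example and are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 64, 8, 8)  # input with 64 channels

# f(x): a stand-in residual branch that widens 64 -> 128 channels
f = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False)

# Approach 1: zero-pad the channel dimension of x (no extra parameters).
# F.pad lists pads from the last dimension backwards: (W, W, H, H, C_front, C_back)
x_padded = F.pad(x, (0, 0, 0, 0, 0, 64))
out_pad = f(x) + x_padded

# Approach 2: a 1x1 convolution w1 projects x to 128 channels (extra parameters)
w1 = nn.Conv2d(64, 128, kernel_size=1, bias=False)
out_proj = f(x) + w1(x)

print(out_pad.shape, out_proj.shape)
print(sum(p.numel() for p in w1.parameters()))  # weights added by the projection
```

Both sums have the same shape; the projection costs 64 × 128 extra weights, while zero padding costs none.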

The skip connections in ResNet mitigate the vanishing gradient problem in deep CNNs by providing an alternate shortcut path for the gradient to flow through. They also help when a layer hurts the performance of the architecture: regularization can drive that layer's contribution toward zero, so that it is effectively skipped.
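A tiny autograd experiment makes the shortcut path for the gradient concrete. It is a deliberately simplified sketch: the "layer" is a single scalar weight set to zero (an assumption chosen to mimic a layer that passes no gradient), not a real convolution.

```python
import torch

# A scalar "layer" whose weight has collapsed to zero, so f(x) = w * x passes no gradient
w = torch.tensor(0.0)

# Plain layer: H(x) = f(x)
x_plain = torch.tensor(2.0, requires_grad=True)
(w * x_plain).backward()
grad_plain = x_plain.grad.item()

# Residual layer: H(x) = f(x) + x
x_res = torch.tensor(2.0, requires_grad=True)
(w * x_res + x_res).backward()
grad_residual = x_res.grad.item()

print(grad_plain, grad_residual)  # 0.0 1.0
```

With the skip connection, dH/dx = f'(x) + 1, so the gradient reaching the input is still 1 even though the layer itself contributes nothing.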

Architecture of ResNet

The architecture starts from a 34-layer plain network inspired by VGG-19, to which shortcut (skip) connections are added. These skip connections, which form the residual blocks, convert the plain architecture into a residual network, as shown in the figure below.

[Figure: architecture of ResNet]

Source: “Deep Residual Learning for Image Recognition” paper


Using ResNet with Keras:

Keras is an open-source deep-learning library capable of running on top of TensorFlow. Keras Applications provides the following ResNet versions.

– ResNet50

– ResNet50V2

– ResNet101

– ResNet101V2

– ResNet152

– ResNet152V2
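As a rough sketch of how one of these versions might be instantiated, assuming TensorFlow 2.x is installed and exposes Keras as tf.keras:

```python
import tensorflow as tf

# weights=None builds the architecture with random initialization,
# so no pretrained weights are downloaded
model = tf.keras.applications.ResNet50(
    weights=None, input_shape=(224, 224, 3), classes=1000
)
print(model.output_shape)
```

Passing weights="imagenet" instead would load the pretrained ImageNet weights, which requires a download on first use.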

Let’s Build ResNet from scratch:


[Figure: ResNet layers]

Source: “Deep Residual Learning for Image Recognition” paper

Let us keep the above image as a reference and start building the network.

The ResNet architecture reuses the same CNN block many times, so let us create a class for that block, which takes the number of input channels and intermediate channels. Each conv layer is followed by a BatchNorm2d layer.

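import torch
import torch.nn as nn

class block(nn.Module):
    def __init__(
        self, in_channels, intermediate_channels, identity_downsample=None, stride=1
    ):
        super(block, self).__init__()
        # The block's output has 4x the intermediate channel count
        self.expansion = 4
        # 1x1 conv: reduce to the intermediate channel count
        self.conv1 = nn.Conv2d(
            in_channels, intermediate_channels, kernel_size=1, stride=1, padding=0, bias=False
        )
        self.bn1 = nn.BatchNorm2d(intermediate_channels)
        # 3x3 conv: carries the stride, so any spatial downsampling happens here
        self.conv2 = nn.Conv2d(
            intermediate_channels, intermediate_channels, kernel_size=3, stride=stride, padding=1, bias=False
        )
        self.bn2 = nn.BatchNorm2d(intermediate_channels)
        # 1x1 conv: expand to intermediate_channels * expansion
        self.conv3 = nn.Conv2d(
            intermediate_channels,
            intermediate_channels * self.expansion,
            kernel_size=1, stride=1, padding=0, bias=False
        )
        self.bn3 = nn.BatchNorm2d(intermediate_channels * self.expansion)
        self.relu = nn.ReLU()
        self.identity_downsample = identity_downsample
        self.stride = stride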

    def forward(self, x):
        identity = x.clone()

        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.conv3(x)
        x = self.bn3(x)

        if self.identity_downsample is not None:
            identity = self.identity_downsample(identity)

        x += identity
        x = self.relu(x)
        return x
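To see why the block squeezes the channels with a 1×1 conv before the 3×3 conv, the weight counts can be compared with a little arithmetic. This is a back-of-the-envelope sketch: the helper conv_params is hypothetical, the 256-channel width is chosen for illustration, and BatchNorm parameters are ignored.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights in a bias-free 2D convolution with a square kernel."""
    return in_ch * out_ch * kernel * kernel

channels = 256        # channels entering and leaving the block
mid = channels // 4   # intermediate channels (expansion = 4)

bottleneck = (
    conv_params(channels, mid, 1)      # 1x1 reduce
    + conv_params(mid, mid, 3)         # 3x3
    + conv_params(mid, channels, 1)    # 1x1 expand
)
two_3x3 = 2 * conv_params(channels, channels, 3)  # two plain 3x3 convs at full width

print(bottleneck, two_3x3)  # 69632 1179648
```

At this width the three-layer bottleneck uses roughly 17 times fewer weights than two full-width 3×3 convolutions.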

Then create a ResNet class that takes the block class, the list of block counts per layer, the number of image channels, and the number of classes as input.

In the code below, the function ‘_make_layer’ creates the ResNet layers; it takes the block class, the number of residual blocks, the intermediate channels, and the stride as input.

class ResNet(nn.Module):
    def __init__(self, block, layers, image_channels, num_classes):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(image_channels, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Essentially the entire ResNet architecture is in these 4 lines below
        self.layer1 = self._make_layer(
            block, layers[0], intermediate_channels=64, stride=1
        )
        self.layer2 = self._make_layer(
            block, layers[1], intermediate_channels=128, stride=2
        )
        self.layer3 = self._make_layer(
            block, layers[2], intermediate_channels=256, stride=2
        )
        self.layer4 = self._make_layer(
            block, layers[3], intermediate_channels=512, stride=2
        )

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * 4, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = x.reshape(x.shape[0], -1)
        x = self.fc(x)

        return x

    def _make_layer(self, block, num_residual_blocks, intermediate_channels, stride):
        identity_downsample = None
        layers = []

        # If we halve the input space, e.g. 56x56 -> 28x28 (stride=2), or the number of
        # channels changes, we need to adapt the identity (skip connection) so that it
        # can be added to the output of the layers ahead
        if stride != 1 or self.in_channels != intermediate_channels * 4:
            identity_downsample = nn.Sequential(
                nn.Conv2d(
                    self.in_channels,
                    intermediate_channels * 4,
                    kernel_size=1,
                    stride=stride,
                    bias=False,
                ),
                nn.BatchNorm2d(intermediate_channels * 4),
            )

        # The first block of the layer changes the channels and, if needed, the resolution
        layers.append(
            block(self.in_channels, intermediate_channels, identity_downsample, stride)
        )

        # The expansion size is always 4 for ResNet 50, 101, 152
        self.in_channels = intermediate_channels * 4

        # For example, in the first ResNet layer 256 is mapped to 64 as the intermediate
        # width and then back to 256. Hence no identity downsample is needed, since
        # stride = 1 and the number of channels is the same.
        for i in range(num_residual_blocks - 1):
            layers.append(block(self.in_channels, intermediate_channels))

        return nn.Sequential(*layers)
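The reason the final linear layer takes 512 * 4 inputs becomes clearer when the feature-map size is traced through the network. The following is a small pure-Python sketch, assuming a 224×224 input; the helper conv_out is introduced only for this example and applies the standard output-size formula for convolution and pooling layers.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 224
size = conv_out(size, kernel=7, stride=2, padding=3)  # conv1
size = conv_out(size, kernel=3, stride=2, padding=1)  # maxpool
trace = [("stem", 64, size)]

stages = [("layer1", 64, 1), ("layer2", 128, 2), ("layer3", 256, 2), ("layer4", 512, 2)]
for name, intermediate, stride in stages:
    # Only the 3x3 conv of the first block in each stage carries the stride
    size = conv_out(size, kernel=3, stride=stride, padding=1)
    trace.append((name, intermediate * 4, size))

for name, out_channels, side in trace:
    print(f"{name}: {out_channels} channels, {side}x{side}")
```

The last stage ends with 2048 = 512 * 4 channels on a 7×7 map; the adaptive average pool reduces that to 1×1, which is what the fully connected layer receives.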

Then define the different versions of ResNet.

For ResNet50 the layer sequence is [3, 4, 6, 3].

For ResNet101 the layer sequence is [3, 4, 23, 3].

For ResNet152 the layer sequence is [3, 8, 36, 3] (refer to the “Deep Residual Learning for Image Recognition” paper [1]).

def ResNet50(img_channel=3, num_classes=1000):
    return ResNet(block, [3, 4, 6, 3], img_channel, num_classes)

def ResNet101(img_channel=3, num_classes=1000):
    return ResNet(block, [3, 4, 23, 3], img_channel, num_classes)

def ResNet152(img_channel=3, num_classes=1000):
    return ResNet(block, [3, 8, 36, 3], img_channel, num_classes)
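The numbers in the model names follow from the layer sequences. As a quick check, assuming the usual counting convention (three convolutions per bottleneck block, plus the first 7×7 conv and the final fully connected layer, with the 1×1 downsample convs on the skip path not counted), the hypothetical helper below reproduces them:

```python
def depth(blocks_per_stage, convs_per_block=3):
    # Conv layers inside the blocks, plus the stem conv and the final fc layer
    return convs_per_block * sum(blocks_per_stage) + 2

sequences = {
    "ResNet50": [3, 4, 6, 3],
    "ResNet101": [3, 4, 23, 3],
    "ResNet152": [3, 8, 36, 3],
}
depths = {name: depth(seq) for name, seq in sequences.items()}
print(depths)  # {'ResNet50': 50, 'ResNet101': 101, 'ResNet152': 152}
```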

Then write a small test to check that the model works.

def test():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Move the model and the input to the same device before the forward pass
    net = ResNet101(img_channel=3, num_classes=1000).to(device)
    y = net(torch.randn(4, 3, 224, 224).to(device))
    print(y.size())

test()

For the above test case the output should be:

torch.Size([4, 1000])

The entire code can be accessed here:


[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun: Deep Residual Learning for Image Recognition, arXiv:1512.03385, Dec 2015, https://arxiv.org/abs/1512.03385

Your suggestions and questions are welcome in the comment section. Thank you for reading my article!
