Published on January 28, 2022 and last modified on March 15, 2022

This article was published as a part of the Data Science Blogathon.

To understand convolutional neural networks, we first need to know what deep learning is.

Deep Learning is a subset of Machine Learning in which learning happens from past examples or experiences with the help of ‘Artificial Neural Networks’.

Deep Learning uses deep neural networks, where the word ‘deep’ signifies the presence of several hidden layers (typically more than one or two) between the input and output layers.

What is an Artificial Neural Network?

Artificial neural networks are made up of neurons, which are the core processing units of the network. For better understanding, refer to the diagram below:

In the given diagram, first, we have the ‘INPUT LAYER’, where the neurons are fed with training observations. Then in between is the ‘HIDDEN LAYER’ that performs most of the computations required by our network. Lastly, the ‘OUTPUT LAYER’ produces the final prediction from the values computed by the preceding layers.


source: researchgate.net

How does this neural network work?

  • For instance, if an image of N × N pixels is passed as input, each pixel is fed as input to each neuron of the first layer.
  • Neurons of one layer are connected to the following layers through ‘channels’.
  • Each of these channels is assigned a numerical value called ‘weight’.
  • The inputs (x1, x2, …… xn) are multiplied by their corresponding weights, and their sum is sent to the neurons in the hidden layer.
  • Each of these neurons is associated with a numerical value called the ‘Bias’, further added to the input sum.
  • This value is then passed through a threshold function called the ‘Activation function’, which determines whether the particular neuron will get activated or not.
  • The activated neuron transmits data to neurons of the next layer over channels.
  • Thus, data is propagated through the network, and the neuron with the highest value in the output layer determines the output.
  • Output = f(Σ (wi * xi) + Bias), where f is the activation function and the sum runs over all inputs i = 1 … n.
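The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the inputs, weights, and bias are made-up numbers, and a sigmoid is assumed as the activation function (the article does not name one at this point).

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, bias):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    weighted_sum = np.dot(w, x) + bias
    return sigmoid(weighted_sum)

# Hypothetical values chosen only for illustration.
x = np.array([0.5, 0.1, 0.9])    # inputs x1..x3 (e.g. pixel intensities)
w = np.array([0.4, -0.2, 0.3])   # one weight per incoming channel
bias = -0.45

out = neuron_output(x, w, bias)
print(out)  # close to 0.5, because the weighted sum and the bias cancel out
```

A value near 1 means the neuron is strongly activated, and a value near 0 means it is barely activated.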

Types of Deep Neural Network:

  • Artificial Neural Network
  • Multi-Layered Perceptron
  • Recurrent Neural Network
  • Convolutional Neural Network


A Convolutional Neural Network (CNN) is a class of deep neural networks that extracts features from images, given as input, to perform specific tasks such as image classification, face recognition and semantic image segmentation. A CNN has one or more convolution layers for feature extraction. These layers execute the convolution operation (i.e. multiplication of a set of weights with the input) while retaining the critical features (spatial and temporal information) without human supervision.

Why do we need CNN over ANN?

A CNN is usually a more accurate and more efficient choice than a plain ANN for image classification problems. With an Artificial Neural Network, a 2D image would first have to be converted into a 1-dimensional vector before training the model, which discards the arrangement of neighbouring pixels.

Also, as the size of the image increases, the number of trainable parameters grows rapidly, because every pixel is connected to every neuron of the first hidden layer. This raises the memory and computation requirements. Moreover, an ANN cannot capture the spatial arrangement of pixels, nor the sequential information required for sequence data.

Thus, a CNN is generally the preferred way of dealing with 2D image classification problems because it works on the image directly as a grid of pixels, which typically gives higher accuracy.
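A rough parameter count makes the difference concrete. The sketch below assumes a first hidden layer of 128 neurons for the ANN and 32 filters of size 3 × 3 for the CNN; both sizes are illustrative choices, not requirements.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units

def conv_params(kernel_size, in_channels, n_filters):
    """Weights plus one bias per filter; independent of the image size."""
    return kernel_size * kernel_size * in_channels * n_filters + n_filters

for side in (64, 128, 256):
    flat_inputs = side * side * 3   # RGB image flattened into a 1-D vector
    print(side, dense_params(flat_inputs, 128), conv_params(3, 3, 32))
```

For a 64 × 64 RGB image, the fully connected layer needs 1,572,992 parameters, while the convolution layer needs 896 regardless of the image size.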

The architecture of CNN:



source: medium


The three primary layers that define the structure of a convolutional neural network are:

1) Convolution layer:

This is the first layer of the convolutional network. It performs feature extraction by sliding a filter over the input image. At every position, the filter values are multiplied element-wise with the image patch underneath and the products are summed to give one value of the output, the convolved feature.

The output, also known as the feature map, captures features of the original image such as curves, sharp edges, textures, etc.

In the case of networks with more convolutional layers, the initial layers extract generic features, while more complex features are extracted as the network gets deeper.

The image below shows the convolution operation.

Convolutional Layer

source: analyticsindiamag.com
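The sliding-window computation can be written directly in NumPy. This is a minimal sketch assuming a single-channel image, stride 1, and no padding; the image and the vertical-edge filter are made-up examples. As in most deep learning libraries, the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            # Element-wise product of the window and the kernel, then summed
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

# Hypothetical 5 x 5 image: dark on the left, bright on the right
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# Hypothetical 3 x 3 filter that responds to vertical edges
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = convolve2d(image, kernel)
print(feature_map)
```

The feature map is 3 × 3. It is zero where the window covers only the dark region and 3 where the window covers the dark-to-bright edge.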


2) Pooling Layer:

The primary purpose of this layer is to decrease the spatial size of the feature map, which reduces the computational cost and the number of trainable parameters in the layers that follow.

The image depth remains unchanged since pooling is done independently on each depth dimension. Max Pooling is the most common pooling method, where the largest element is taken from each window of the feature map. The result is an output whose spatial dimensions are greatly reduced while the essential information is retained.

Pooling layer

source: computer science wiki
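Max pooling can be sketched in the same way. The example assumes a 2 × 2 window with stride 2 on a single-channel feature map, and the input values are made up for illustration.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            pooled[i, j] = feature_map[r:r + size, c:c + size].max()
    return pooled

# Hypothetical 4 x 4 feature map
feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 2, 8]], dtype=float)

pooled = max_pool2d(feature_map)
print(pooled)  # [[6. 4.] [7. 9.]]
```

The 4 × 4 input shrinks to 2 × 2, so the next layer has only a quarter of the values to process.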


3) Fully Connected Layer:

The last few layers, which determine the output, are the fully connected layers. The output from the pooling layer is flattened into a one-dimensional vector and then given as input to the fully connected layer.

The output layer has the same number of neurons as the number of categories in the classification problem, thus associating features with a particular label. For a two-class problem, a single neuron with a sigmoid activation is enough.

This process is known as forward propagation. The output so generated is compared to the actual output to compute the error.

The error is then backpropagated to update the filters (weights) and bias values. Thus, one training iteration is completed after this forward and backward propagation cycle.
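The forward and backward cycle can be illustrated with a single sigmoid output neuron trained by gradient descent. This is a simplified sketch, not the training loop Keras runs internally: the flattened features, the label, and the learning rate are hypothetical, and binary cross-entropy is assumed as the error measure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy for a single example."""
    return -(y_true * np.log(y_pred + eps)
             + (1 - y_true) * np.log(1 - y_pred + eps))

features = np.array([0.2, 0.8, 0.5])   # hypothetical flattened features
y_true = 1.0                            # hypothetical label
weights = np.zeros(3)
bias = 0.0
learning_rate = 0.5

losses = []
for step in range(5):
    # Forward propagation: compute the prediction and the error
    y_pred = sigmoid(np.dot(weights, features) + bias)
    losses.append(bce_loss(y_true, y_pred))

    # Backward propagation: for sigmoid + cross-entropy, the gradient with
    # respect to the weighted sum is simply (y_pred - y_true)
    error = y_pred - y_true
    weights -= learning_rate * error * features
    bias -= learning_rate * error

print(losses)
```

Each pass through the loop is one training iteration, and the recorded loss shrinks from one iteration to the next.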


Now, let’s implement a CNN by taking the example of classifying an image as a dog or a cat. The dataset can be downloaded from https://www.kaggle.com/c/dogs-vs-cats/data

#importing the necessary libraries
import os
import cv2
import numpy as np
import pandas as pd
import sklearn
import keras
import tensorflow as tf
from keras.models import Sequential
from keras.preprocessing.image import ImageDataGenerator

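#Data Preprocessing

# Augment the training images; rescale maps pixel values from [0, 255] to [0, 1]
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

# Point this path at your local training folder
train_generator = train_datagen.flow_from_directory(r"C:dogs vs catstrain",
                                                 target_size = (64,64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')

# Validation images are only rescaled, not augmented (assumed setup)
test_datagen = ImageDataGenerator(rescale = 1./255)

# Point this path at your local test folder
validation_generator = test_datagen.flow_from_directory(r"C:dogs vs catstest",
                                            target_size = (64,64),
                                            batch_size = 32,
                                            class_mode = 'binary')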

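## Build the CNN Model

# Assumed layer stack: the layers after the first Conv2D are chosen so that the
# model has the 813,217 trainable parameters reported below; the optimizer and
# the loss are assumptions as well.

#initialize the model
cnn = tf.keras.models.Sequential()

#Adding the first Convolution layer
cnn.add(tf.keras.layers.Conv2D(filters=32,kernel_size=3,activation='relu', input_shape=[64,64,3]))

#Adding the first Pooling layer
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

#Adding one more Convolution layer
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))

# Adding one more Pooling Layer
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))

#Flattening
cnn.add(tf.keras.layers.Flatten())

#Full Connection Layer
cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))

#Full Connection Layer (output)
cnn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

#compile the model
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

#print the model summary
cnn.summary()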

The model summary shows that the number of trainable parameters is 813,217. Most of them sit in the first fully connected layer, so the count can be reduced by adding more convolutional and pooling layers before flattening. With the increase in the number of layers, the features extracted will be more specific.
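The figure of 813,217 can be checked by hand. The sketch below assumes a layer stack of two convolution layers with 32 filters of size 3 × 3, each followed by 2 × 2 max pooling, then a dense layer of 128 units and a single output unit. This stack is an assumption, chosen because it reproduces the reported total for a 64 × 64 × 3 input.

```python
def conv_output_side(side, kernel_size):
    """Side length after a convolution with stride 1 and no padding."""
    return side - kernel_size + 1

def conv_params(kernel_size, in_channels, n_filters):
    return kernel_size * kernel_size * in_channels * n_filters + n_filters

def dense_params(n_inputs, n_units):
    return n_inputs * n_units + n_units

side = 64
conv1 = conv_params(3, 3, 32)              # first convolution layer
side = conv_output_side(side, 3) // 2      # 62 after conv, 31 after pooling
conv2 = conv_params(3, 32, 32)             # second convolution layer
side = conv_output_side(side, 3) // 2      # 29 after conv, 14 after pooling
flat = side * side * 32                    # flattened vector length
dense1 = dense_params(flat, 128)           # fully connected layer
dense2 = dense_params(128, 1)              # output neuron

total = conv1 + conv2 + dense1 + dense2
print(conv1, conv2, dense1, dense2, total)  # 896 9248 802944 129 813217
```

Pooling and flattening layers add no parameters; the first dense layer alone accounts for 802,944 of the total.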

cnn.fit(x=train_generator, validation_data=validation_generator, epochs=25)


Thus, we get an accuracy of up to 90%, which may be increased further by adding more convolutional and pooling layers before the fully connected layer.

We have performed 25 epochs; you can further increase the number of epochs to train your model.
