All you need to know about Convolutional Neural Networks!

Swapnil Vishwakarma Last Updated : 21 Jul, 2021


This article was published as a part of the Data Science Blogathon

What is a CNN, and Why is it Important?

Ever since I learned about CNNs, they have been one of my favorite topics in Deep Learning, so in this article I am going to walk through the main ideas behind them. “Using this we can provide machines with vision. Now vision, I believe, is one of the most important senses that we possess. Sighted people rely on vision for everyday tasks such as navigation, recognizing objects, and recognizing complex human emotions and behavior.” These are some of the words of Professor Alexander Amini from ‘MIT Introduction to Deep Learning 6.S191’. Since CNNs were designed specifically for visual tasks (say, object recognition), a CNN can be described as a type of Neural Network that is most often applied to image processing problems.

CNNs can be used to detect and recognize faces, to help classify diseases from medical images, and in autonomous (self-driving) cars.

Biological Inspiration


Let’s try to understand this from a historical perspective first. Hubel and Wiesel, two neuroscientists who later received the Nobel Prize in Physiology or Medicine, focused much of their research on the visual system. In an experiment done in 1959, they anesthetized a cat and inserted a microelectrode into its primary visual cortex. They then projected a series of light bars on a screen placed in front of the cat. Interestingly, they observed that some neurons fired when the light bar was presented at a specific angle, while other neurons were activated by bars at different angles. What they found was that different neurons are specialized for different tasks, such as edge detection, motion detection, depth detection, and so on. CNN is inspired by this biological idea and the research done in this area.

Now that we know the basic idea behind CNN, let us try to understand it in more detail.

Fundamentals of CNN

1. Convolution

As you might have guessed from the name itself, convolution convolves (merges) two functions or pieces of information to produce a third.

[Figure: the convolution operation]

Let’s consider a grayscale image of 6×6 dimensions. Since the machine only sees numbers, this image is a 6×6 matrix of pixel intensities, with all 0s (black) on the left half and all 255s (white) on the right half. The 3×3 matrix on the right-hand side is known as the kernel/filter/mask/operator (here it is the Sobel edge detector), and * denotes the convolution operation. We slide the kernel over the image, and at each position we multiply component-wise and add up the results, which gives a 4×4 matrix. When we normalize this matrix, we get an image with the edge highlighted in it.

Formula – (n x n) * (k x k) => (n – k + 1) x (n – k + 1)
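To make the 6×6 example concrete, here is a minimal NumPy sketch of the operation described above. The function name conv2d_valid and the particular Sobel kernel (the one that responds to vertical edges) are my own choices for illustration. Like most CNN libraries, it slides the kernel without flipping it.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    n_rows, n_cols = image.shape
    k_rows, k_cols = kernel.shape
    out = np.zeros((n_rows - k_rows + 1, n_cols - k_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + k_rows, j:j + k_cols]
            # component-wise multiplication followed by addition
            out[i, j] = np.sum(patch * kernel)
    return out

# 6x6 grayscale image: left half 0 (black), right half 255 (white)
image = np.zeros((6, 6))
image[:, 3:] = 255

# Sobel kernel that responds to vertical edges (assumed for this example)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

edges = conv2d_valid(image, sobel_x)
print(edges.shape)  # (4, 4), i.e. (6 - 3 + 1) x (6 - 3 + 1)
print(edges[0])     # [   0. 1020. 1020.    0.]
```

The two middle columns are non-zero because those are the only positions where the kernel straddles the black-to-white boundary.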

2. Padding (p)

What if we want the output matrix to have the same dimensions as the input matrix, i.e. (n x n)? If you substitute n = 6 and k = 3 in the above formula, you will get an output matrix of size 4×4. If you want the output matrix to be of size 6×6, then the input matrix should be of size 8×8, and padding is how we change the dimensions of the input matrix. If we pad one extra layer on each side, the dimension increases by 2, i.e. now n = 8, which gives an output matrix of dimension 6.

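Now the question arises: what value should we fill this extra layer with?

There are two approaches to it:

Zero-padding: fill the extra layer with 0s
Same-value padding: fill the extra layer with the value of the nearest pixel of the original image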

Now the above formula becomes (n x n) * (k x k) with padding ‘p’ => (n – k + 2p + 1) x (n – k + 2p + 1)
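Both padding approaches can be sketched with NumPy's np.pad. The 3×3 input here is a made-up example, and mapping "same-value padding" to mode="edge" (repeat the nearest border value) is my assumption about what is meant.

```python
import numpy as np

# hypothetical 3x3 input, just to make the padded border easy to see
image = np.arange(1, 10).reshape(3, 3)

# zero-padding: one extra layer of 0s on every side
zero_padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

# same-value padding (assumed to mean: copy the nearest border value)
edge_padded = np.pad(image, pad_width=1, mode="edge")

def padded_output_size(n, k, p):
    """Output side length for an n x n input, k x k kernel, padding p."""
    return n - k + 2 * p + 1

print(zero_padded)
print(edge_padded)
print(padded_output_size(6, 3, 1))  # 6, same as the input
```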

3. Stride (s)

The stride is the number of cells by which the kernel matrix is shifted over the input matrix at each step. A stride greater than 1 reduces the size of the output matrix.

Formula – (n x n) * (k x k) with stride = ‘s’ => ((n – k) // s + 1) x ((n – k) // s + 1)

Finally, the formula combining all these would be: (n x n) * (k x k), padding = p, stride = s ⇒ (((n – k + 2p) // s) + 1) x (((n – k + 2p) // s) + 1)

NOTE: Here // (as written in Python) is floor division, i.e. the mathematical floor of the result of the division.
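The combined formula is easy to check in Python. The helper below is only a sketch (conv_output_size is a name I made up), and it returns one side length since the input and kernel are assumed to be square.

```python
def conv_output_size(n, k, p=0, s=1):
    """Side length of the output for an n x n input and a k x k kernel,
    with padding p and stride s."""
    return (n - k + 2 * p) // s + 1

print(conv_output_size(6, 3))            # 4  (plain convolution)
print(conv_output_size(6, 3, p=1))       # 6  (padding keeps the size)
print(conv_output_size(7, 3, s=2))       # 3  (stride shrinks the output)
print(conv_output_size(6, 3, s=2))       # 2  (floor of 3 / 2, plus 1)
print(conv_output_size(6, 3, p=1, s=2))  # 3
```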

Convolution in Color Images

So far we have looked at the convolution operation, padding, and stride on a grayscale image. What if the image is in color? A color image can be represented as a 3D tensor, since it has 3 matrices for Red, Green, and Blue stacked on top of each other. These RGB matrices are often referred to as 3 channels, a term borrowed from signal processing in electronics and telecommunications, from which many image processing concepts are derived. Hence we can represent this 3D tensor as NxMxC, where C is the number of channels, which is equal to 3 for color images.

Similar to the convolution in 2D matrices, convolution in 3D matrices is also component-wise multiplication followed by addition. One thing to keep in mind here is that the kernel should also have the same number of channels as the input matrix.

Formula: (N x N x C) * (K x K x C) ⇒ (N – K + 1) x (N – K + 1) x 1

NOTE: Convolving a 3D matrix with a single kernel of the same depth results in a 2D matrix.
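Here is a small NumPy sketch of this multi-channel case. The random 6×6×3 image and the all-ones kernel are placeholders I chose so the result is easy to verify. The point is only that one K x K x C kernel collapses the channel axis.

```python
import numpy as np

def conv_multichannel(image, kernel):
    """Convolve an (N, N, C) image with one (K, K, C) kernel -> 2D output."""
    n_rows, n_cols, channels = image.shape
    k_rows, k_cols, k_channels = kernel.shape
    if channels != k_channels:
        raise ValueError("kernel must have the same number of channels as the image")
    out = np.zeros((n_rows - k_rows + 1, n_cols - k_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + k_rows, j:j + k_cols, :]
            # multiply across all channels at once, then add everything up
            out[i, j] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(6, 6, 3)).astype(float)  # fake RGB image
kernel = np.ones((3, 3, 3))                                  # placeholder kernel

out = conv_multichannel(image, kernel)
print(out.shape)  # (4, 4): the channel axis is gone
```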

Convolution layer

Unlike classical image processing, where we use predefined kernels like the Sobel edge detector, in CNN we learn these kernels from the data.

Here in CNN, we have multiple hyperparameters to play with, like the kernel size K, padding p, stride s, and the number of kernels M. Each kernel produces one 2D output, so a layer with M kernels produces an output with M channels.

In a real-world scenario, we stack a series of convolution layers so that the model first learns low-level features like edges, then mid-level features, and then high-level features, which is what makes a fairly accurate model possible.
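To see how these hyperparameters shape the output, here is a rough NumPy sketch of the forward pass of a convolution layer. The function conv_layer and the random kernels are illustrative only. In a real CNN the kernel values are learned, and a bias and an activation would normally follow.

```python
import numpy as np

def conv_layer(image, kernels, stride=1, padding=0):
    """image: (N, N, C), kernels: (M, K, K, C) -> output: (out, out, M)."""
    if padding > 0:
        # zero-pad the two spatial axes only, not the channel axis
        image = np.pad(image, ((padding, padding), (padding, padding), (0, 0)))
    n = image.shape[0]
    m, k = kernels.shape[0], kernels.shape[1]
    out_size = (n - k) // stride + 1
    out = np.zeros((out_size, out_size, m))
    for f in range(m):                      # one output channel per kernel
        for i in range(out_size):
            for j in range(out_size):
                r, c = i * stride, j * stride
                patch = image[r:r + k, c:c + k, :]
                out[i, j, f] = np.sum(patch * kernels[f])
    return out

rng = np.random.default_rng(42)
image = rng.random((8, 8, 3))                 # N = 8, C = 3
kernels = rng.standard_normal((4, 3, 3, 3))   # M = 4 kernels, K = 3

same = conv_layer(image, kernels, stride=1, padding=1)
strided = conv_layer(image, kernels, stride=2, padding=0)
print(same.shape)     # (8, 8, 4)
print(strided.shape)  # (3, 3, 4)
print(kernels.size)   # 108 weights to learn (M * K * K * C)
```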

Max-pooling

It is again a biologically inspired concept that introduces some invariance into the model. For example, the model should be able to detect a face in an input image no matter where it is located and, ideally, no matter what its size is. Pooling layers help achieve this.

Pooling is similar to the kernel we have seen above: here too we can apply the concepts of stride, kernel size, etc., but there are no weights to learn.

For example, consider a 2 x 2 max-pool kernel with a stride of 2. It selects the maximum value from each 2 x 2 patch. Another kind of pooling is mean/average pooling, where we take the average value of the patch instead of the maximum.
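The 2 x 2, stride 2 pooling described above can be sketched in a few lines of NumPy. The 4×4 input values are made up, and pool2d is just an illustrative helper.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Apply max or average pooling to a 2D array."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    reduce = np.max if mode == "max" else np.mean
    out_rows = (x.shape[0] - size) // stride + 1
    out_cols = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_rows, out_cols))
    for i in range(out_rows):
        for j in range(out_cols):
            r, c = i * stride, j * stride
            out[i, j] = reduce(x[r:r + size, c:c + size])
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

max_pooled = pool2d(x, mode="max")
avg_pooled = pool2d(x, mode="avg")
print(max_pooled)  # [[6. 8.] [3. 4.]]
print(avg_pooled)  # [[3.75 5.25] [2.   2.  ]]
```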

Various Convolutional Neural Networks:

LeNet, 1998
AlexNet, 2012
VGGNet, 2014
ResNet, 2015
Inception Network, 2015

Before we move on to the code part, it is important that we understand what Data Augmentation and Transfer Learning mean.

Data Augmentation

Everyone wants their model to be robust, but training such a model requires a huge amount of data, and this is where Data Augmentation comes to the rescue. It basically means adding more data to our dataset by applying transformations such as rotating, scaling, cropping, flipping, shifting horizontally or vertically, zooming, stretching (shear operation), or changing the illumination of the existing images (since we are working with CNN), to create more artificial data to train on.
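A few of these transformations can be sketched with plain NumPy. This is only an illustration on a tiny made-up 4×4 "image". The helper names and the brightness factor of 1.2 are arbitrary, and in practice you would more likely use the augmentation tools that come with your deep learning library.

```python
import numpy as np

def shift_right(image, pixels):
    """Shift a 2D image to the right, filling the vacated columns with 0."""
    out = np.zeros_like(image)
    out[:, pixels:] = image[:, :-pixels]
    return out

def augment(image):
    """Return a few simple variants of a 2D grayscale image."""
    return {
        "original": image,
        "flip_horizontal": np.fliplr(image),
        "flip_vertical": np.flipud(image),
        "rotate_90": np.rot90(image),
        "shift_right_1px": shift_right(image, 1),
        # brighten, but keep values inside the valid 0..255 range
        "brighter": np.clip(image * 1.2, 0, 255),
    }

image = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for a real image
variants = augment(image)

print(len(variants))                   # 6 training samples from 1 image
print(variants["flip_horizontal"][0])  # [3. 2. 1. 0.]
print(variants["shift_right_1px"][0])  # [0. 0. 1. 2.]
```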

Transfer Learning

The main idea is to take a pre-trained model (one that was trained on some dataset ‘X’) and use it on a dataset ‘Y’ without training it from scratch. Both Keras and TensorFlow ship some pre-trained models, like VGG16, which is trained on ImageNet, one of the largest object classification datasets; the version used contains 1000 different categories and the total dataset size is about 150GB.

We can use transfer learning in various ways: using bottleneck features (i.e. taking the output just at the flattening layer); freezing the initial layers, which detect only edges, shapes, etc., and changing/learning the last few layers according to the new dataset; or using the pre-trained model as the starting point and fine-tuning the complete model on the new dataset.

Let’s Code

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

# training settings
batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# add the single (grayscale) channel axis where the backend expects it
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices (one-hot encoding)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# two convolution layers, one max-pooling layer, then a dense classifier
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

# Recent Keras versions default Adadelta to a learning rate of 0.001, which
# learns very slowly in 12 epochs; 1.0 matches the original Adadelta paper.
# (Very old Keras versions name this argument `lr` instead.)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(learning_rate=1.0),
              metrics=['accuracy'])

# train, using the test set to monitor validation loss and accuracy
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

# final evaluation on the test set
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
