Beginners Guide to Convolutional Neural Network with Implementation in Python

Deepanshi Last Updated : 30 Sep, 2024

7 min read

We have learned about the Artificial Neural network and its application in the last few articles. This blog will be all about another Deep Learning model which is the Convolutional Neural Network. As always, this will be a beginner’s guide written in a way that newcomers to the data science field can easily understand the concept. So, keep reading! 🙂

This article was published as a part of the Data Science Blogathon

Introduction to CNN
Components of CNN
Conclusion
Frequently Asked Questions

Introduction to CNN

Convolutional Neural Network is a Deep Learning algorithm specially designed for working with Images and videos. It takes images as inputs, extracts and learns the features of the image, and classifies them based on the learned features.

This algorithm is inspired by the working of a part of the human brain which is the Visual Cortex. The visual Cortex is a part of the human brain which is responsible for processing visual information from the outside world. It consists of various layers, with each layer serving a specific function by extracting information from the image or visual. Ultimately, all the information gathered from each layer combines to interpret or classify the image or visual.

Similarly, CNN utilizes various filters, with each filter extracting specific information from the image, such as edges and different shapes (vertical, horizontal, round). These extracted features combine to help identify the image.

Now, the question here can be: Why can’t we use Artificial Neural Networks for the same purpose? This is because there are some disadvantages with ANN:

It is too much computation for an ANN model to train large-size images and different types of image channels.
The next disadvantage is that it is unable to capture all the information from an image whereas a CNN model can capture the spatial dependencies of the image.
Another reason is that ANN is sensitive to the location of the object in the image i.e if the location or place of the same object changes, it will not be able to classify properly.

Components of CNN

The CNN model works in two steps: feature extraction and Classification

Feature extraction is a phase where various filters and layers apply to the images to extract information and features. Once this process is complete, the extracted data moves to the next phase, classification, where it is classified based on the target variable of the problem.

A typical CNN model looks like this:

Input layer
Convolution layer + Activation function
Pooling layer
Fully Connected Layer

Convolutional Neural Network with Implementation in Python layers

Source: https://learnopencv.com/image-classification-using-convolutional-neural-networks-in-keras/

Let’s learn about each layer in detail.

Input layer

As the name says, it’s our input image and can be Grayscale or RGB. Every image is made up of pixels that range from 0 to 255. We need to normalize them i.e convert the range between 0 to 1 before passing it to the model.

Below is the example of an input image of size 4*4 and has 3 channels i.e RGB and pixel values.

Convolution Layer

The convolution layer applies the filter to the input image to extract or detect its features. The filter processes the image multiple times, creating a feature map that aids in classifying the input image. Let’s understand this with the help of an example. For simplicity, we will take a 2D input image with normalized pixels.

Convolutional Neural Network with Implementation in Python conv. layer

Understanding the Feature Map

In the above figure, we have an input image of size 6*6 and applied a filter of 3*3 on it to detect some features. In this example, we have applied only one filter but in practice, many such filters are applied to extract information from the image.

The result of applying the filter to the image is that we get a Feature Map of 4*4 which has some information about the input image. Many such feature maps are generated in practical applications.

Let’s get into some maths behind getting the feature map in the above image.

Convolutional Neural Network with Implementation in Python conv layer 2

As presented in the above figure, in the first step, the filter applies to the green highlighted part of the image. The pixel values of the image multiply with the filter values (as shown in the figure using lines) and then sum up to produce the final value.

In the next step, shift the filter by one column, as illustrated in the figure below. This movement to the next column or row is known as stride. In this example, we take a stride of 1, meaning we shift by one column.

Similarly, the filter passes over the entire image and we get our final Feature Map. Once we get the feature map, an activation function is applied to it for introducing nonlinearity.

A point to note here is that the Feature map we get is smaller than the size of our image. As we increase the value of stride the size of the feature map decreases.

Pooling Layer

The pooling layer follows the convolutional layer and reduces the dimensions of the feature map, helping to preserve important information and features of the input image while also reducing computation time.

Using pooling, a lower resolution version of input is created that still contains the large or important elements of the input image.

The most common types of Pooling are Max Pooling and Average Pooling. The below figure shows how Max Pooling works. Using the Feature map which we got from the above example to apply Pooling. Here we are using a Pooling layer of size 2*2 with a stride of 2.

The process takes the maximum value from each highlighted area, resulting in a new version of the input image with a size of 2×2. After applying pooling, the dimension of the feature map reduces.

Fully Connected Layer

Till now we have performed the Feature Extraction steps, now comes the Classification part. The Fully connected layer (as we have in ANN) is used for classifying the input image into a label. This layer connects the information extracted from the previous steps (i.e Convolution layer and Pooling layers) to the output layer and eventually classifies the input into the desired label.

The complete process of a CNN model can be seen in the below image.

Source: https://developersbreach.com/convolution-neural-network-deep-learning/

How to Implement CNN in Python?

We will be using the Mnist Digit classification dataset which we used in the last blog of Practical Implementation of ANN. Please refer to that first for a better understanding of the application of CNN.

#importing the required libraries
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPool2D
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
#loading data
(X_train,y_train) , (X_test,y_test)=mnist.load_data()
#reshaping data
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], X_train.shape[2], 1))
X_test = X_test.reshape((X_test.shape[0],X_test.shape[1],X_test.shape[2],1))
#checking the shape after reshaping
print(X_train.shape)
print(X_test.shape)
#normalizing the pixel values
X_train=X_train/255
X_test=X_test/255
#defining model
model=Sequential()model=Sequential()
#adding convolution layer
model.add(Conv2D(32,(3,3),activation=’relu’,input_shape=(28,28,1)))
#adding pooling layer
model.add(MaxPool2D(2,2))
#adding fully connected layer
model.add(Flatten())
model.add(Dense(100,activation=’relu’))
#adding output layer
model.add(Dense(10,activation=’softmax’))
#compiling the model
model.compile(loss=’sparse_categorical_crossentropy’,optimizer=’adam’,metrics=[‘accuracy’])
#fitting the model
model.fit(X_train,y_train,epochs=10)

Output:

CNN in python Convolutional Neural Network with Implementation in Python

#evaluting the model
model.evaluate(X_test,y_test)

Conclusion

This blog covers some important elements of CNN, but many topics remain, such as padding, data augmentation, and more details on stride. Since deep learning is a vast and ever-evolving field, I will discuss these topics in future blogs. I hope you found this article helpful and worth your time investing on.

In the next few blogs, you can expect a detailed implementation of CNN with explanations and concepts like Data augmentation and Hyperparameter tuning.

About the Author

I am Deepanshi Dhingra currently working as a Data Science Researcher, and possess knowledge of Analytics, Exploratory Data Analysis, Machine Learning, and Deep Learning. Feel free to content with me on LinkedIn for any feedback and suggestions.

Frequently Asked Questions

Q1. What is CNN in Python?

A. A Convolutional Neural Network (CNN) is a type of deep neural network used for image recognition and classification tasks in machine learning. Python libraries like TensorFlow, Keras, PyTorch, and Caffe provide pre-built CNN architectures and tools for building and training them on specific datasets.

Q2. What are the 4 types of CNN?

A. The four common types of Convolutional Neural Networks (CNNs) are LeNet, AlexNet, VGGNet, and ResNet. LeNet is the first CNN architecture designed for handwritten digit recognition. In contrast, AlexNet, VGGNet, and ResNet are deep CNNs that achieved top performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

Q3. What is tensorflow in python?

A. TensorFlow is an open-source machine learning and artificial intelligence library developed by Google Brain Team. It is written in Python and provides high-level APIs like Keras, as well as low-level APIs, for building and training machine learning models. TensorFlow also offers tools for data preprocessing, visualization, and distributed computing.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

Deepanshi

Advanced Computer Vision Deep Learning Image Image Analysis

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Model Deployment

Introduction to Computer Vision

Getting Started with Image Data

Introduction to CNN and Implementation

Introduction to Transfer Learning

CNN Visualization

Overview of Pretrained Models

Inception

ResNets

DenseNets

CSRNet

Introduction to Object Detection

Region Based Convolutional Neural Network

Single Stage Networks

Transformed Based Object Detection Models

Face Detection

Object Tracking

Pose Estimation

Introduction to Image Segmentation

Understanding Deep Learning Architectures for Image Segmentation

Video Classification

Introduction to Image Generation

Zero and Few Shot Learning

Beginners Guide to Convolutional Neural Network with Implementation in Python

Table of contents

Introduction to CNN

Components of CNN

Input layer

Convolution Layer

Understanding the Feature Map

Pooling Layer

Fully Connected Layer

How to Implement CNN in Python?

Conclusion

About the Author

Frequently Asked Questions

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Congratulations, You Did It!

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS

test_cookie

Webengage (2)

_we_us

WebKlipperAuth

LinkedIn (16)

ln_or

JSESSIONID

li_rm

AnalyticsSyncHistory

lms_analytics

liap