How to do Cat and Dog Classification using CNN?

Khushi Shah 13 Feb, 2024


Get ready for a thrilling adventure in the realm of computer vision! Our beginner-friendly project involves training a Convolutional Neural Network (CNN) to distinguish between cats and dogs in images. We’ll use a dataset containing images of both animals as our training data. Along the way, we’ll augment images with Keras’ ImageDataGenerator, stream them in batches straight from disk, build and train a small CNN, and use it to classify new images. Let’s embark on this exciting journey of image classification together!

Learning Objectives

Gain a foundational understanding of Convolutional Neural Networks (CNNs) and their role in image classification tasks.
Learn how to preprocess and augment image data for training CNNs using libraries like Keras and TensorFlow.
Explore the architecture of a CNN, including convolutional, activation, pooling, and fully connected layers, and their significance in feature extraction and classification.
Develop practical skills in building and training a CNN model to differentiate between cat and dog images using the Dogs vs Cats dataset.

This article was published as a part of the Data Science Blogathon

Cat and Dog Classification using CNN

A Convolutional Neural Network (CNN) applies convolutional layers, using operations like conv2d to convolve learned filters (kernels) with the input image. Each filter has its own weights and bias and responds to a particular pattern, which is how the network extracts features. During training, batches of labeled images are fed into the network, and a loss function measures how far the predictions are from the ground-truth labels. The network parameters are then adjusted iteratively to reduce this loss, and the process repeats for each batch, gradually improving the network’s predictions. At prediction time, the class with the highest probability is chosen (argmax over the outputs; with a single sigmoid output, as in this tutorial, this amounts to a 0.5 threshold). Techniques such as batch normalization, which normalizes layer inputs across each mini-batch, can further stabilize learning, although the model built here does not use it.
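To make that training loop concrete, here is a minimal NumPy sketch. It swaps the CNN for a much simpler model (logistic regression on made-up 2-feature data, an assumption purely for illustration), but the loop has the same shape: feed a batch, compare predictions with labels through the loss, adjust the parameters, repeat.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for images (illustration only): 200 samples with 2 features,
# label 1 if the features sum to a positive number, else 0
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(float)

def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output, like the CNN's last layer

def bce(p, y, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

w, b = np.zeros(2), 0.0
lr, batch_size = 0.5, 32
initial_loss = bce(predict_proba(X, w, b), y)

for epoch in range(10):
    order = rng.permutation(len(X))               # shuffle once per epoch
    for start in range(0, len(X), batch_size):    # feed one batch at a time
        idx = order[start:start + batch_size]
        p = predict_proba(X[idx], w, b)           # predictions for this batch
        err = p - y[idx]                          # gradient of the loss w.r.t. the pre-sigmoid output
        w -= lr * X[idx].T @ err / len(idx)       # adjust parameters to reduce the loss
        b -= lr * err.mean()

final_loss = bce(predict_proba(X, w, b), y)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

A real CNN computes its gradients through many layers via backpropagation, but the outer batch loop is the same idea.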

Dogs vs. Cats Prediction Problem

This tutorial aims to create a system that recognizes cat and dog images: given an input image, the model predicts whether it shows a cat or a dog. The trained model can be adapted for use in websites or mobile apps. The Dogs vs Cats dataset, available on Kaggle, provides the labeled images from which the model learns distinguishing features; after training, the classifier can tell cat images from dog images.


Installing Required Packages for Python 3.6

NumPy -> 1.14.4 [ Images are read and stored as NumPy arrays ]
TensorFlow -> 1.8.0 [ TensorFlow is the backend for Keras ]
Keras -> 2.1.6 [ Keras is used for implementing the CNN ]
Pillow [ Keras uses it to load image files from disk ]

Import Libraries

NumPy – for working with arrays and linear algebra.
Pandas – for reading/writing data.
Matplotlib – to display images.
Keras models (TensorFlow backend) – we need a model to make predictions; Sequential stacks the layers in order.
Keras layers (TensorFlow backend) – every neural network needs layers, and a CNN needs a few kinds: convolution, pooling, flattening, and dense layers.

import pandas as pd
import numpy as np
import os 
import matplotlib.pyplot as plt
from os import listdir
from sklearn import metrics
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten

A CNN processes images with the help of small matrices of weights known as filters. Filters in the early layers detect low-level features such as vertical and horizontal edges; deeper layers combine these into progressively higher-level features.

We first initialize the CNN:

#initializing the cnn
classifier=Sequential()

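To compile the CNN, we use the Adam optimizer.

Adaptive Moment Estimation (Adam) computes an individual, adaptive learning rate for each parameter, based on running averages of the gradient and of the squared gradient. As the loss function, we use binary cross-entropy, which compares each predicted probability with the true class label (0 or 1). For a label y and a predicted probability p, the penalty is -[y log(p) + (1 - y) log(1 - p)], averaged over the batch, so confidently wrong predictions are penalized much more heavily than predictions close to the true label.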
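Both pieces can be written out in a few lines of NumPy. This is a simplified sketch for illustration, not Keras’ actual implementation; the function names are hypothetical, and the Adam hyperparameters are set to commonly used values (lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7).

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update; m and v are running averages of the gradient and squared gradient."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias correction for the first steps (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

labels = np.array([1.0, 0.0])  # a dog (1) and a cat (0)
print(binary_crossentropy(labels, np.array([0.9, 0.1])))  # confident and right: about 0.105
print(binary_crossentropy(labels, np.array([0.1, 0.9])))  # confident and wrong: about 2.303

# Per-parameter step sizes: two gradients of very different scale
params = np.array([1.0, 1.0])
grads = np.array([4.0, 0.01])
new_params, m, v = adam_step(params, grads, np.zeros(2), np.zeros(2), t=1)
print(new_params)  # both parameters move by roughly lr on the first step
```

The last example shows the "individual learning rate" idea: dividing by the root of the squared-gradient average normalizes each parameter's step, so a large and a tiny gradient produce steps of similar size.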

Image augmentation applies random transformations to the original images, such as shifting, rotating, shearing, zooming, and flipping, so that the network sees a slightly different version of each image every time. This helps the model generalize instead of memorizing the training images. We use the Keras ImageDataGenerator class to augment our images: here it rescales pixel values to [0, 1] and applies random shearing, zooming, and horizontal flips.

#part2-fitting the cnn to the images
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
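To see what two of these settings do to the pixel data, here is a NumPy sketch on a tiny made-up 2x3 RGB image. It only mimics the effect of rescale=1./255 and of one horizontal flip; ImageDataGenerator applies the flip at random, and the shear and zoom transforms are not reproduced here.

```python
import numpy as np

# A tiny 2x3 RGB "image" with 8-bit pixel values (0-255), made up for illustration
img = np.array([[[0, 0, 0], [128, 128, 128], [255, 255, 255]],
                [[10, 20, 30], [40, 50, 60], [70, 80, 90]]], dtype=np.uint8)

rescaled = img.astype(np.float32) / 255.0  # effect of rescale=1./255: values now in [0, 1]
flipped = rescaled[:, ::-1, :]             # horizontal flip: reverse the width (column) axis

print(rescaled.min(), rescaled.max())  # 0.0 1.0
print(flipped[0, 0])                   # the former last pixel of row 0: [1. 1. 1.]
```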

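We need a way to turn our images into batches of data arrays in memory so that they can be fed to the network during training, and ImageDataGenerator can be used for this purpose too. We create a second generator for the test set that only rescales, then use the flow_from_directory method to read images from disk, resize them to target_size, and yield batches of images together with their labels. flow_from_directory expects one subdirectory per class and assigns class indices in alphanumeric order of the folder names; with class_mode='binary', the labels are 0 and 1.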

# Generating images for the Test set (rescaling only, no augmentation)
test_datagen = ImageDataGenerator(rescale = 1./255)
# Creating training set; adjust the path to where the dataset is stored
training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')
# Creating the Test set
test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')
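Because the labels come from the folder structure, it is worth checking the layout before training. The helper below is a hypothetical, plain-Python sketch of the labeling rule (not the Keras code): one subfolder per class, indices in alphanumeric order of the folder names. Keras exposes the real mapping as training_set.class_indices. The folder names cats and dogs in the comment are an assumption about how the dataset is arranged.

```python
import os

def infer_class_indices(directory):
    """Hypothetical helper: map each class subfolder to an index, in sorted name order."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))  # plain files are ignored
    )
    return {name: index for index, name in enumerate(classes)}

# Assumed layout:
#   dataset/training_set/cats/*.jpg
#   dataset/training_set/dogs/*.jpg
# infer_class_indices('dataset/training_set') would then give {'cats': 0, 'dogs': 1}
```

With that layout, dog images get label 1, which is why a sigmoid output of 0.5 or more is read as "dog" later on.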


Convolution

Convolution multiplies the input by a set of learned weights. A small 2D array of weights, called a filter or kernel, slides across the input; at each position, the layer computes the dot product between the filter and the patch of input it covers, and these results form a feature map. The filter is much smaller than the input (3x3 in this tutorial, applied to 64x64 images).
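Here is a minimal NumPy sketch of that sliding dot product on a single-channel input, using a hand-made vertical-edge filter (an assumption for illustration; a CNN learns its filter values during training). Like Keras’ Conv2D, it does not flip the kernel, so strictly speaking it computes a cross-correlation; a real Conv2D layer also sums over all input channels and adds a bias per filter.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Slide kernel over a 2D input (stride 1, no padding) and take a dot product at each position."""
    kh, kw = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # dot product of patch and filter
    return out

# 5x5 input: dark (0) in columns 0-1, bright (1) from column 2 onward
img = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-made vertical-edge filter
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

feature_map = conv2d_valid(img, vertical_edge)
print(feature_map)  # 3x3 map: 3 where the window spans the dark-to-bright edge, 0 in the flat bright area
```

The output is smaller than the input (5 - 3 + 1 = 3), just as a 3x3 filter turns the 64x64 images in this tutorial into 62x62 feature maps.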

Activation

We add an activation function to help the Artificial Neural Network (ANN) learn complex patterns in the data. Its primary purpose is to introduce non-linearity into the neural network. In this model, ReLU follows the convolution and the hidden dense layer, and a sigmoid produces the final probability.
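The two activations used here are one-liners in NumPy. The second half of the sketch shows why non-linearity matters: without an activation between them, two linear layers (random matrices here, purely for illustration) collapse into a single linear map.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)        # negatives become 0, positives pass through

def sigmoid(x):
    return 1 / (1 + np.exp(-x))    # squashes any value into (0, 1)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))        # [0. 0. 3.]
print(sigmoid(0.0))   # 0.5

# Without an activation, two stacked linear layers are just one linear layer
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))  # True
```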

Pooling

The pooling operation provides a degree of spatial invariance, making the system capable of recognizing an object even when its appearance or position varies slightly. It slides a 2D window over each channel of the feature map and summarizes the features lying in the region covered by the window.

Pooling therefore reduces the spatial size of the representation, which cuts the number of parameters in the following layers and the amount of computation in the network, and helps control overfitting. The two most common types of pooling are average pooling and max pooling. Here we use max pooling, which, as its name suggests, keeps only the maximum from each pool: as the window slides across the input, at each stride the maximum value is kept and the rest are dropped.

Unlike a convolutional layer, a pooling layer does not change the depth (number of channels) of the feature map.
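A 2x2 max pooling with stride 2, as used by this tutorial’s MaxPooling2D(pool_size=(2, 2)), can be sketched in NumPy with a reshape. This is an illustrative implementation, not the Keras one; like the default 'valid' padding, it drops a trailing odd row or column.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a feature map of shape (height, width, channels)."""
    h, w, c = fmap.shape
    h2, w2 = h // 2, w // 2
    trimmed = fmap[:h2 * 2, :w2 * 2, :]  # drop a trailing odd row/column
    # Group pixels into non-overlapping 2x2 blocks, then keep each block's maximum
    return trimmed.reshape(h2, 2, w2, 2, c).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 6, 1, 1],
              [0, 2, 5, 7],
              [1, 1, 8, 3]])
pooled = max_pool_2x2(x[..., np.newaxis])  # add a channel axis: (4, 4, 1)
print(pooled[..., 0])                      # [[6 2]
                                           #  [2 8]]
print(max_pool_2x2(np.zeros((62, 62, 32))).shape)  # (31, 31, 32): depth unchanged
```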

Fully Connected

The fully connected layer receives the flattened output from the final pooling layer.

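In practice, the full connection works as follows:

Each neuron in the fully connected layer computes a weighted sum of all the flattened features plus a bias, then applies an activation function, so it responds to a particular combination of features. The output layer then weighs these responses: in this model, a single sigmoid neuron turns them into the probability that the image belongs to class 1 (dog), and the probability of a cat is one minus that value.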
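The forward pass through Flatten and the two Dense layers can be sketched in NumPy. The random input and weights are placeholders (an assumption; a trained model would use its learned weights), and the 31x31x32 input shape matches the pooled output of the model built below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pooled feature maps with the shape produced by this tutorial's conv + pooling layers
pooled = rng.random((31, 31, 32))
flat = pooled.reshape(-1)  # Flatten: 31 * 31 * 32 = 30752 values

# Random weights stand in for learned ones (illustration only)
W1 = rng.normal(scale=0.01, size=(flat.size, 128))
b1 = np.zeros(128)
W2 = rng.normal(scale=0.1, size=(128, 1))
b2 = np.zeros(1)

# Dense(128, relu): every hidden neuron sees every flattened feature
hidden = np.maximum(0, flat @ W1 + b1)
# Dense(1, sigmoid): a single probability for class 1 (dog)
p_dog = 1 / (1 + np.exp(-(hidden @ W2 + b2)))
p_cat = 1 - p_dog

print(flat.shape, hidden.shape, p_dog.shape)  # (30752,) (128,) (1,)
```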

Full CNN Overview

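#step1-convolution: 32 filters of size 3x3 on 64x64 RGB input
classifier.add(Convolution2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
#step2-maxpooling
classifier.add(MaxPooling2D(pool_size=(2, 2)))
#step3-flattening
classifier.add(Flatten())
#step4-fullconnection
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=1, activation='sigmoid'))
#step5-compiling: Adam optimizer with binary cross-entropy loss
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])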
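As a sanity check on this architecture, the output shapes and parameter counts can be worked out by hand. The sketch below does the arithmetic in plain Python (the helper names are hypothetical); classifier.summary() should report the same totals, and it shows that the first Dense layer holds almost all of the parameters.

```python
def conv_params(kh, kw, in_channels, filters):
    return kh * kw * in_channels * filters + filters  # weights + one bias per filter

def dense_params(inputs, units):
    return inputs * units + units                     # weights + one bias per unit

h = w = 64 - 3 + 1          # 3x3 convolution without padding: 64x64 -> 62x62
h, w = h // 2, w // 2       # 2x2 max pooling: 62x62 -> 31x31
flat = h * w * 32           # Flatten: 31 * 31 * 32 features

counts = {
    "conv": conv_params(3, 3, 3, 32),       # 896
    "dense_128": dense_params(flat, 128),   # 3,936,384
    "dense_1": dense_params(128, 1),        # 129
}
print(flat, counts, sum(counts.values()))   # 30752 ... 3937409
```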

Next, we fit the model to the training set, validating on the test set after each epoch. It will take some time for this to finish.

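# Keras 2 counts steps in batches: 8000 training and 2000 test images at batch_size 32
classifier.fit_generator(training_set, steps_per_epoch=8000 // 32, epochs=25, validation_data=test_set, validation_steps=2000 // 32)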

We reach an accuracy of 0.8115 on our training set.

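We can classify new images with the predict_image function: it takes the path of a new image and the trained model, preprocesses the image the same way as the training data, and calls the model’s predict method. The sigmoid output is the probability of class 1; since flow_from_directory assigns class indices alphabetically, folders named cats and dogs make dog class 1 (training_set.class_indices shows the mapping). If the probability is 0.5 or more, the image is classified as a dog; otherwise, as a cat.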

#to predict new images
from keras.preprocessing import image

def predict_image(imagepath, classifier):
    # Load the image at the same size used for training
    predict = image.load_img(imagepath, target_size = (64, 64))
    predict_modified = image.img_to_array(predict)
    # Apply the same 1/255 rescaling as the training generator
    predict_modified = predict_modified / 255
    # Add a batch dimension: (64, 64, 3) -> (1, 64, 64, 3)
    predict_modified = np.expand_dims(predict_modified, axis = 0)
    result = classifier.predict(predict_modified)
    # The sigmoid output is P(class 1), i.e. P(dog) with cats/dogs folders
    if result[0][0] >= 0.5:
        prediction = 'dog'
        probability = result[0][0]
    else:
        prediction = 'cat'
        probability = 1 - result[0][0]
    print("probability = " + str(probability))
    print("Prediction = " + prediction)
    return prediction, probability
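Rather than hard-coding 'dog' for class 1, the label can be looked up from the generator’s class_indices mapping. The helper below is a hypothetical sketch of that idea, assuming exactly two classes with indices 0 and 1, as in this tutorial’s two-folder setup.

```python
def decode_sigmoid(p, class_indices, threshold=0.5):
    """Hypothetical helper: turn a sigmoid output into (class name, probability of that class)."""
    index_to_class = {index: name for name, index in class_indices.items()}
    if p >= threshold:
        return index_to_class[1], p      # the sigmoid output is the probability of class 1
    return index_to_class[0], 1 - p

print(decode_sigmoid(0.8, {'cats': 0, 'dogs': 1}))   # ('dogs', 0.8)
print(decode_sigmoid(0.25, {'cats': 0, 'dogs': 1}))  # ('cats', 0.75)
```

With the trained model, this might be called as decode_sigmoid(float(result[0][0]), training_set.class_indices).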

Features Provided

We can test our own images and verify the accuracy of the model.
We can integrate the code directly into other projects and extend it into a website or mobile application.
We can extend the project to other entities by finding a suitable dataset, swapping it in, and retraining the model.

Conclusion

In this exhilarating journey through the realm of image classification, we delved into the marvels of Convolutional Neural Networks (CNNs). From installing the essential Python packages to building a model that tells cats from dogs, we covered the full workflow. This beginner-friendly project provides valuable insights and sets the stage for exploring diverse applications. With a solid understanding of CNN fundamentals, you’re now ready to embark on your own image classification escapades! As next steps, try a two-unit softmax output layer (with categorical cross-entropy) as an alternative to the single sigmoid unit, use model.predict to inspect individual predictions, and monitor key metrics such as validation loss (val_loss) to assess model performance accurately and catch overfitting.

Key Takeaways

CNNs are essential deep learning models for image classification, capable of automatically learning features from raw pixel data.
Preprocessing and augmenting image data are crucial steps in CNN training, enhancing model generalization and performance.
Understanding the components of a CNN, such as convolutional layers and activation functions, is vital for designing effective neural network architectures.
Practical applications of CNNs extend beyond cat and dog classification, encompassing various domains like medical imaging, object detection, and natural language processing.

Frequently Asked Questions

Q1. Why is Adam the most popular optimizer in Deep Learning?

A. Adam is popular in deep learning due to its adaptive learning rate and momentum features, improving optimization efficiency.

Q2. How to do Cat and Dog Classification using CNN?

A. Cat and Dog Classification using CNN involves training a convolutional neural network on labeled cat and dog image data to differentiate between the two classes.

Q3. What is Transfer Learning?

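A. In transfer learning, practitioners reuse knowledge from a model pre-trained on a large dataset for a new task, typically by keeping the pre-trained feature-extraction layers (often frozen) and replacing and retraining the output layer, or the last few layers, on the new data.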

Q4. Do you have any tutorial that I can follow step by step to generate the Class activation map?

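A. A Class Activation Map (CAM) highlights which regions of an image drive the model’s decision for a class. The original CAM technique requires a global average pooling layer between the last convolutional layer and the output layer; the map is then the sum of the last convolutional feature maps, each weighted by the output-layer weight for that class, upsampled to the image size and overlaid on it.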
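The weighting step at the heart of CAM is a single matrix product. The NumPy sketch below (hypothetical function name) assumes you already have the last convolutional layer’s feature maps for one image and the output-layer weights for the class of interest. Note that the model in this tutorial uses Flatten rather than global average pooling, so it would need a GlobalAveragePooling2D layer before the output for this to apply. Clipping negatives and normalizing are common display choices, not part of the definition.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """CAM sketch: weight each channel of the last conv feature maps (h, w, k)
    by the output-layer weight (k,) linking that channel's pooled value to the class."""
    cam = feature_maps @ class_weights   # (h, w, k) @ (k,) -> (h, w)
    cam = np.maximum(cam, 0)             # keep positive evidence only (display choice)
    if cam.max() > 0:
        cam = cam / cam.max()            # normalize to [0, 1] for display
    return cam

# Toy example: channel 0 fires top-left, channel 1 fires bottom-right
fmaps = np.zeros((2, 2, 2))
fmaps[0, 0, 0] = 1.0
fmaps[1, 1, 1] = 1.0
print(class_activation_map(fmaps, np.array([2.0, -1.0])))  # only the top-left cell is highlighted
```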

Q5. How would I predict the images in the test1 data set?

A. To predict the images in the test1 folder (Kaggle’s unlabeled test images), load each image, resize it to the training size (64x64 in this tutorial), apply the same 1/255 rescaling used in training, and pass it to the trained model’s predict method, for example by calling predict_image on each file.
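Looping over the folder can be kept independent of Keras by passing in the prediction function. The helper below is a hypothetical sketch; predict_fn is any callable that takes an image path, for instance a small wrapper around predict_image, and the accepted file extensions are an assumption.

```python
import os

def predict_folder(folder, predict_fn, extensions=(".jpg", ".jpeg", ".png")):
    """Hypothetical helper: apply predict_fn(path) to every image file in folder."""
    results = {}
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(extensions):  # skip non-image files
            results[name] = predict_fn(os.path.join(folder, name))
    return results
```

For example, predict_folder('test1', lambda path: predict_image(path, classifier)) would collect one prediction per file, keyed by file name.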

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.