Convolutional neural networks (CNN) – the concept behind recent breakthroughs and developments in deep learning.
CNNs have broken the mold and ascended the throne to become the state-of-the-art computer vision technique. Among the different types of neural networks (others include recurrent neural networks (RNN), long short-term memory (LSTM), artificial neural networks (ANN), etc.), CNNs are easily the most popular.
These convolutional neural network models are ubiquitous in the image data space. They work phenomenally well on computer vision tasks like image classification, object detection, image recognition, etc.
So – where can you practice your CNN skills? Well, you’ve come to the right place!
There are various datasets that you can use to practice applying convolutional neural networks. Here are three popular ones:
- MNIST
- CIFAR-10
- ImageNet
In this article, we will build image classification models using CNNs on each of these datasets. That’s right! We will explore MNIST, CIFAR-10, and ImageNet to understand, in a practical manner, how CNNs work for the image classification task.
You can learn all about Convolutional Neural Networks (CNN) in this free course: Convolutional Neural Networks (CNN) from Scratch
My inspiration for writing this article is to help the community apply theoretical knowledge in a practical manner. This is a very important exercise: it not only helps you build a deeper understanding of the underlying concept but also teaches you practical details that can only be learned by implementing it.
If you’re new to the world of neural networks, CNNs, or image classification, I recommend going through these excellent in-depth tutorials:
- Introduction to Neural Networks (Free Course!)
- Demystifying the Mathematics behind Convolutional Neural Networks (CNNs)
- Build your First Image Classification Model in just 10 Minutes
Table of Contents
- Using CNNs to Classify Hand-written Digits on MNIST Dataset
- Identifying Images from the CIFAR-10 Dataset using CNNs
- Categorizing the Images of ImageNet using CNNs
- Where to go from here?
Note: I will be using Keras to demonstrate image classification using CNNs in this article. Keras is an excellent framework to learn when you’re starting out in deep learning.
Using CNNs to Classify Hand-written Digits on MNIST Dataset
MNIST (Modified National Institute of Standards and Technology) is a well-known computer vision dataset built by Yann LeCun et al. It is composed of images of handwritten digits (0-9), split into a training set of 60,000 images and a test set of 10,000, where each image is 28 x 28 pixels.
This dataset is often used for practicing image classification algorithms because it is fairly easy to conquer. Hence, I recommend making it your first dataset if you are just starting out in the field.
MNIST comes with Keras by default and you can simply load the train and test files using a few lines of code:
from keras.datasets import mnist
# loading the dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# let's print the shape of the dataset
print("X_train shape", X_train.shape)
print("y_train shape", y_train.shape)
print("X_test shape", X_test.shape)
print("y_test shape", y_test.shape)
Here is the shape of X (features) and y (target) for the training and test data:
X_train shape (60000, 28, 28)
y_train shape (60000,)
X_test shape (10000, 28, 28)
y_test shape (10000,)
Before we train a CNN model, let’s build a basic Fully Connected Neural Network for the dataset. The basic steps to build an image classification model using a neural network are:
- Flatten the input image dimensions to 1D (width pixels x height pixels)
- Normalize the image pixel values (divide by 255)
- One-hot encode the target labels (the digit classes)
- Build a model architecture (Sequential) with Dense layers
- Train the model and make predictions
Here’s how you can build a neural network model for MNIST. I have commented on the relevant parts of the code for better understanding:
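A minimal sketch of such a fully connected model, assuming a recent Keras (the `keras` package, 2.x or 3.x) and following the five steps listed above. The helper names `preprocess_dense`, `build_dense_model` and `train_on_mnist` are hypothetical, and the 100-unit hidden layer, Adam optimizer, batch size and epoch count are my assumptions rather than the original settings.

```python
import numpy as np
import keras
from keras import layers


def preprocess_dense(images, labels, num_classes=10):
    """Hypothetical helper, steps 1-3: flatten 28x28 -> 784, scale to [0, 1], one-hot the labels."""
    x = np.asarray(images, dtype="float32").reshape(len(images), -1) / 255.0
    y = keras.utils.to_categorical(labels, num_classes)
    return x, y


def build_dense_model(input_dim=784, num_classes=10):
    """Hypothetical helper, step 4: a Sequential model made only of Dense layers."""
    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        layers.Dense(100, activation="relu"),             # hidden layer size is an assumption
        layers.Dense(num_classes, activation="softmax"),  # one probability per digit
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


def train_on_mnist(epochs=10, batch_size=128):
    """Hypothetical helper, step 5: train, using the 10,000 test images for validation."""
    (X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
    X_train, Y_train = preprocess_dense(X_train, y_train)
    X_test, Y_test = preprocess_dense(X_test, y_test)
    model = build_dense_model()
    history = model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs,
                        validation_data=(X_test, Y_test))
    return model, history
```

Calling `train_on_mnist()` downloads MNIST on first use and prints the training and validation accuracy after each epoch.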
After running the above code, you’ll see that we easily get a good validation accuracy of around 97%.
Let’s modify the above code to build a CNN model.
One major advantage of using CNNs over fully connected networks is that you do not need to flatten the input images to 1D, as CNNs can work with image data in 2D. This helps in retaining the “spatial” properties of images.
Here’s the full code for the CNN model:
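A minimal sketch of a CNN with a single convolution layer, assuming a recent Keras. The key difference from the fully connected model is the input: images keep their 28 x 28 layout plus a channel axis, giving shape (28, 28, 1). The filter count (25), kernel and pooling sizes, the Dense layer size and the helper names are hypothetical choices, not the original code.

```python
import numpy as np
import keras
from keras import layers


def preprocess_cnn(images, labels, num_classes=10):
    """Hypothetical helper: keep the 2D layout, add a channel axis (N, 28, 28) -> (N, 28, 28, 1)."""
    x = np.asarray(images, dtype="float32").reshape(len(images), 28, 28, 1) / 255.0
    y = keras.utils.to_categorical(labels, num_classes)
    return x, y


def build_cnn_model(num_classes=10):
    """Hypothetical helper: one Conv2D layer, pooling, then a small Dense classifier."""
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(25, kernel_size=(3, 3), activation="relu"),  # -> (26, 26, 25)
        layers.MaxPooling2D(pool_size=(2, 2)),                     # -> (13, 13, 25)
        layers.Flatten(),                                          # -> 4,225 values
        layers.Dense(100, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


def train_cnn_on_mnist(epochs=10, batch_size=128):
    """Hypothetical helper: train on MNIST, validating on the test images."""
    (X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
    X_train, Y_train = preprocess_cnn(X_train, y_train)
    X_test, Y_test = preprocess_cnn(X_test, y_test)
    model = build_cnn_model()
    history = model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs,
                        validation_data=(X_test, Y_test))
    return model, history
```

The shape comments trace how a single 3 x 3 convolution (no padding) and a 2 x 2 pooling step shrink each image before it is flattened for the Dense layers.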
While our simple neural network model peaked at around 97% validation accuracy, the CNN model gets 98%+ with just a single convolution layer!
You can go ahead and add more Conv2D layers, and also play around with the hyperparameters of the CNN model.
Identifying Images from the CIFAR-10 Dataset using CNNs
MNIST is a beginner-friendly dataset in computer vision. It’s easy to score 90%+ on validation by using a CNN model. But what if you are beyond the beginner stage and need something more challenging to put your concepts to use?
That’s where the CIFAR-10 dataset comes into the picture!
Here’s how the developers behind CIFAR (Canadian Institute For Advanced Research) describe the dataset:
The CIFAR-10 dataset consists of 60,000 32 x 32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
The important points that distinguish this dataset from MNIST are:
- Images in CIFAR-10 are in color (three RGB channels), whereas MNIST images are grayscale
- Each image is 32 x 32 pixels
- 50,000 training images and 10,000 testing images
These images are taken in varying lighting conditions and at different angles, and since they are color images, similar objects can vary a lot in color (for example, the color of ocean water). If you use the simple CNN architecture from the MNIST example above, you will get a low validation accuracy of around 60%.
That’s a key reason why I recommend CIFAR-10 as a good dataset to practice your hyperparameter tuning skills for CNNs. The good thing is that just like MNIST, CIFAR-10 is also easily available in Keras.
You can simply load the dataset using the following code:
from keras.datasets import cifar10
# loading the dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
Here’s how you can build a decent (around 78-80% on validation) CNN model for CIFAR-10. Notice how the shape values have been updated from (28, 28, 1) to (32, 32, 3) according to the size of the images:
Here’s what I changed in the model:
- Increased the number of Conv2D layers to build a deeper model
- Increased the number of filters to learn more features
- Added Dropout for regularization
- Added more Dense layers
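A sketch of a CIFAR-10 model that applies the changes listed above (a deeper stack of Conv2D layers, more filters, Dropout, more Dense layers), assuming a recent Keras. The input shape is (32, 32, 3) because the images are 32 x 32 with three color channels. The exact filter counts, dropout rates, layer sizes and helper names are assumptions, so its accuracy may differ from the 78-80% quoted above.

```python
import numpy as np
import keras
from keras import layers


def preprocess_cifar10(images, labels, num_classes=10):
    """Hypothetical helper: scale pixels to [0, 1]; labels arrive as (N, 1), so flatten before one-hot."""
    x = np.asarray(images, dtype="float32") / 255.0
    y = keras.utils.to_categorical(np.asarray(labels).reshape(-1), num_classes)
    return x, y


def build_cifar10_model(num_classes=10):
    """Hypothetical helper: deeper CNN with more filters, Dropout and extra Dense layers."""
    model = keras.Sequential([
        keras.Input(shape=(32, 32, 3)),  # 32x32 color images, 3 channels
        layers.Conv2D(50, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(75, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),     # -> (16, 16, 75)
        layers.Dropout(0.25),            # regularization
        layers.Conv2D(125, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),     # -> (8, 8, 125)
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(500, activation="relu"),
        layers.Dropout(0.4),
        layers.Dense(250, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


def train_on_cifar10(epochs=10, batch_size=128):
    """Hypothetical helper: train on CIFAR-10, validating on its 10,000 test images."""
    (X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()
    X_train, Y_train = preprocess_cifar10(X_train, y_train)
    X_test, Y_test = preprocess_cifar10(X_test, y_test)
    model = build_cifar10_model()
    history = model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs,
                        validation_data=(X_test, Y_test))
    return model, history
```

padding="same" keeps each convolution’s output at the input’s spatial size, so only the pooling layers shrink the feature maps.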
[Figure: training and validation accuracy across epochs]
You can easily eclipse this performance by tuning the above model. Once you have mastered CIFAR-10, there’s also CIFAR-100 available in Keras that you can use for further practice. Since it has 100 classes, it won’t be an easy task to achieve!
Categorizing the Images of ImageNet using CNNs
Now that you have mastered MNIST and CIFAR-10, let’s take this problem a notch higher. Here, we will take a look at the famous ImageNet dataset.
ImageNet is the main database behind the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This is like the Olympics of computer vision. It is the competition that first made CNNs popular, and each year the best research teams across industry and academia competed with their best algorithms on computer vision tasks.
About the ImageNet Dataset
The ImageNet dataset has more than 14 million images, hand-labeled across more than 20,000 categories.
Also, unlike the MNIST and CIFAR-10 datasets that we have already discussed, ImageNet images are of much higher resolution and are typically resized or cropped to 224 x 224 for training. That’s what poses a challenge for us: 14 million images, each processed at 224 by 224 pixels. Processing a dataset of this size requires a great amount of computing power in terms of CPU, GPU, and RAM.
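A rough back-of-the-envelope calculation shows the scale, assuming each image is held decoded as 8-bit RGB at 224 x 224 (one byte per channel value) and ignoring compression:

$$
224 \times 224 \times 3 = 150{,}528 \text{ bytes} \approx 150\ \text{KB per image}, \qquad
14 \times 10^{6} \times 150{,}528 \text{ bytes} \approx 2.1 \times 10^{12} \text{ bytes} \approx 2.1\ \text{TB}
$$

Storing the pixels as 32-bit floats for training would multiply this by four, far beyond the RAM of a typical machine, which is why such data has to be streamed from disk in batches.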
The downside – that might be too much for an everyday laptop. So what’s the alternative solution? How can an enthusiast work with the ImageNet dataset?
That’s where fast.ai’s Imagenette dataset comes in.
Imagenette is a dataset extracted from the large ImageNet collection of images. It was released so that researchers and students can practice on ImageNet-level images without needing as many compute resources.
In the words of Jeremy Howard himself:
“I (Jeremy Howard, that is) mainly made Imagenette because I wanted a small vision dataset I could use to quickly see if my algorithm ideas might have a chance of working. They normally don’t, but testing them on Imagenet takes a really long time for me to find that out, especially because I’m interested in algorithms that perform particularly well at the end of training.
But I think this can be a useful dataset for others as well.”
And that’s what we will also use for practicing!
1. Download the Imagenette Dataset
Here’s how you can fetch the dataset (commands for your terminal):
$ wget https://s3.amazonaws.com/fast-ai-imageclas/imagenette2.tgz
$ tar -xf imagenette2.tgz
Once you have downloaded the dataset, you will notice that it has two folders – “train” and “val”. These contain the training and validation set respectively. Inside each folder, there are separate folders for each class. Here’s the mapping of the classes:
These classes have the same ID in the original ImageNet dataset. Each of the classes has approximately 1000 images so overall, it’s a balanced dataset.
2. Loading Images using ImageDataGenerator
Keras has a useful utility for loading a large set of images (like we have here) in small batches, without maxing out the RAM. ImageDataGenerator, in combination with fit_generator, provides this functionality (from TensorFlow 2.1 onward, model.fit accepts such generators directly and fit_generator is deprecated):
The ImageDataGenerator’s flow_from_directory method infers the class labels and the number of classes from the folder names.
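ImageDataGenerator is deprecated in recent TensorFlow releases in favor of keras.utils.image_dataset_from_directory, which also loads images in batches from a train/&lt;class&gt;/ and val/&lt;class&gt;/ folder layout and likewise infers the labels from the folder names. A minimal sketch using that replacement, assuming a recent Keras with the TensorFlow backend and the archive extracted to a local imagenette2/ folder; the function name load_imagenette, the batch size and the seed are hypothetical choices.

```python
import keras


def load_imagenette(root="imagenette2", image_size=(224, 224), batch_size=32):
    """Hypothetical helper: stream Imagenette from disk in batches, labels from sub-folder names."""
    train_ds = keras.utils.image_dataset_from_directory(
        f"{root}/train",
        label_mode="categorical",  # one-hot labels, like class_mode="categorical" in flow_from_directory
        image_size=image_size,     # every image is resized on load
        batch_size=batch_size,
        shuffle=True,
        seed=42,
    )
    val_ds = keras.utils.image_dataset_from_directory(
        f"{root}/val",
        label_mode="categorical",
        image_size=image_size,
        batch_size=batch_size,
        shuffle=False,             # fixed order, so predictions line up with the files
    )
    return train_ds, val_ds
```

The batches contain float pixel values in the 0-255 range, so scaling (for example a Rescaling layer) belongs in the model. train_ds.class_names lists the folder names in the order used for the one-hot labels.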
3. Building a Basic CNN model for Image Classification
Let’s build a basic CNN model for our Imagenette dataset (for the purpose of image classification):
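A minimal sketch of such a from-scratch CNN, assuming a recent Keras and batches of raw 0-255 pixel values at 224 x 224 (hence the Rescaling layer at the front). The three Conv2D/MaxPooling2D blocks, the Dense size, the dropout rate and the name build_imagenette_cnn are hypothetical, not the original architecture.

```python
import keras
from keras import layers


def build_imagenette_cnn(input_shape=(224, 224, 3), num_classes=10):
    """Hypothetical helper: a plain stack of Conv2D/MaxPooling2D blocks trained from scratch."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Rescaling(1.0 / 255),  # assumption: pixels arrive as raw 0-255 values
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),  # 10 Imagenette classes
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
```

The model can be trained directly on batched folder data, e.g. model.fit(train_data, validation_data=val_data, epochs=10), where train_data and val_data are hypothetical names for the training and validation batches.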
If you check the validation accuracy of the above model, you’ll see that even though it is deeper than the architectures we have used so far, it only reaches around 40-50%.
There can be many reasons for this: our model may not be complex enough to learn the underlying patterns in the images, or the training data may be too small for the model to generalize well across classes.
Step up – transfer learning.
4. Using Transfer Learning (VGG16) to improve accuracy
VGG16 is a CNN architecture that was the first runner-up in the classification task of the 2014 ImageNet Challenge. It was designed by the Visual Geometry Group at Oxford and has 16 weight layers in total: 13 convolutional layers and 3 fully connected layers. We will load the pre-trained weights of this model so that we can reuse the useful features it has learned for our task.
Downloading weights of VGG16
from keras.applications import VGG16
# include_top=False drops the fully connected layers at the top of the network (including the final softmax layer)
pretrained_model = VGG16(include_top=False, weights='imagenet')
pretrained_model.summary()
Generate features from VGG16
Let’s use VGG16 to extract features from our dataset’s images, reusing what it has already learned:
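from keras.utils import to_categorical
# extract train and val features
# (train and val must be created with shuffle=False so that the rows of the
# extracted features stay in the same order as train.labels / val.labels)
vgg_features_train = pretrained_model.predict(train)
vgg_features_val = pretrained_model.predict(val)
# OHE target column
train_target = to_categorical(train.labels)
val_target = to_categorical(val.labels)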
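A classifier then has to be trained on the extracted features. A minimal sketch of a small Dense classifier, assuming the images were fed to VGG16 at 224 x 224 (so each image yields a 7 x 7 x 512 feature map) and were preprocessed with keras.applications.vgg16.preprocess_input, as VGG16 expects. The layer sizes, the dropout rate and the name build_feature_classifier are hypothetical.

```python
import keras
from keras import layers


def build_feature_classifier(feature_shape=(7, 7, 512), num_classes=10):
    """Hypothetical helper: small Dense classifier trained on frozen VGG16 feature maps."""
    model = keras.Sequential([
        keras.Input(shape=feature_shape),  # VGG16 without its top: 7x7x512 for 224x224 inputs
        layers.Flatten(),                  # 7 * 7 * 512 = 25,088 values per image
        layers.Dense(100, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
```

With the arrays from the previous step, training would look like clf = build_feature_classifier() followed by clf.fit(vgg_features_train, train_target, epochs=10, validation_data=(vgg_features_val, val_target)). Only this small head is trained, which is why it converges much faster than a CNN trained from scratch.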
Notice how quickly your model starts converging. In just 10 epochs, you have a 94%+ validation accuracy. Isn’t that amazing?
Once you have mastered the Imagenette dataset, fastai has also released two variants with classes you’ll find harder to classify:
- Imagewoof: 10 classes of dog breeds, a more difficult problem to classify
- Image网 (“wang”): A combination of Imagenette and Imagewoof and a couple of tricks that make it a harder problem
Where to go from here?
Apart from the datasets we’ve covered above, you can also use the datasets below for building computer vision algorithms. In fact, consider this a challenge. Can you apply your CNN knowledge to beat the benchmark score on these datasets?
- Fashion MNIST – MNIST-like dataset of clothes and apparel. Instead of digits, the images show a type of apparel (T-shirt, trousers, bag, etc.)
- Caltech 101 – Another challenging dataset that I found for image classification
I also suggest trying to improve your base CNN models before turning to transfer learning. You can study architectures such as VGG16 and ZFNet for clues on hyperparameter tuning, and you can use the same ImageDataGenerator to augment your images, effectively increasing the size of your training data.
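ImageDataGenerator is deprecated in recent TensorFlow releases, so as a swapped-in alternative, the sketch below performs the same kind of on-the-fly augmentation with Keras preprocessing layers (RandomFlip, RandomRotation, RandomZoom), assuming a recent Keras. No extra images are stored: each batch is re-randomized, so the model sees new variants of the same images every epoch. The chosen transformations, their strengths and the helper with_augmentation are assumptions.

```python
import keras
from keras import layers

# Random transformations re-drawn for every batch; only active when training=True
augment = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),  # up to 10% of a full turn (about 36 degrees) either way
    layers.RandomZoom(0.1),      # zoom in or out by up to 10%
])


def with_augmentation(model, input_shape):
    """Hypothetical helper: wrap an existing model so training batches are augmented first."""
    inputs = keras.Input(shape=input_shape)
    x = augment(inputs)  # passes images through unchanged at inference time
    return keras.Model(inputs, model(x))
```

The wrapped model must be compiled before calling fit. For CIFAR-10, for example, with_augmentation(base_model, (32, 32, 3)) would put augmentation in front of a CNN built for 32 x 32 color images.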