- The art of transfer learning could transform the way you build machine learning and deep learning models
- Learn how transfer learning works using PyTorch and how it ties into using pre-trained models
- We’ll work on a real-world dataset and compare the performance of a model built using convolutional neural networks (CNNs) versus one built using transfer learning
I was working on a computer vision project last year where we had to build a robust face detection model. The concept behind that is fairly straightforward – it’s the execution part that always sticks in my mind.
Given the size of the dataset we had, building a model from scratch was a real challenge. It was going to be potentially time-consuming and a strain on the computational resources we had. We had to figure out a solution quickly because we were working with a tight deadline.
This is when the powerful concept of transfer learning came to our rescue. It is a really helpful tool to have in your data scientist armoury, especially when you’re working with limited time and computational power.
So in this article, we will learn all about transfer learning and how to leverage it on a real-world project using Python. We’ll also discuss the role of pre-trained models in this space and how they’ll change the way you build machine learning pipelines.
This article is part of my PyTorch series for beginners. I strongly believe PyTorch is one of the best deep learning frameworks right now and will only go from strength to strength in the near future. This is a great time to learn how it works and get onboard. Make sure you check out the previous articles in this series:
- A Beginner-Friendly Guide to PyTorch and How it Works from Scratch
- Build an Image Classification Model using Convolutional Neural Networks (CNNs) in PyTorch
If you are completely new to CNNs, you can learn them comprehensively by enrolling in this free course: Convolutional Neural Networks (CNN) from Scratch
Table of Contents
- Introduction to Transfer Learning
- What are Pre-trained Models? And how to Pick the Right Pre-trained Model?
- Case Study: Emergency vs Non-Emergency Vehicle Classification
- Solving the Challenge using Convolutional Neural Networks (CNNs)
- Solving the Challenge using Transfer Learning and PyTorch
- Performance Comparison of CNN and Transfer Learning
Introduction to Transfer Learning
Let me illustrate the concept of transfer learning using an example. Picture this – you want to learn a topic from a domain you’re completely new to. Pick any domain and any topic – you can think of deep learning and neural networks as well.
What are the different approaches you would take to understand the topic? Off the top of my head:
- Search online for resources
- Read articles and blogs
- Refer to books
- Look out for video tutorials, and so on
All of these will help you get comfortable with the topic. In this situation, you are the only person who is putting in all the effort.
But there’s another approach, which might yield better results in a short amount of time.
You can consult a domain/topic expert who has a solid grasp of the topic you want to learn. This person will transfer their knowledge to you, thus expediting your learning process.
The first approach, where you are putting in all the effort alone, is an example of learning from scratch. The second approach is referred to as transfer learning. There is a knowledge transfer happening from an expert in that domain to a person who is new to it.
Yes, the idea behind transfer learning is that straightforward!
Neural Networks and Convolutional Neural Networks (CNNs), when trained in the usual way, are examples of learning from scratch. Both these networks extract features from a given set of images (in the case of an image-related task) and then classify the images into their respective classes based on these extracted features.
Learning every feature from scratch like this needs a lot of labelled data and computational power. This is where transfer learning and pre-trained models are so useful. Let’s understand a bit about the latter concept in the next section.
What are Pre-trained Models and how to Pick the Right Pre-trained Model?
Pre-trained models are super useful in any deep learning project that you’ll work on. Not all of us have the unlimited computational power of the top tech behemoths. We need to make do with our local machines so pre-trained models are a blessing there.
A pre-trained model, as you might have surmised already, is a model already designed and trained by a certain person or team to solve a specific problem.
Recall that we learn the weights and biases while training models like Neural Networks and CNNs. The weights are multiplied with the image pixels and the biases are added to the result, and this is what generates the features.
Pre-trained models share their learning by passing their weights and biases matrix to a new model. So, whenever we do transfer learning, we will first select the right pre-trained model and then pass its weight and bias matrix to the new model.
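To make the idea of "passing weights and biases to a new model" concrete, here is a minimal sketch in PyTorch. The two tiny networks and the `make_model` helper are hypothetical stand-ins, not the models used later in this article. The only assumption is that both models share the same architecture, so that the parameter names and shapes line up.

```python
import torch
import torch.nn as nn

def make_model():
    # Hypothetical toy architecture; both models must be built the same way
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

torch.manual_seed(0)
pretrained = make_model()   # stands in for a model that someone already trained
new_model = make_model()    # freshly initialised with different random weights

# state_dict() maps every parameter name to its weight or bias tensor;
# load_state_dict() copies those tensors into the new model
new_model.load_state_dict(pretrained.state_dict())

x = torch.randn(3, 4)
same_output = torch.allclose(pretrained(x), new_model(x))
print(same_output)
```

After the copy, both models produce the same output for the same input, which is exactly what "sharing the learning" means in practice.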
There are a large number of pre-trained models available out there. We need to decide which will be the best-suited model for our problem. For now, let’s consider that we have three pre-trained networks available – BERT, ULMFiT, and VGG16.
Our task is to classify the images (as we have been doing in the previous articles of this series). So, which of these pre-trained models will you pick? Let me first give you a quick overview of these pre-trained networks which will help us to decide the right pre-trained model.
BERT and ULMFiT are used for language modeling and VGG16 is used for image classification tasks. And if you look at the problem at hand, it is an image classification one. So it stands to reason that we will pick VGG16.
ImageNet vs. MNIST
Picking a model built for the right kind of task is only half the decision. We should also look at the dataset the model was trained on, so let’s compare two popular image datasets, ImageNet and MNIST. The ImageNet dataset consists of 1000 classes and a total of 1.2 million images. Some of the classes in this data are animals, cars, shops, dogs, food, instruments, etc.
MNIST, on the other hand, consists of images of handwritten digits. It includes 10 classes, one for each digit from 0 to 9.
We will be working on a project where we need to classify images into emergency and non-emergency vehicles (we will discuss this in more detail in the next section). This dataset includes images of vehicles so a VGG16 model trained on the ImageNet dataset would be more useful for us as it has images of vehicles.
This, in a nutshell, is how we should decide the right pre-trained model based on our problem.
Case Study: Emergency vs Non-Emergency Vehicle Classification
Ideally, we would be using the Identify the Apparels problem for this article. We’ve worked on it in the previous two articles of this series and that would help in comparing our progress.
Unfortunately, this isn’t possible here because VGG16 requires that the images should be of the shape (224,224,3) (the images in the other problem are of shape (28,28)). One way to combat this could have been to resize these (28,28) images to (224,224,3), but stretching such tiny single-channel images adds no new information, so this does not make sense intuitively.
Here’s the good part – we’ll be working on a brand new project! Here, our aim is to classify the vehicles as emergency or non-emergency.
This project is also a part of the Computer Vision using Deep Learning course by Analytics Vidhya. To work on more such interesting projects and learn the concepts of computer vision in much more detail, feel free to check out the course.
Let’s now start with understanding the problem and visualizing a few examples. You can download the images using this link. First, import the required libraries:
Next, we will read the .csv file containing the image name and the corresponding label:
There are two columns in the .csv file:
- image_names: It represents the name of all the images in the dataset
- emergency_or_no: It specifies whether that particular image belongs to the emergency or non-emergency class. 0 means that the image is a non-emergency vehicle and 1 represents an emergency vehicle
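As a rough sketch of this step, the snippet below reads a file with these two columns using pandas. The rows are made up and are parsed from an in-memory string so that the example runs on its own. With the real dataset you would pass the path of the downloaded .csv file to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical rows for illustration; the real file has one row per image
csv_text = """image_names,emergency_or_no
0.jpg,1
1.jpg,0
2.jpg,0
3.jpg,1
"""

train = pd.read_csv(io.StringIO(csv_text))
print(train.head())

# Number of images in each class (0 = non-emergency, 1 = emergency)
class_counts = train["emergency_or_no"].value_counts().to_dict()
print(class_counts)
```

Checking the class counts early is a cheap way to spot an imbalanced dataset before training anything.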
Next, we will load all the images and store them in an array format:
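Here is a minimal sketch of the resize-and-stack part of this step. Reading the image files is replaced with random arrays of different sizes (an assumption made so the snippet is self-contained), and the helper name `resize_to_vgg_shape` is hypothetical. Any image library that can resize would work equally well; this version uses `torch.nn.functional.interpolate`.

```python
import numpy as np
import torch
import torch.nn.functional as F

def resize_to_vgg_shape(image, size=224):
    """Resize an (H, W, 3) float32 array to (size, size, 3)."""
    # interpolate expects (batch, channels, H, W)
    t = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).contiguous()
    t = F.interpolate(t, size=(size, size), mode="bilinear", align_corners=False)
    # back to (H, W, channels) as a NumPy array
    return t.squeeze(0).permute(1, 2, 0).contiguous().numpy()

rng = np.random.default_rng(0)
# Stand-ins for decoded image files of different sizes
raw_images = [rng.random((h, w, 3), dtype=np.float32) for h, w in [(180, 240), (300, 200)]]

# Stack the resized images into one array of shape (n_images, 224, 224, 3)
train_x = np.stack([resize_to_vgg_shape(img) for img in raw_images])
print(train_x.shape)
```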
It took approximately 12 seconds to load these images. There are 1,646 images in our dataset and we have resized all of them to (224,224,3) since VGG16 requires all the images in this particular shape. Let’s now visualize a few images from the dataset:
This is a police car and hence has a label of Emergency vehicle. Now we will store the target in a separate variable:
Let’s create a validation set to evaluate our model:
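One simple way to hold out a validation set is to shuffle the indices and keep 10% of them aside, as sketched below. The 10% fraction is inferred from the split sizes reported in this article, while the seed and the `split_indices` helper are assumptions for illustration. scikit-learn's `train_test_split` provides the same kind of split, with optional stratification by class.

```python
import math
import numpy as np

def split_indices(n_samples, val_fraction=0.1, seed=13):
    """Return (train_idx, val_idx) as two disjoint index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)           # shuffled 0 .. n_samples-1
    n_val = math.ceil(val_fraction * n_samples)  # size of the held-out set
    return order[n_val:], order[:n_val]

train_idx, val_idx = split_indices(1646)
print(len(train_idx), len(val_idx))
```

The index arrays can then be used to slice both the image array and the label array, which keeps images and labels aligned.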
We have 1,481 images in the training set and remaining 165 images in the validation set. We now have to convert the dataset into torch format:
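The conversion can be sketched as follows. PyTorch convolution layers expect images as (batch, channels, height, width), so the channel axis has to move from the last position to the second one. The small zero-filled array here is only a placeholder for the real training images.

```python
import numpy as np
import torch

# Placeholder data: 4 images in (height, width, channels) layout
train_x = np.zeros((4, 224, 224, 3), dtype=np.float32)
train_y = np.array([0, 1, 1, 0])

# (N, H, W, C) -> (N, C, H, W), the layout nn.Conv2d expects
x = torch.from_numpy(train_x).permute(0, 3, 1, 2).contiguous()
# CrossEntropyLoss expects integer class labels of type int64
y = torch.from_numpy(train_y).long()

print(x.shape, y.dtype)
```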
Similarly, we will convert the validation set:
Our data is ready! In the next section, we will build a Convolutional Neural Network (CNN) before we use the pre-trained model to solve this problem.
Solving the Challenge using Convolutional Neural Networks (CNNs)
We are finally at the model building part! Before using transfer learning to solve the problem, let’s use a CNN model and set a benchmark for ourselves.
We will build a very simple CNN architecture with two convolutional layers to extract features from images and a dense layer at the end to classify these features:
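An architecture of this kind could look like the sketch below. The channel counts (4 and 8), the kernel size and the use of batch normalization are assumptions chosen for illustration. Only the overall shape, two convolutional layers followed by one dense layer, comes from the description above.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Two convolutional blocks that extract features from the image
        self.features = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3, padding=1),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),   # 224 -> 112
            nn.Conv2d(4, 8, kernel_size=3, padding=1),
            nn.BatchNorm2d(8),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),   # 112 -> 56
        )
        # One dense layer that classifies the flattened features
        self.classifier = nn.Linear(8 * 56 * 56, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

model = SimpleCNN()
logits = model(torch.randn(2, 3, 224, 224))  # two random "images"
print(logits.shape)
```

The model returns one raw score (logit) per class, which pairs naturally with `nn.CrossEntropyLoss` during training.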
Let’s now define the optimizer, learning rate and the loss function for our model and use a GPU to train the model:
This is what the architecture of the model looks like. Finally, we will train the model for 15 epochs. I am setting the batch_size of the model to 128 (you can play around with this):
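A mini-batch training loop along these lines is sketched below. The choice of Adam, the learning rates and the `train` helper are assumptions. Only the 15 epochs and the batch size of 128 come from the text. The loop runs on a tiny synthetic problem here so that it finishes in a moment on a CPU.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train(model, x, y, epochs=15, batch_size=128, lr=1e-4):
    # Use the GPU when one is available, otherwise fall back to the CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    history = []
    for epoch in range(epochs):
        model.train()
        order = torch.randperm(x.size(0))  # reshuffle the data every epoch
        running = 0.0
        for start in range(0, x.size(0), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x[idx].to(device), y[idx].to(device)
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
            running += loss.item() * idx.numel()
        history.append(running / x.size(0))  # average loss over the epoch
        print(f"epoch {epoch + 1}: training loss {history[-1]:.4f}")
    return history

# Tiny synthetic problem so the sketch runs quickly
torch.manual_seed(0)
x = torch.randn(256, 3, 8, 8)
y = (x.mean(dim=(1, 2, 3)) > 0).long()
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
losses = train(toy_model, x, y, epochs=15, batch_size=128, lr=1e-2)
```

Moving one batch at a time to the device, rather than the whole dataset, keeps GPU memory usage bounded by the batch size.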
This will print a summary of the training as well. The training loss is decreasing after each epoch and that’s a good sign. Let’s check the training as well as the validation accuracy:
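Accuracy can be computed with a small helper like the one below, which predicts in batches with gradients switched off. The `accuracy` function name is an assumption, and the toy model at the end exists only to check the arithmetic on predictions that are known in advance.

```python
import torch
import torch.nn as nn

def accuracy(model, x, y, batch_size=128):
    """Fraction of samples whose highest-scoring class matches the label."""
    model.eval()  # switch off dropout / use running batch-norm statistics
    device = next(model.parameters()).device
    correct = 0
    with torch.no_grad():  # no gradients needed for evaluation
        for start in range(0, x.size(0), batch_size):
            xb = x[start:start + batch_size].to(device)
            preds = model(xb).argmax(dim=1).cpu()
            correct += (preds == y[start:start + batch_size]).sum().item()
    return correct / x.size(0)

# Toy check: an identity weight matrix makes the predictions easy to verify
model = nn.Linear(2, 2, bias=False)
with torch.no_grad():
    model.weight.copy_(torch.eye(2))
x = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
y = torch.tensor([0, 1, 1, 1])
acc = accuracy(model, x, y)
print(acc)  # 3 of the 4 predictions match the labels
```

The same helper works for both the training set and the validation set. Only the tensors passed in change.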
We got a training accuracy of around 82% which is a good score. Let’s now check the validation accuracy:
The validation accuracy comes out to be 76%. Now that we have a benchmark with us, it’s time to use transfer learning to solve this emergency versus non-emergency vehicle classification problem. Let’s get rolling!
Solving the Challenge using Transfer Learning
I’ve touched on this above and I’ll reiterate it here – we will be using the VGG16 pre-trained model trained on the ImageNet dataset. Let’s look at the steps we will be following to train the model using transfer learning:
- First, we will load the weights of the pre-trained model – VGG16 in our case
- Then we will fine tune the model as per the problem at hand
- Next, we will use these pre-trained weights and extract features for our images
- Finally, we will train the fine tuned model using the extracted features
So, let’s start by loading the weights of the model:
We will now fine tune the model. We will not be training the layers of the VGG16 model and hence let’s freeze the weights of these layers:
Since we only have 2 classes to predict and VGG16 is trained on ImageNet which has 1000 classes, we need to update the final layer as per our problem:
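Freezing the existing layers and swapping the final layer can be sketched together as below. To keep the example light, it runs on a small stand-in class (`TinyVGGLike`, hypothetical) that copies torchvision VGG16's layout, where `classifier` is a sequence of seven modules and index 6 is the final linear layer. The `adapt_for_new_task` helper is also an assumption, and the same function should apply to the real VGG16 model.

```python
import torch.nn as nn

def adapt_for_new_task(model, num_classes=2):
    # Freeze every existing parameter so that it is not updated in training
    for param in model.parameters():
        param.requires_grad = False
    # Replace the final layer; a new layer has requires_grad=True by default
    old_head = model.classifier[6]
    model.classifier[6] = nn.Linear(old_head.in_features, num_classes)
    return model

class TinyVGGLike(nn.Module):
    """Small stand-in with VGG16's attribute layout (features + classifier)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 4, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(16, 32), nn.ReLU(), nn.Dropout(),
            nn.Linear(32, 32), nn.ReLU(), nn.Dropout(),
            nn.Linear(32, 1000),  # 1000 outputs, like ImageNet
        )

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = adapt_for_new_task(TinyVGGLike())
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Only the parameters of the new final layer remain trainable, and everything else keeps its pre-trained values.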
Since we will be training only the last layer, I have set the requires_grad as True for the last layer. Let’s set the training to GPU:
We’ll now use the model and extract features for both the training and validation images. I will set the batch_size as 128 (again, you can increase or decrease this batch_size per your requirement):
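Batched feature extraction can be sketched as below. The `extract_features` helper and the small convolutional stand-in are assumptions. With the real model you would pass `model.features`, the convolutional part of VGG16, which turns each 224x224 image into a 512x7x7 feature map.

```python
import torch
import torch.nn as nn

def extract_features(feature_extractor, x, batch_size=128):
    """Run images through a frozen network in batches; return flat features."""
    feature_extractor.eval()
    device = next(feature_extractor.parameters()).device
    chunks = []
    with torch.no_grad():  # the extractor is frozen, so no gradients needed
        for start in range(0, x.size(0), batch_size):
            batch = x[start:start + batch_size].to(device)
            out = feature_extractor(batch)
            chunks.append(out.flatten(1).cpu())  # (batch, C*H*W), kept on CPU
    return torch.cat(chunks)

# Stand-in for the pre-trained convolutional layers
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4))
images = torch.randn(300, 3, 32, 32)
features = extract_features(backbone, images, batch_size=128)
print(features.shape)
```

Because the frozen layers never change, the features only need to be computed once and can then be reused in every training epoch.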
Similarly, let’s extract features for our validation images:
Next, we will convert these data into torch format:
We also have to define the optimizer and the loss function for our model:
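When most of the model is frozen, it is enough to hand the optimizer only the parameters that are still trainable. The sketch below shows this on a small stand-in classifier. The layer sizes, the Adam optimizer and the learning rate are assumptions for illustration.

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in classifier: the first layer is frozen, the last one is trainable
classifier = nn.Sequential(nn.Linear(512, 64), nn.ReLU(), nn.Linear(64, 2))
for p in classifier[0].parameters():
    p.requires_grad = False

# Give the optimizer only the parameters that should be updated
trainable_params = [p for p in classifier.parameters() if p.requires_grad]
optimizer = optim.Adam(trainable_params, lr=5e-4)
criterion = nn.CrossEntropyLoss()  # standard loss for class labels

n_trainable = sum(p.numel() for p in trainable_params)
print(n_trainable)  # 64 * 2 weights + 2 biases
```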
It’s time to train the model. We will train it for 30 epochs with a batch_size set to 128:
Here is a summary of the model. You can see that the loss has decreased and hence we can say that the model is improving. Let’s validate this by looking at the training and validation accuracies:
We got an accuracy of ~ 84% on the training set. Let’s now check the validation accuracy:
The validation accuracy of the model is similar, i.e., 83%. The training and validation accuracies are almost in sync, so we can say that the model generalizes well. Here is the summary of our results:
|Model|Training Accuracy|Validation Accuracy|
|---|---|---|
|CNN|~82%|76%|
|VGG16 (transfer learning)|~84%|83%|
We can infer that the accuracies have improved by using the VGG16 pre-trained model as compared to the CNN model. Got to love the art of transfer learning!
In this article, we learned how to use pre-trained models and transfer learning to solve an image classification problem. We first understood what pre-trained models are and how to choose the right pre-trained model depending on the problem at hand. Then we took a case study of classifying images of vehicles as emergency or non-emergency. We solved this case study using a CNN model first and then we used the VGG16 pre-trained model to solve the same problem.
We found that using the VGG16 pre-trained model significantly improved the model performance and we got better results as compared to the CNN model. I hope you now have a clear understanding of how to use transfer learning and the right pre-trained model to solve problems using PyTorch.
I encourage you to take other image classification problems and try to apply transfer learning to solve them. This will help you to grasp the concept much more clearly.
As always, if you have some feedback or doubts related to this tutorial, feel free to post them in the comments section below and I will be happy to answer them.