Fast Food Classification Using Transfer Learning With Pytorch

Avijit Biswas · 15 Jun 2023

Introduction

Fast food classification has become an important task in automated food delivery systems. With the growth of fast food chains and the need for accurate and efficient food recognition, machine learning approaches have become increasingly popular. In this blog, we will explore the use of transfer learning for fast food classification using PyTorch. Transfer learning is a technique that leverages pre-trained models to solve new tasks with limited data.

[Image: food classification using transfer learning. Source: Unsplash]

We will discuss how to fine-tune a pre-trained model for fast food classification and the results obtained from this approach.

Learning Objectives

  • Learn the basics of PyTorch for deep learning.
  • Understand how to apply transfer learning in PyTorch.
  • Apply data augmentation to the training images.
  • Visualize the trained model’s predictions.

This article was published as a part of the Data Science Blogathon.


What is Transfer Learning?

Transfer learning is a technique that utilizes the pre-trained weights of a deep learning model to perform a new task with limited data. In the context of ResNet18 (which I will use in this project), transfer learning involves taking a pre-trained ResNet18 model and fine-tuning its weights for a specific fast food classification task. This approach aims to leverage the knowledge learned by the pre-trained model on a large dataset to solve the new task with less data and fewer computational resources. The fine-tuning process typically involves retraining the final layers of the ResNet18 model to adapt it to the new task. Below is the ResNet18 model diagram.

[Image: ResNet18 architecture diagram. Source: ResearchGate]

You can see that the model consists of 17 convolutional layers with 3 × 3 filters and one fully connected layer, followed by a softmax function for multi-class image classification.
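
To make this concrete, below is a minimal sketch of how the final layer is replaced, and how the backbone can optionally be frozen for pure feature extraction. The freezing loop is an illustrative option, not this article’s method; the training code later in this article updates all parameters.

import torch.nn as nn
from torchvision import models

# Load ResNet18 with ImageNet pre-trained weights
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Optional (feature extraction): freeze the pre-trained backbone so that
# only the new head is trained. Skip this loop to fine-tune all layers.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a fresh, trainable head
# sized for the 10 fast food classes
model.fc = nn.Linear(model.fc.in_features, 10)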

Data Description

The dataset is hosted on Kaggle here.

There are 10 categories of fast food images:

  • Burger
  • Donut
  • Hot Dog
  • Pizza
  • Sandwich
  • Baked Potato
  • Crispy Chicken
  • Fries
  • Taco
  • Taquito

Coding Implementation

Step 1: Import all the necessary libraries

from __future__ import print_function, division

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.backends.cudnn as cudnn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy

Step 2: Set the PATH to the dataset and the device

PATH = "../data/Fast Food Classification V2/"

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# make sure my GPU is detected.
print(device)

Step 3: Data Augmentation and Normalization

Data augmentation is a crucial technique used in deep learning to increase the size of the training dataset and prevent overfitting. It can help improve the performance and robustness of deep learning models, especially in scenarios with limited data.

data_transforms = {
    'Train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'Valid': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
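
The 'Train' pipeline above keeps augmentation deliberately light. If you want stronger augmentation, torchvision offers further transforms that can be dropped in; the parameters below are illustrative assumptions, not values tuned for this dataset.

# A heavier training pipeline (illustrative; parameters are untuned assumptions)
heavy_train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomRotation(degrees=15),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])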

Step 4: Load the dataset and create DataLoader objects

image_datasets = {
    x: datasets.ImageFolder(os.path.join(PATH, x),
                            data_transforms[x]) for x in ['Train', 'Valid']
}
dataloaders = {
    x: torch.utils.data.DataLoader(image_datasets[x], 
                                   batch_size=32,
                                   shuffle=True, 
                                ) for x in ['Train', 'Valid']
}
dataset_sizes = {x: len(image_datasets[x]) for x in ['Train', 'Valid']}
class_names = image_datasets['Train'].classes
print(class_names)

>>>
['Baked Potato',
 'Burger',
 'Crispy Chicken',
 'Donut',
 'Fries',
 'Hot Dog',
 'Pizza',
 'Sandwich',
 'Taco',
 'Taquito']

Let’s see some training data.

# Helper function to display an image tensor
def imshow(inp, title=None):
    # convert from (C, H, W) tensor to (H, W, C) NumPy array
    inp = inp.numpy().transpose((1, 2, 0))
    # undo the ImageNet normalization applied by the transforms
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause briefly so the plot is rendered
# Get a batch of training data
inputs, classes = next(iter(dataloaders['Train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

imshow(out)
[Image: a grid of sample training images]

Step 5: Create a training function

The function takes the following inputs:

  1. Model: The deep learning model to be trained.
  2. Criterion: The loss function used to evaluate the model’s performance.
  3. Optimizer: The optimization algorithm used to update the model’s parameters during training.
  4. Scheduler: A learning rate scheduler used to adjust the learning rate during training.
  5. num_epochs: The number of training epochs (default = 25).

The function trains the model for num_epochs epochs, alternating between the training and validation phases. In each epoch, the model’s parameters are updated using the optimizer, with the criterion used to calculate the loss. During the training phase, the gradients are computed using backward(), and the parameters are updated using optimizer.step(). The model’s performance is evaluated in the validation phase without updating the parameters.

After each epoch, the performance metrics (loss and accuracy) are printed. The best model weights (those with the highest validation accuracy) are saved using copy.deepcopy(). At the end of training, the elapsed time and the best validation accuracy are printed, and the best model weights are loaded using model.load_state_dict(). Finally, the trained model is returned.

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['Train', 'Valid']:
            if phase == 'Train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'Train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'Train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'Train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

            # deep copy the model
            if phase == 'Valid' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    print(f'Best Valid Acc: {best_acc:4f}')

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

Step 6: Start training the model with ResNet18 weights

# Load ResNet18 pre-trained on ImageNet
# (note: pretrained=True is deprecated in torchvision >= 0.13 in favor of
# weights=models.ResNet18_Weights.DEFAULT)
model_1 = models.resnet18(pretrained=True)
num_ftrs = model_1.fc.in_features
# Replace the final layer so the output size matches the number of classes;
# nn.Linear(num_ftrs, len(class_names)) gives 10 outputs for this dataset.
model_1.fc = nn.Linear(num_ftrs, len(class_names))

model_1 = model_1.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized.
# Two optimizers are defined for comparison; only Adam is used below.
optimizer_sgd = optim.SGD(model_1.parameters(), lr=0.001, momentum=0.9)
optimizer_adam = optim.Adam(model_1.parameters(), lr=0.001)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_adam, step_size=7, gamma=0.1)
model_resnetft = train_model(model_1, criterion, optimizer_adam, exp_lr_scheduler,
                       num_epochs=15)
 
output >>>
Epoch 0/14
----------
Train Loss: 1.3397 Acc: 0.5660
Valid Loss: 1.0503 Acc: 0.6691
     .
     .
     .
     continues
     .
     .
     .
Epoch 14/14
----------
Train Loss: 0.4054 Acc: 0.8709
Valid Loss: 0.4723 Acc: 0.8600

Training complete in 27m 23s
Best Valid Acc: 0.867714

So, you can see that training takes nearly 28 minutes on an Nvidia Tesla P100 GPU, and the best validation accuracy is 86.77%.
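
Since train_model returns the model with its best validation weights already loaded, it is worth saving them to disk so the 28-minute run does not have to be repeated. Below is a short sketch; the file name resnet18_fastfood.pth is an arbitrary choice for illustration.

# Persist the best weights (the file name is an arbitrary example)
torch.save(model_resnetft.state_dict(), "resnet18_fastfood.pth")

# Later: rebuild the architecture and reload the saved weights
model_loaded = models.resnet18(weights=None)
model_loaded.fc = nn.Linear(model_loaded.fc.in_features, len(class_names))
model_loaded.load_state_dict(torch.load("resnet18_fastfood.pth", map_location=device))
model_loaded = model_loaded.to(device).eval()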

Step 7: Visualize some results

The code first sets the model to evaluation mode (model.eval()) and initializes a counter images_so_far to keep track of the number of images visualized so far. A figure is also created using plt.figure().

The function then iterates over the validation data using enumerate(dataloaders['Valid']). For each iteration, the input images and labels are moved to the specified device (using inputs.to(device) and labels.to(device)), and the model’s predictions are computed using model(inputs). The predicted class for each image is obtained using _, preds = torch.max(outputs, 1).

For each input image, the code plots the image using imshow(inputs.cpu().data[j]) and sets the title to the predicted class. The code keeps track of the number of images visualized so far using the counter images_so_far, and if the number of images visualized equals the specified number, the function returns.

Finally, the code sets the model back to its original training mode using model.train(mode=was_training).

def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['Valid']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images//2, 2, images_so_far)
                ax.axis('off')
                ax.set_title(f'predicted: {class_names[preds[j]]}')
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)
        
        
# Visualize the trained model
visualize_model(model_1)

[Images: sample validation images with predicted class titles, by Author]
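
Beyond plotting batches from the dataloader, you may also want to classify a single image file. Here is a minimal sketch that reuses the 'Valid' preprocessing pipeline; the predict_image helper and the test.jpg path are hypothetical, not part of the original notebook.

from PIL import Image

def predict_image(model, image_path):
    # Preprocess one image with the same pipeline used for validation
    img = Image.open(image_path).convert("RGB")
    batch = data_transforms['Valid'](img).unsqueeze(0).to(device)  # [1, 3, 224, 224]
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)
        conf, pred = torch.max(probs, 1)
    return class_names[pred.item()], conf.item()

# Example usage (the path is a placeholder):
# label, confidence = predict_image(model_1, "test.jpg")
# print(f"{label} ({confidence:.2%})")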

Conclusion

This article has demonstrated the use of transfer learning to perform fast food classification using the ResNet18 architecture and PyTorch. The implementation showed how to fine-tune the pre-trained model on the food dataset and evaluate the model’s performance on the validation set. The results showed that transfer learning can effectively leverage the knowledge learned from a large-scale dataset to improve the performance of the food classification task. The following are some key learnings from this project:

  1. ResNet18 is a popular deep learning architecture used in computer vision tasks, and it can be used as a feature extractor in transfer learning.
  2. Transfer learning is a technique in deep learning where a pre-trained model is fine-tuned for a specific task.
  3. The code implementation showed how to fine-tune the pre-trained ResNet18 model on the food dataset and evaluate the model’s performance on the validation set.
  4. Data augmentation techniques increased the training dataset’s size and improved the model’s performance.
  5. The results showed that transfer learning using ResNet18 and PyTorch could effectively classify fast food images and achieve high accuracy.
  6. Transfer learning is a powerful tool for solving computer vision problems and has the potential to revolutionize the field.

I hope this article will help you in your learning quest. If you have any questions, comment below. The entire code is in my Kaggle notebook.

Connect with me on Twitter and LinkedIn.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion. 

