A Practical Implementation of the Faster R-CNN Algorithm for Object Detection (Part 2 – with Python codes)

Pulkit Sharma Last Updated : 01 Jul, 2024

10 min read

Introduction

Which algorithm do you use for object detection tasks? I have tried quite a few in my quest to build the most precise model in the least amount of time. This journey, spanning multiple hackathons and real-world datasets, has always led me to the R-CNN family of algorithms.

It has been an incredibly useful framework for me, and that’s why I decided to pen down my learnings in the form of a series of articles. The aim of this series is to showcase how useful the different types of R-CNN algorithms are. The first part received an overwhelmingly positive response from our community, and I’m thrilled to present part two!

In this article, we will first briefly summarize what we learned in part 1, and then deep dive into the implementation of the fastest member of the R-CNN family – Faster R-CNN. I highly recommend going through this article if you need to refresh your object detection concepts first: A Step-by-Step Introduction to the Basic Object Detection Algorithms (Part 1).

Part 3 of this series is published now and you can check it out here: A Practical Guide to Object Detection using the Popular YOLO Framework – Part III (with Python codes)

We will work on a very interesting dataset here, so let’s dive right in!

Introduction
A Brief Overview of the Different R-CNN Algorithms for Object Detection
CNN vs R-CNN vs Fast R-CNN vs Faster R-CNN
Understanding the Problem Statement
Setting up the System
Data Exploration
Implementing Faster R-CNN
Conclusion
FAQs

A Brief Overview of the Different R-CNN Algorithms for Object Detection

Let’s quickly summarize the different algorithms in the R-CNN family (R-CNN, Fast R-CNN, and Faster R-CNN) that we saw in the first article. This will help lay the ground for our implementation part later when we will predict the bounding boxes present in previously unseen images (new data).

R-CNN extracts a bunch of regions from the given image using selective search, and then checks if any of these boxes contains an object. We first extract these regions, and for each region, CNN is used to extract specific features. Finally, these features are then used to detect objects. Unfortunately, R-CNN becomes rather slow due to these multiple steps involved in the process.

Fast R-CNN, on the other hand, passes the entire image to ConvNet which generates regions of interest (instead of passing the extracted regions from the image). Also, instead of using three different models (as we saw in R-CNN), it uses a single model which extracts features from the regions, classifies them into different classes, and returns the bounding boxes.

All these steps are done simultaneously, thus making it execute faster as compared to R-CNN. Fast R-CNN is, however, not fast enough when applied on a large dataset as it also uses selective search for extracting the regions.

Faster R-CNN fixes the problem of selective search by replacing it with Region Proposal Network (RPN). We first extract feature maps from the input image using ConvNet and then pass those maps through a RPN which returns object proposals. Finally, these maps are classified and the bounding boxes are predicted.

Faster rcnn fices problem by rpn — Faster R-CNN

I have summarized below the steps followed by a Faster R-CNN algorithm to detect objects in an image:

Take an input image and pass it to the ConvNet which returns feature maps for the image
Apply Region Proposal Network (RPN) on these feature maps and get object proposals
Apply ROI pooling layer to bring down all the proposals to the same size
Finally, pass these proposals to a fully connected layer in order to classify any predict the bounding boxes for the image

CNN vs R-CNN vs Fast R-CNN vs Faster R-CNN

What better way to compare these different algorithms than in a tabular format? So here you go!

Algorithm	Features	Prediction time / image	Limitations
CNN	Divides the image into multiple regions and then classifies each region into various classes.	–	Needs a lot of regions to predict accurately and hence high computation time.
R-CNN	Uses selective search to generate regions. Extracts around 2000 regions from each image.	40-50 seconds	High computation time as each region is passed to the CNN separately. Also, it uses three different models for making predictions.
Fast R-CNN	Each image is passed only once to the CNN and feature maps are extracted. Selective search is used on these maps to generate predictions. Combines all the three models used in R-CNN together.	2 seconds	Selective search is slow and hence computation time is still high.
Faster R-CNN	Replaces the selective search method with region proposal network (RPN) which makes the algorithm much faster.	0.2 seconds	Object proposal takes time and as there are different systems working one after the other, the performance of systems depends on how the previous system has performed.

Now that we have a grasp on this topic, it’s time to jump from the theory into the practical part of our article. Let’s implement Faster R-CNN using a really cool (and rather useful) dataset with potential real-life applications!

Understanding the Problem Statement

We will be working on a healthcare related dataset and the aim here is to solve a Blood Cell Detection problem. Our task is to detect all the Red Blood Cells (RBCs), White Blood Cells (WBCs), and Platelets in each image taken via microscopic image readings. Below is a sample of what our final predictions should look like:

The reason for choosing this dataset is that the density of RBCs, WBCs and Platelets in our blood stream provides a lot of information about the immune system and hemoglobin. This can help us potentially identify whether a person is healthy or not, and if any discrepancy is found in their blood, actions can be taken quickly to diagnose that.

Manually looking at the sample via a microscope is a tedious process. And this is where Deep Learning models play such a vital role. They can classify and detect the blood cells from microscopic images with impressive precision.

The full blood cell detection dataset for our challenge can be downloaded from here. I have modified the data a tiny bit for the scope of this article:

The bounding boxes have been converted from the given .xml format to a .csv format
I have also created the training and test set split on the entire dataset by randomly picking images for the split

Note that we will be using the popular Keras framework with a TensorFlow backend in Python to train and build our model.

Setting up the System

Before we actually get into the model building phase, we need to ensure that the right libraries and frameworks have been installed. The below libraries are required to run this project:

pandas
matplotlib
tensorflow
keras – 2.0.3
numpy
opencv-python
sklearn
h5py

Most of the above mentioned libraries will already be present on your machine if you have Anaconda and Jupyter Notebooks installed. Additionally, I recommend downloading the requirement.txt file from this link and use that to install the remaining libraries. Type the following command in the terminal to do this:

pip install -r requirement.txt

Alright, our system is now set and we can move on to working with the data!

Data Exploration

It’s always a good idea (and frankly, a mandatory step) to first explore the data we have. This helps us not only unearth hidden patterns, but gain a valuable overall insight into what we are working with. The three files I have created out of the entire dataset are:

train_images: Images that we will be using to train the model. We have the classes and the actual bounding boxes for each class in this folder.
test_images: Images in this folder will be used to make predictions using the trained model. This set is missing the classes and the bounding boxes for these classes.
train.csv: Contains the name, class and bounding box coordinates for each image. There can be multiple rows for one image as a single image can have more than one object.

Let’s read the .csv file (you can create your own .csv file from the original dataset if you feel like experimenting) and print out the first few rows. We’ll need to first import the below libraries for this:

Code

# importing required libraries
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import patches

# read the csv file using read_csv function of pandas
train = pd.read_csv(‘train.csv’)
train.head()

There are 6 columns in the train file. Let’s understand what each column represents:

image_names: contains the name of the image
cell_type: denotes the type of the cell
xmin: x-coordinate of the bottom left part of the image
xmax: x-coordinate of the top right part of the image
ymin: y-coordinate of the bottom left part of the image
ymax: y-coordinate of the top right part of the image

Let’s now print an image to visualize what we’re working with:

# reading single image using imread function of matplotlib
image = plt.imread('images/1.jpg')
plt.imshow(image)

This is what a blood cell image looks like. Here, the blue part represents the WBCs, and the slightly red parts represent the RBCs. Let’s look at how many images, and the different type of classes, there are in our training set.

# Number of unique training images
train['image_names'].nunique()

So, we have 254 training images.

# Number of classes
train['cell_type'].value_counts()

We have three different classes of cells, i.e., RBC, WBC and Platelets. Finally, let’s look at how an image with detected objects will look like:

fig = plt.figure()

#add axes to the image
ax = fig.add_axes([0,0,1,1])

# read and plot the image
image = plt.imread('images/1.jpg')
plt.imshow(image)

# iterating over the image for different objects
for _,row in train[train.image_names == "1.jpg"].iterrows():
    xmin = row.xmin
    xmax = row.xmax
    ymin = row.ymin
    ymax = row.ymax
    
    width = xmax - xmin
    height = ymax - ymin
    
    # assign different color to different classes of objects
    if row.cell_type == 'RBC':
        edgecolor = 'r'
        ax.annotate('RBC', xy=(xmax-40,ymin+20))
    elif row.cell_type == 'WBC':
        edgecolor = 'b'
        ax.annotate('WBC', xy=(xmax-40,ymin+20))
    elif row.cell_type == 'Platelets':
        edgecolor = 'g'
        ax.annotate('Platelets', xy=(xmax-40,ymin+20))
        
    # add bounding boxes to the image
    rect = patches.Rectangle((xmin,ymin), width, height, edgecolor = edgecolor, facecolor = 'none')
    
    ax.add_patch(rect)

This is what a training example looks like. We have the different classes and their corresponding bounding boxes. Let’s now train our model on these images. We will be using the keras_frcnn library to train our model as well as to get predictions on the test images.

Implementing Faster R-CNN

For implementing the Faster R-CNN algorithm, we will be following the steps mentioned in this Github repository. So as the first step, make sure you clone this repository. Open a new terminal window and type the following to do this:

git clone https://github.com/kbardool/keras-frcnn.git

Move the train_images and test_images folder, as well as the train.csv file, to the cloned repository. In order to train the model on a new dataset, the format of the input should be:

filepath,x1,y1,x2,y2,class_name

where,

filepath is the path of the training image
x1 is the xmin coordinate for bounding box
y1 is the ymin coordinate for bounding box
x2 is the xmax coordinate for bounding box
y2 is the ymax coordinate for bounding box
class_name is the name of the class in that bounding box

We need to convert the .csv format into a .txt file which will have the same format as described above. Make a new dataframe, fill all the values as per the format into that dataframe, and then save it as a .txt file.

Code

data = pd.DataFrame()
data['format'] = train['image_names']

# as the images are in train_images folder, add train_images before the image name
for i in range(data.shape[0]):
    data['format'][i] = 'train_images/' + data['format'][i]

# add xmin, ymin, xmax, ymax and class as per the format required
for i in range(data.shape[0]):
    data['format'][i] = data['format'][i] + ',' + str(train['xmin'][i]) + ',' + str(train['ymin'][i]) + ',' + str(train['xmax'][i]) + ',' + str(train['ymax'][i]) + ',' + train['cell_type'][i]

data.to_csv('annotate.txt', header=None, index=None, sep=' ')

What’s next?

Training Model

Train our model! We will be using the train_frcnn.py file to train the model.

cd keras-frcnn
python train_frcnn.py -o simple -p annotate.txt

It will take a while to train the model due to the size of the data. If possible, you can use a GPU to make the training phase faster. You can also try to reduce the number of epochs as an alternate option. To change the number of epochs, go to the train_frcnn.py file in the cloned repository and change the num_epochs parameter accordingly.

Every time the model sees an improvement, the weights of that particular epoch will be saved in the same directory as “model_frcnn.hdf5”. These weights will be used when we make predictions on the test set.

It might take a lot of time to train the model and get the weights, depending on the configuration of your machine. I suggest using the weights I’ve got after training the model for around 500 epochs. You can download these weights from here. Ensure you save these weights in the cloned repository.

So our model has been trained and the weights are set. It’s prediction time! Keras_frcnn makes the predictions for the new images and saves them in a new folder. We just have to make two changes in the test_frcnn.py file to save the images:

Remove the comment from the last line of this file:
cv2.imwrite(‘./results_imgs/{}.png’.format(idx),img)
Add comments on the second last and third last line of this file:
# cv2.imshow(‘img’, img)
# cv2.waitKey(0)

Let’s make the predictions for the new images:

python test_frcnn.py -p test_images

Finally, the images with the detected objects will be saved in the “results_imgs” folder. Below are a few examples of the predictions I got after implementing Faster R-CNN:

Implementing result of faster rcnn — *Result 3*

Conclusion

R-CNN algorithms have truly been a game-changer for object detection tasks. There has suddenly been a spike in recent years in the amount of computer vision applications being created, and R-CNN is at the heart of most of them.

Keras_frcnn proved to be an excellent library for object detection, and in the next article of this series, we will focus on more advanced techniques like YOLO, SSD, etc.

If you have any query or suggestions regarding what we covered here, feel free to post them in the comments section below and I will be happy to connect with you!

FAQs

Q1. What is the use of faster R-CNN?

Faster R-CNN is a deep learning model that detects objects in images. It is used in self-driving cars, security systems, medical imaging, and robotics.

Faster R-CNN works by first identifying regions of interest (ROIs) in an image. The ROIs are then passed to a second network, which classifies the objects in each ROI and predicts their bounding boxes.

Q2. What is fast R-CNN vs faster R-CNN?

Faster R-CNN is an improved version of Fast R-CNN for object detection. It is faster because it uses a region proposal network (RPN) to generate ROIs directly from the feature maps of the CNN.

Pulkit Sharma

My research interests lies in the field of Machine Learning and Deep Learning. Possess an enthusiasm for learning new skills and technologies.

Free Courses

4.8

Ensemble Learning and Ensemble Learning Techniques

Learn ensemble learning, its techniques, and how it works in this course!

4.8

Nano Course: Dreambooth-Stable Diffusion for Custom Images

Learn to create custom images with Dreambooth Stable Diffusion technology

4.9

Dimensionality Reduction for Machine Learning

Master key dimensionality reduction techniques for ML success!

sset

Thanks for article. Most of object detection algorithms fail if size of object to be detected is very small and with varying size. For example detection of small cracks on metal surface. What is your view on this? Any advice and best possible approach?

Show 1 reply

Hi, I believe Faster RCNN works good enough for small objects as well. Currently, I am working on YOLO and SSD and will share my learnings on how they deal with small objects.

Vishnu

Hey Pulkit, I am a Freshman at UIUC studying CS and one of my projects is in the same domain. Would it be possible to connect with you and talk more about this?

Hi Vishnu, You can ask any query related to this project here. I will try my best to respond to them.

Toni

The dataset that you have provided, it doesn't correspond with your example. Can you provide the right source?

Hi Toni, The original dataset is available on GitHub, link for the same is provided in the article. I have made some changes in that dataset and have mentioned the changes as well. You just have to convert the dataset in csv format.

Reading list

A Practical Implementation of the Faster R-CNN Algorithm for Object Detection (Part 2 – with Python codes)

Introduction

Table of contents

A Brief Overview of the Different R-CNN Algorithms for Object Detection

CNN vs R-CNN vs Fast R-CNN vs Faster R-CNN

Understanding the Problem Statement

Setting up the System

Data Exploration

Code

Implementing Faster R-CNN

Code

Training Model

Conclusion

FAQs

Login to continue reading and enjoy expert-curated content.

Free Courses

Ensemble Learning and Ensemble Learning Techniques

Nano Course: Dreambooth-Stable Diffusion for Custom Images

Dimensionality Reduction for Machine Learning

Recommended Articles

Responses From Readers

Become an Author

Flagship Programs

Free Courses

Popular Categories

Generative AI Tools and Techniques

Popular GenAI Models

AI Development Frameworks

Data Science Tools and Techniques

Reading list

Introduction to Computer Vision

Getting Started with Image Data

Introduction to CNN and Implementation

Introduction to CNN and implementation

Introduction to Transfer Learning

CNN Visualization

Overview of Pretrained Models

Inception

ResNets

DenseNets

CSRNet

Introduction to Object Detection

Region Based Convolutional Neural Network

Single Stage Networks

Transformed Based Object Detection Models

Face Detection

Object Tracking

Pose Estimation

Introduction to Image Segmentation

Understanding Deep Learning Architectures for Image Segmentation

Video Classification

Introduction to Image Generation

Experiments with Generative Adversarial Networks

Zero and Few Shot Learning

Model Deployment

A Practical Implementation of the Faster R-CNN Algorithm for Object Detection (Part 2 – with Python codes)

Introduction

Table of contents

A Brief Overview of the Different R-CNN Algorithms for Object Detection

CNN vs R-CNN vs Fast R-CNN vs Faster R-CNN

Understanding the Problem Statement

Setting up the System

Data Exploration

Code

Implementing Faster R-CNN

Code

Training Model

Conclusion

FAQs

Login to continue reading and enjoy expert-curated content.

Free Courses

Ensemble Learning and Ensemble Learning Techniques

Nano Course: Dreambooth-Stable Diffusion for Custom Images

Dimensionality Reduction for Machine Learning

Recommended Articles

Responses From Readers

Become an Author

Flagship Programs

Free Courses

Popular Categories

Generative AI Tools and Techniques

Popular GenAI Models

AI Development Frameworks

Data Science Tools and Techniques