JalFaizy Shaikh — Updated On August 26th, 2021


When we’re shown an image, our brain instantly recognizes the objects contained in it. A machine, on the other hand, needs a lot of time and training data to identify these objects. But with recent advances in hardware and deep learning, the field of computer vision has become a whole lot easier and more intuitive.

Check out the image below as an example. The system is able to identify different objects in the image with incredible accuracy.

Object detection technology has seen rapid adoption across diverse industries. It helps self-driving cars navigate safely through traffic, spots violent behavior in crowded places, helps sports teams analyze games and build scouting reports, ensures proper quality control of parts in manufacturing, and much more. And these examples only scratch the surface of what object detection technology can do!

In this article, we will understand what object detection is and look at a few different approaches one can take to solve problems in this space. Then we will deep dive into building our own object detection system in Python. By the end of the article, you will have enough knowledge to take on different object detection challenges on your own!

Note: This tutorial assumes that you know the basics of deep learning and have solved simple image processing problems before. In case you haven’t, or need a refresher, I recommend reading the following articles first:


Table of Contents

  • What is Object Detection?
  • The Different Approaches we can use to Solve an Object Detection Problem
    • Approach 1: Naive way (Divide and Conquer)
    • Approach 2: Increase the number of divisions
    • Approach 3: Performing structured divisions
    • Approach 4: Becoming more efficient
    • Approach 5: Using Deep Learning for feature selection and to build an end-to-end approach
  • Getting Technical: How to build an Object Detection model using the ImageAI library


What is Object Detection?

Before we dive into building a state-of-the-art model, let us first try to understand what object detection is. Let’s (hypothetically) build a pedestrian detection system for a self-driving car. Suppose your car captures an image like the one below. How would you describe this image?

The image shows that our car is near a square and that a handful of people are crossing the road in front of it. As the traffic sign is not clearly visible, the car’s pedestrian detection system should identify exactly where the people are walking so that we can steer clear of them.

So what can the car’s system do to ensure this happens? It can draw a bounding box around each of these people, pinpointing where in the image they are, and then decide which path to take to avoid any mishaps.

Our objective in object detection is twofold:

  1. To identify all the objects present in the image and where they are located
  2. To filter out the object of interest
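To make the two objectives concrete, here is a minimal sketch of how a detector's output might be represented: each detection carries a label, a confidence score and a bounding box in pixel coordinates (x1, y1, x2, y2). The `Detection` class, `filter_objects` and the sample values are hypothetical, chosen only for illustration; objective 2 then becomes a simple filter over the list produced by objective 1.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical representation of one detection: what it is, how confident
# the model is, and where it is (x1, y1, x2, y2 in pixel coordinates).
@dataclass
class Detection:
    label: str
    score: float
    box: Tuple[int, int, int, int]


def filter_objects(detections: List[Detection], label: str,
                   min_score: float = 0.5) -> List[Detection]:
    """Objective 2: keep only the objects of interest above a confidence threshold."""
    return [d for d in detections if d.label == label and d.score >= min_score]


# Objective 1: everything a (hypothetical) detector found in the image.
all_objects = [
    Detection("person", 0.92, (120, 80, 180, 260)),
    Detection("car", 0.88, (300, 150, 520, 300)),
    Detection("person", 0.40, (600, 90, 640, 200)),
]

pedestrians = filter_objects(all_objects, "person")
print(len(pedestrians))  # 1
```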


Different Approaches to Solve an Object Detection Problem

Now that we know what our problem statement is, what can be a possible approach (or multiple approaches) to solve it? In this section, we’ll look at a few techniques that can be used to detect objects in images. We will start from the simplest approach and find our way up from there. If you have any suggestions or alternate approaches to the ones we will see below, do let me know in the comments section!


Approach 1: Naive way (Divide and Conquer)

The simplest approach we can take is to divide the image into four parts:

  • Upper left hand side corner
  • Upper right hand side corner
  • Lower left hand side corner
  • Lower right hand side corner

The next step is to feed each of these parts into an image classifier, which tells us whether that part of the image contains a pedestrian. If it does, we mark that patch in the original image. The output will look something like this:

This is a good approach to try out first, but we are looking for a much more accurate and precise system. It needs to locate the entire object (a person, in this case), because locating only parts of an object could lead to catastrophic results.
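A minimal sketch of this naive approach, assuming the image classifier is a function that takes a patch and returns True if it sees a pedestrian. `quadrants`, `naive_detect` and `fake_classifier` are hypothetical names; the stand-in classifier looks at coordinates only so the example runs without a model, whereas a real one would look at the pixels.

```python
from typing import Callable, Dict, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def quadrants(width: int, height: int) -> Dict[str, Box]:
    """Split an image of the given size into its four corner patches."""
    mx, my = width // 2, height // 2
    return {
        "upper_left": (0, 0, mx, my),
        "upper_right": (mx, 0, width, my),
        "lower_left": (0, my, mx, height),
        "lower_right": (mx, my, width, height),
    }


def naive_detect(width: int, height: int,
                 has_pedestrian: Callable[[Box], bool]) -> Dict[str, Box]:
    """Run the classifier on each quadrant and keep the ones it flags."""
    return {name: box for name, box in quadrants(width, height).items()
            if has_pedestrian(box)}


# Stand-in classifier: pretend a pedestrian stands at pixel (100, 300).
def fake_classifier(box: Box) -> bool:
    x1, y1, x2, y2 = box
    return x1 <= 100 < x2 and y1 <= 300 < y2


print(naive_detect(640, 480, fake_classifier))
# {'lower_left': (0, 240, 320, 480)}
```

The flagged patch covers a quarter of the image, far more than the person, and a person standing across two quadrants would be split between patches: exactly the imprecision described above.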


Approach 2: Increase the number of divisions

The previous system worked reasonably well, but what else can we do? We can improve on it by greatly increasing the number of patches we feed into the system. This is what our output should look like:

This turned out to be both a boon and a curse. Our solution is certainly a bit better than the naive approach, but it is riddled with bounding boxes that approximate the same thing. This is an issue, and we need a more structured way to solve our problem.
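Why do more divisions produce so many near-duplicate boxes? A sketch, assuming square windows slid across the image so that neighbouring windows overlap. The window size (160 px) and stride (40 px) are illustrative values, not taken from the article, and `sliding_windows` and `contains` are hypothetical helpers.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def sliding_windows(width: int, height: int, win: int, stride: int) -> List[Box]:
    """All square windows of side `win`, moved `stride` pixels at a time.

    With stride < win, neighbouring windows overlap, so one object can be
    covered by several windows that each "detect" it.
    """
    boxes = []
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            boxes.append((x, y, x + win, y + win))
    return boxes


def contains(box: Box, px: int, py: int) -> bool:
    """True if pixel (px, py) lies inside the box."""
    x1, y1, x2, y2 = box
    return x1 <= px < x2 and y1 <= py < y2


windows = sliding_windows(640, 480, win=160, stride=40)
print(len(windows))  # 117 patches (13 columns x 9 rows) instead of 4

# How many windows cover the same point, say a pedestrian's torso?
print(sum(contains(b, 100, 300) for b in windows))  # 12
```

If the classifier fires on each of those 12 windows, we get 12 boxes describing one person.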


Approach 3: Performing structured divisions

In order to build our object detection system in a more structured way, we can follow the steps below:

Step 1: Divide the image into a 10×10 grid like this:

Step 2: Define the centroids for each patch

Step 3: For each centroid, take three different patches of different heights and aspect ratios:

Step 4: Pass all of the patches created through the image classifier to get predictions
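Steps 1 to 3 can be sketched as follows, assuming a 640×480 image and three illustrative patch shapes (a square, a tall box and a wide box); the shapes and function names are hypothetical. Step 4 would then send each of the resulting patches to the classifier.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def grid_centroids(width: int, height: int, n: int) -> List[Tuple[float, float]]:
    """Steps 1 and 2: centre point of every cell in an n x n grid."""
    cw, ch = width / n, height / n
    return [((col + 0.5) * cw, (row + 0.5) * ch)
            for row in range(n) for col in range(n)]


def patches_at(cx: float, cy: float,
               shapes: List[Tuple[float, float]]) -> List[Box]:
    """Step 3: one box per (width, height) shape, all centred on the same centroid."""
    return [(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2) for w, h in shapes]


# Three illustrative shapes: a square, a tall box, and a wide box.
SHAPES = [(64, 64), (48, 128), (128, 48)]

centroids = grid_centroids(640, 480, n=10)
patches = [p for cx, cy in centroids for p in patches_at(cx, cy, SHAPES)]
print(len(centroids), len(patches))  # 100 300
```

Patches near the border extend past the image edges; a real implementation would clip them to the image bounds before cropping.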

So what does the final output look like? A bit more structured and disciplined for sure. Take a look below:

But we can further improve on this! Read on to see yet another approach that will produce even better results.


Approach 4: Becoming more efficient

The previous approach works reasonably well, but we can make the system more efficient. Can you suggest how? Off the top of my head, I can propose an optimization: starting from approach #3, we can do two things to make our model better.

  1. Increase the grid size: instead of a 10×10 grid, we can use, say, a 20×20 grid:

  2. Take more patches, with various heights and aspect ratios, instead of three: for example, we can take nine shapes around a single anchor, namely three square patches of different sizes and six vertical and horizontal rectangular patches of different sizes. This gives us patches with a range of aspect ratios.

This, again, has its pros and cons. Both changes let us work at a more granular level, but they also cause an explosion in the number of patches we have to pass through our image classification model.
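To see the scale of that explosion, here is a sketch that builds nine shapes from three sizes and three aspect ratios (height/width of 0.5, 1 and 2) and counts the patches. The sizes are illustrative assumptions; the sizes-times-ratios construction mirrors the nine anchor boxes used by detectors such as Faster R-CNN, but it is not taken from this article.

```python
from itertools import product
from typing import List, Tuple


def anchor_shapes(sizes: List[float], ratios: List[float]) -> List[Tuple[float, float]]:
    """Every combination of size and aspect ratio (ratio = height / width).

    Width and height are chosen so that each shape keeps the area size**2.
    """
    shapes = []
    for size, ratio in product(sizes, ratios):
        w = size / ratio ** 0.5
        h = size * ratio ** 0.5
        shapes.append((w, h))
    return shapes


def patch_count(grid: int, shapes_per_centroid: int) -> int:
    """Number of patches the classifier must see for an n x n grid."""
    return grid * grid * shapes_per_centroid


shapes = anchor_shapes(sizes=[32, 64, 128], ratios=[0.5, 1.0, 2.0])
print(len(shapes))                   # 9: 3 squares, 3 wide and 3 tall rectangles
print(patch_count(10, 3))            # 300 patches (approach 3)
print(patch_count(20, len(shapes)))  # 3600 patches: 12x more classifier calls
```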

Instead, we can select only some of the patches rather than taking all of them. For example, we could build an intermediate classifier that predicts whether a patch contains only background or potentially contains an object. This would drastically reduce the number of patches our image classification model has to see.
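A minimal sketch of this two-stage idea, with both models replaced by hypothetical stand-in functions (`cheap_objectness` and `expensive_classifier`): only patches whose objectness score passes the threshold are sent to the classifier. In real detectors this first stage is a region proposal method; R-CNN used selective search, while Faster R-CNN learns a region proposal network.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def two_stage(patches: List[Box],
              objectness: Callable[[Box], float],
              classify: Callable[[Box], str],
              threshold: float = 0.5) -> List[Tuple[Box, str]]:
    """Stage 1: a cheap model scores how likely each patch is to contain any object.
    Stage 2: only patches above the threshold reach the expensive classifier."""
    candidates = [p for p in patches if objectness(p) >= threshold]
    return [(p, classify(p)) for p in candidates]


# --- Hypothetical stand-ins for the two models ---
classifier_calls: List[Box] = []


def expensive_classifier(box: Box) -> str:
    classifier_calls.append(box)  # record every call so we can count them
    return "person"


def cheap_objectness(box: Box) -> float:
    # Pretend only the left third of a 600-pixel-wide image has anything in it.
    return 0.9 if box[0] < 200 else 0.1


patches = [(x, 0, x + 50, 50) for x in range(0, 600, 50)]
results = two_stage(patches, cheap_objectness, expensive_classifier)
print(len(patches), len(classifier_calls))  # 12 4
```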

Another optimization is to reduce the number of predictions that say the “same thing”. Let’s look at the output of approach 3 again:

As you can see, both bounding box predictions are essentially of the same person, so we only need one of them. To choose, we consider all the boxes that “say the same thing” and keep whichever one has the highest probability of containing a person.
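This step is commonly implemented as greedy non-maximum suppression (NMS): two boxes “say the same thing” if their intersection over union (IoU) exceeds a threshold, and only the higher-scoring one survives. A self-contained sketch; the 0.5 threshold and the sample boxes are illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union: how much two boxes overlap (0 = none, 1 = identical)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression; returns the indices of the boxes to keep."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not "say the same thing" as a better-scoring one.
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(100, 80, 160, 260), (105, 85, 165, 255), (400, 90, 450, 240)]
scores = [0.80, 0.92, 0.75]
print(nms(boxes, scores))  # [1, 2]: the two overlapping boxes collapse into one
```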

All of these optimizations have so far given us pretty decent predictions. We almost have all the cards in our hands, but can you guess what is missing? Deep Learning of course!


Approach 5: Using Deep Learning for feature selection and to build an end-to-end approach

Deep learning has a lot of potential in the object detection space. Can you suggest where and how we can leverage it for our problem? I have listed a few ideas below:

  • Instead of taking patches from the original image, we can pass the original image through a neural network to reduce the dimensions
  • We could also use a neural network to suggest selective patches
  • We can train a deep learning algorithm to predict boxes as close to the ground-truth (original) bounding box as possible. This will make the algorithm give tighter, finer bounding box predictions
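One common way to train a network to tighten boxes, used by the R-CNN family of detectors (and not necessarily identical to what the model used later in this article does), is to have it predict offsets relative to the patch: a shift of the centre scaled by the patch size, plus a log-scale change in width and height. A minimal sketch of encoding and decoding those offsets, with made-up box coordinates:

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Offsets = Tuple[float, float, float, float]  # (dx, dy, dw, dh)


def to_center(b: Box) -> Tuple[float, float, float, float]:
    """Convert corner coordinates to (centre x, centre y, width, height)."""
    x1, y1, x2, y2 = b
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def encode(patch: Box, truth: Box) -> Offsets:
    """Training target: how to move and stretch the patch onto the true box."""
    px, py, pw, ph = to_center(patch)
    gx, gy, gw, gh = to_center(truth)
    return (gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph)


def decode(patch: Box, d: Offsets) -> Box:
    """Apply (predicted) offsets to a patch to get a refined, tighter box."""
    px, py, pw, ph = to_center(patch)
    dx, dy, dw, dh = d
    cx, cy = px + dx * pw, py + dy * ph
    w, h = pw * math.exp(dw), ph * math.exp(dh)
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


patch = (90.0, 60.0, 190.0, 280.0)   # coarse grid patch
truth = (110.0, 80.0, 170.0, 260.0)  # hand-labelled box around the person
offsets = encode(patch, truth)
print([round(v, 3) for v in offsets])
print([round(v, 3) for v in decode(patch, offsets)])  # recovers the true box
```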

Now, instead of training a separate neural network for each individual problem, we can use a single deep neural network that attempts to solve all of them by itself. The advantage is that each of the smaller components of the network helps optimize the other parts of the same network, which lets us train the entire deep model jointly.
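What “jointly training” means in practice: the classification and box-regression errors are summed into a single loss, so one backward pass updates the shared layers for both tasks. The sketch below follows the style of the Fast R-CNN multi-task loss (cross-entropy plus smooth L1, with the box term applied only to non-background patches). Treating class 0 as background and the weight `lam` are assumptions for illustration; note that RetinaNet, used later in this article, replaces the cross-entropy term with a focal loss.

```python
import math
from typing import List


def cross_entropy(probs: List[float], true_class: int) -> float:
    """Classification loss: penalises a low probability on the correct class."""
    return -math.log(max(probs[true_class], 1e-12))


def smooth_l1(pred: List[float], target: List[float]) -> float:
    """Box-regression loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def detection_loss(probs: List[float], true_class: int,
                   box_pred: List[float], box_target: List[float],
                   lam: float = 1.0) -> float:
    """One scalar loss covering both tasks of the single network."""
    loss = cross_entropy(probs, true_class)
    if true_class != 0:  # class 0 = background: there is no box to regress
        loss += lam * smooth_l1(box_pred, box_target)
    return loss


# A "person" patch (class 1) with slightly-off box offsets.
print(detection_loss([0.1, 0.9], 1, [0.0, 0.0, -0.4, -0.2], [0.0, 0.0, -0.511, -0.201]))
```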

Our output would give us the best performance out of all the approaches we have seen so far, somewhat similar to the image below. We will see how to create this using Python in the next section.



Getting Technical: How to build an Object Detection model using the ImageAI library

Now that we know what object detection is and the best approach to solve the problem, let’s build our own object detection system! We will be using ImageAI, a Python library that supports state-of-the-art machine learning algorithms for computer vision tasks.

Running an object detection model to get predictions is fairly simple. We don’t have to go through complex installation scripts to get started, and we don’t even need a GPU to generate predictions! We will use the ImageAI library to get the output prediction we saw above in approach #5. I highly recommend following along with the code below on your own machine, as this is the best way to get the most out of this section.

Please note that you need to set up your system before creating the object detection model. Once you have Anaconda installed on your local system, you can get started with the steps below.

Step 1: Create an Anaconda environment with Python 3.6.

conda create -n retinanet python=3.6 anaconda

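Step 2: Activate the environment and install the necessary packages. (On conda 4.4 and later, conda activate retinanet is the recommended way to activate the environment.)

source activate retinanet
conda install tensorflow numpy scipy opencv pillow matplotlib h5py keras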

Step 3: Then install the ImageAI library.

pip install https://github.com/OlafenwaMoses/ImageAI/releases/download/2.0.1/imageai-2.0.1-py3-none-any.whl

Step 4: Now download the pretrained model required to generate predictions. This model is based on RetinaNet (a subject of a future article). Click on the link to download – RetinaNet Pretrained model 

Step 5: Copy the downloaded file to your current working folder, that is, the folder from which you will launch Jupyter Notebook.
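If you would rather download the weights from Python than through the browser, here is a minimal sketch using only the standard library. The URL is the one a reader shared in the comments below, so treat it as an assumption that may have changed; `ensure_file` is a hypothetical helper name.

```python
import os
import urllib.request

# URL as shared by a reader in the comments; it may have moved since.
MODEL_URL = ("https://github.com/OlafenwaMoses/ImageAI/releases/download/1.0/"
             "resnet50_coco_best_v2.0.1.h5")
MODEL_FILE = "resnet50_coco_best_v2.0.1.h5"


def ensure_file(path: str, url: str) -> str:
    """Download `url` to `path` unless the file already exists; return the absolute path."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return os.path.abspath(path)


# Run this in the folder you launch Jupyter Notebook from:
# model_path = ensure_file(MODEL_FILE, MODEL_URL)
```

The existence check means re-running the notebook cell does not download the large weights file a second time.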

Step 6: Download the image from this link and save it as image.png in the same folder.

Step 7: From the activated retinanet environment, launch Jupyter Notebook (type jupyter notebook in your terminal) and run the following code:

from imageai.Detection import ObjectDetection
import os

execution_path = os.getcwd()

detector = ObjectDetection()
detector.setModelTypeAsRetinaNet()  # the downloaded weights are a RetinaNet model
detector.setModelPath(os.path.join(execution_path, "resnet50_coco_best_v2.0.1.h5"))
detector.loadModel()  # load the weights before running any detection
# Only look for people
custom_objects = detector.CustomObjects(person=True, car=False)
detections = detector.detectCustomObjectsFromImage(
    input_image=os.path.join(execution_path, "image.png"),
    output_image_path=os.path.join(execution_path, "image_new.png"),
    custom_objects=custom_objects,
    minimum_percentage_probability=65)

for eachObject in detections:
    # percentage_probability is a number, so pass it to print() instead of concatenating
    print(eachObject["name"], ":", eachObject["percentage_probability"])

This will create a new image file named image_new.png with a bounding box drawn around each detected person, and print the name and probability of every detection.
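The detections list can also be post-processed instead of just printed. A small sketch, assuming each entry is a dictionary with the name and percentage_probability keys used in the loop above; `summarize` is a hypothetical helper and the sample values are made up. Raising the threshold is one way to tune the result if the model reports more people than you expect.

```python
from collections import Counter
from typing import Dict, List


def summarize(detections: List[Dict], min_probability: float = 65.0) -> Counter:
    """Count detections per class name, ignoring anything below the threshold (in percent)."""
    return Counter(d["name"] for d in detections
                   if float(d["percentage_probability"]) >= min_probability)


# Made-up example shaped like the dictionaries printed in the loop above.
sample = [
    {"name": "person", "percentage_probability": 91.2},
    {"name": "person", "percentage_probability": 70.5},
    {"name": "person", "percentage_probability": 66.0},
]
print(summarize(sample))                  # Counter({'person': 3})
print(summarize(sample, 80.0)["person"])  # 1
```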

Step 8: To display the resulting image in the notebook, use the following code:

from IPython.display import Image
Image(filename=os.path.join(execution_path, "image_new.png"))



Congratulations! You have created your own object detection model for pedestrian detection. How awesome is that?


End Notes

In this article, we learned what object detection is and the intuition behind creating an object detection model. We also saw how to build an object detection model for pedestrian detection using the ImageAI library.

By just tweaking the code a bit, you can easily transform the model to solve your own object detection challenges. If you do solve such a problem using the approach above, especially for a social cause, do let me know in the comments below!


About the Author

JalFaizy Shaikh

Faizan is a Data Science enthusiast and a Deep learning rookie. A recent Comp. Sc. undergrad, he aims to utilize his skills to push the boundaries of AI research.


29 thoughts on "Understanding and Building an Object Detection Model from Scratch in Python"

Vidyush says: June 28, 2018 at 2:47 pm
Really nice article wanted this and its is simple.. Keep doing the great work
Vaibhav says: June 28, 2018 at 5:20 pm
The second and the third link before the table of contents are pointing to the same page.
Pulkit Sharma says: June 28, 2018 at 6:47 pm
Hi Vaibhav, Thanks for bringing this to our notice. The links have been updated.
ABHIHEK MISHRA says: June 28, 2018 at 7:36 pm
you didnt tell about other packages using in that code ,many errors are coming for it
Pulkit Sharma says: June 28, 2018 at 7:56 pm
Hi Abhihek, Can you please tell us what error are you getting? That would help us to clarify your doubt in a better way.
Rajat says: June 29, 2018 at 8:42 am
Hi Pulkit, I am implementing the above code using jupyter notebook . I have gone through all the steps mentioned above but when i executed the above code,i got an error saying "no module named imageai"
Pulkit Sharma says: June 29, 2018 at 11:09 am
Hi Rajat, The code given in the article is to run in the script. If you want to do any modification to it, like if you want to use it in jupyter notebook, you first have to install jupyter notebook in the same environment. So, once all the installations are done including jupyter notebook in same environment, run the code. It will work.
Suryam says: July 02, 2018 at 12:54 pm
Hi , As above mentioned i have done with every when i executing getting " No Module Named imageai" Kindly give me the solutions
Gianni says: July 02, 2018 at 6:17 pm
Try this in a cell of your jupyter notebook: !pip install https://github.com/OlafenwaMoses/ImageAI/releases/download/2.0.1/imageai-2.0.1-py3-none-any.whl For the model download, in another cell: import urllib.request url = "https://github.com/OlafenwaMoses/ImageAI/releases/download/1.0/resnet50_coco_best_v2.0.1.h5" file_name = "resnet50_coco_best_v2.0.1.h5" urllib.request.urlretrieve(url, file_name) For the image download, I used this: import urllib.request url = "https://orig00.deviantart.net/f170/f/2013/087/e/0/wizards_of_waverly_place_png_by_ivygo-d5zjoqx.png" file_name = "image.png" urllib.request.urlretrieve(url, file_name) And i got a good result, but 7 people instead of 6. However, great work Rajat, thank you.
Pulkit Sharma says: July 04, 2018 at 4:15 pm
Hi Suryam, The steps have been updated. Please go through them and run the steps again.
vidhun v warrier says: September 10, 2018 at 11:20 am
hai I have completed the whole. It's working perfectly. can u say how can I use in videos rather than in images?
Aishwarya Singh says: September 12, 2018 at 10:47 am
Hi, You might find this post useful : Calculate screen time of actors in a video
Ponnu says: October 25, 2018 at 6:48 pm
I have completed the whole. It’s working perfectly. I am a beginner, Can u explain what resnet50_coco_best_v2.0.1.h5 contains....
Pulkit Sharma says: October 25, 2018 at 8:04 pm
Hi Ponnu, It contains the weights which were obtained while training the resnet50 model on coco dataset. Instead of training the model again for hours, we can use these weights to make predictions.
Nikhil Bhaskar says: November 01, 2018 at 10:42 pm
How can we convert a image classifier model to object detection model with our own coding?
Pulkit Sharma says: November 02, 2018 at 11:07 am
Hi Nikhil, Yes! you can give the coordinates of the object in the image for training. That will make it an object detection problem instead of classification.
Manisha says: November 03, 2018 at 10:09 am
Hii....i am a student of final year b.tech in computer science..i was wishing to work on a project based on object detection basically cars,roads and buildings...i am a beginner in machine learning...can u plzz help me to give an idea how to start???
Pulkit Sharma says: November 03, 2018 at 11:46 am
Hi Manisha, First try to collect some training data, i.e. labeled images having classes of objects as well as their corresponding bounding boxes. Once you have the training data, you can use any of the object detection techniques like Faster RCNN, YOLO, SSD to train your model and get predictions on new images.
Manisha says: November 19, 2018 at 6:20 am
Thank you sir...bt the problem is that as i have no idea of machine lerning.. it's getting really difficult.can you plzz share a small sample of code for illustration??....
Pulkit Sharma says: November 19, 2018 at 12:01 pm
Hi Manisha, You can go through these articles to get a better understanding: A step by step introduction to the Basic Object Detection Algorithms (Part-1) A practical implementation of Faster-RCNN algorithm for Object Detection (Part 2 with Python code)
gagan says: November 20, 2018 at 11:14 pm
the instruction given above , mention that copying that downloaded file into working folder working folder ????? what is working folder? and when i run it in jupter notebook gives error : ModuleNotFoundError Traceback (most recent call last) in () ----> 1 from imageai.Detection import ObjectDetection 2 import os 3 4 execution_path = os.getcwd() 5 ~\anaconda\lib\site-packages\imageai\Detection\__init__.py in () ----> 1 import cv2 2 3 from imageai.Detection.keras_retinanet.models.resnet import resnet50_retinanet 4 from imageai.Detection.keras_retinanet.utils.image import read_image_bgr, read_image_array, read_image_stream, preprocess_image, resize_image 5 from imageai.Detection.keras_retinanet.utils.visualization import draw_box, draw_caption ModuleNotFoundError: No module named 'cv2' please tell me what i have to do to correct this
Pulkit Sharma says: November 28, 2018 at 11:51 am
Hi gagan, The working folder is where your jupyter notebook is. Copy the data in that folder. ModuleNotFoundError: No module named ‘cv2’ To remove this error, you have to install open cv in your system.
Divya says: November 29, 2018 at 8:53 am
Hi Pulkit, I would like to know how a particular image like a fire extinguisher could be detected by using object detection and labelled as risk free or safe. Can you give me an outline on what all things to be done and how to train the model using Haar classifier in openCV?
Pulkit Sharma says: November 29, 2018 at 11:53 am
Hi Divya, In order to make the model effective to detect fire extinguisher, the model should learn how a fire extinguisher looks like. So, you first have to train the model on fire extinguisher images. Once the model has learned how it looks, then you can pass new images to the model and it will predict whether the image has a fire extinguisher or not.
adrian says: January 07, 2019 at 11:06 pm
I just ran this and am still receiving the following error: ModuleNotFoundError Traceback (most recent call last) in ----> 1 from imageai.Detection import ObjectDetection 2 import os 3 4 execution_path = os.getcwd() 5 ModuleNotFoundError: No module named 'imageai' My image file and the H5 file are both saved in the same directory as my notebook.
Pulkit Sharma says: January 08, 2019 at 12:00 pm
Hi adrian, Have you followed all the steps given in the article? Also, make sure that you have build the Jupyter Notebook in the same environment which you have created as per the codes given in the article. You have to type 'source activate '(if you follow the exact codes from article type 'source activate retinanet') before launching Jupyter notebook.
michael says: January 08, 2019 at 6:41 pm
Could you tell me which dataset the mentioned picture belongs to, and the input picture should be 768x223 in size?
Pulkit Sharma says: January 08, 2019 at 7:32 pm
Hi michael, This is just a sample image. It does not belong to any specific dataset. You can also try your own sample image for testing purpose.
How to build a Face Mask Detector using RetinaNet Model! – My Blog says: August 25, 2020 at 4:49 pm
[…] general, RetinaNet is a good choice to start an object detection project, in particular, if you need to quickly get […]