Selecting the Right Bounding Box Using Non-Max Suppression (with implementation)

Aishwarya Singh 20 Feb, 2024 • 8 min read

Overview

Understand the concept of Non-Max Suppression.
Learn how object detection algorithms use Non-Max Suppression.
Implement non-max suppression using the nms() function from torchvision (PyTorch).

Introduction

Computer vision is one of the most prominent fields in data science. Like other areas of data science, such as machine learning, its applications have become part of our everyday lives. Image classification, pose estimation, and object detection are just a few of these applications, and we are surrounded by them.

I was recently studying algorithms for object detection and I came across a very interesting idea that almost all of these algorithms use – Non-Max Suppression Algorithm (or NMS).

Non-max suppression is the final step of these object detection algorithms and is used to select the most appropriate bounding box for the object.

In this tutorial, I will introduce the concept of non-max suppression, why it is used, and explain how it works in object detection algorithms.

Introduction to Object Detection

Object detection is a branch of computer vision that is widely used in industry. For example, Facebook uses it to detect faces in uploaded images, and our phones use it to enable "face unlock". To evaluate how accurate an object detection model is, its predictions are compared against ground truth (manually annotated) boxes. Object detection involves the following two tasks:

Locating the object in the image
Classifying the object in the image

The image below illustrates the difference.

In the first image, we are only 'classifying' the object in the image. This is a classification problem.
In the second image, we are only 'locating' the object in the image. This is a localization problem.
In the third image, we 'classify and locate' the object. This is an object detection problem.

I hope you now have a basic understanding of object detection. If you want to study object detection in detail, you can read the following blogs:

There are various algorithms for object detection, and they have evolved considerably over the last decade. To improve performance and capture objects of different shapes and sizes, these algorithms predict multiple bounding boxes with different sizes and aspect ratios.

But of all the bounding boxes, how is the most appropriate and accurate bounding box selected? This is where NMS comes into the picture.

What is Non-max Suppression?

Objects in an image can have different sizes and shapes, and to capture each of them accurately, object detection algorithms create multiple bounding boxes (left image). Ideally, each object in the image should have a single bounding box, as in the image on the right.

To select the best bounding box from the multiple predicted boxes, object detection algorithms use non-max suppression. This technique "suppresses" the less likely bounding boxes and keeps only the best one.

We now understand why we need NMS and what it is used for. Let us look at how exactly the concept is implemented.

How Does Non-max Suppression Work?

The purpose of non-max suppression is to select the best bounding box for an object and reject, or "suppress", all other bounding boxes. The algorithm iteratively selects the best remaining box, compares its overlap with the other boxes, and removes the redundant ones until no boxes are left to process. NMS takes two things into account:

The objectiveness (often called objectness) score given by the model
The overlap, or intersection over union (IoU), between the bounding boxes
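The overlap is usually measured as intersection over union: the area shared by two boxes divided by the total area they cover together. Below is a minimal pure-Python sketch. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates, and the function name iou is our own rather than a library API.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped to 0 when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes shifted 5 pixels horizontally: overlap 50, union 150
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # about 0.333
```

An IoU of 1 means the boxes coincide exactly, and an IoU of 0 means they do not overlap at all.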

In the image below, you can see that along with the bounding boxes, the model returns an objectiveness score. This score indicates how confident the model is that the desired object is present in the bounding box.

All of the bounding boxes contain the object, but only the green bounding box is the best one for detecting it. So how can we get rid of the other bounding boxes?

Non-max suppression first selects the bounding box with the highest objectiveness score and then removes all other boxes that overlap it heavily. In the image above:

We select the green bounding box for the dog (since it has the highest objectiveness score, 98%)
We remove the yellow and red boxes for the dog (because they overlap heavily with the green box)

The same process is then applied to the remaining boxes, and it repeats until there are no more boxes left to compare. In the end, we are left with the following result.

That’s it. That’s how NMS works. To solidify our understanding, let’s write pseudocode for non-max suppression.

Pseudocode for Non-max Suppression

By now you should have a good understanding of non-max suppression. Let us break the process down into steps.

Suppose you built an object detection model to detect two classes: dog and person. The model has returned the following set of bounding boxes along with their objectiveness scores.

The process of selecting the best bounding boxes using NMS is as follows:

1. Select the box with the highest objectiveness score.
2. Compare the overlap (intersection over union) of this box with every other remaining box.
3. Remove the bounding boxes whose overlap (intersection over union) with it is greater than 50%.
4. Move to the box with the next highest objectiveness score among those that remain.
5. Repeat steps 2-4 until no boxes are left to process.
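The steps above can be sketched in plain Python as a greedy loop. This is an illustrative implementation of our own, not torchvision's code. The coordinates and scores are made-up values for a dog (boxes 0, 1, 3) and a person (boxes 2, 4, 5), and an iou_threshold of 0.5 corresponds to the 50% overlap rule in step 3.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS; returns indices of the kept boxes, highest score first."""
    # Candidate indices ordered from highest to lowest objectiveness score
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while remaining:
        # Steps 1 and 4: take the highest-scoring box that is still left
        best = remaining.pop(0)
        keep.append(best)
        # Steps 2-3: discard boxes whose IoU with it exceeds the threshold
        remaining = [i for i in remaining
                     if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Hypothetical boxes (x1, y1, x2, y2): 0, 1, 3 = dog; 2, 4, 5 = person
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 10, 300, 210),
         (8, 8, 108, 108), (205, 10, 305, 210), (195, 15, 295, 215)]
scores = [0.80, 0.98, 0.85, 0.75, 0.95, 0.70]

print(non_max_suppression(boxes, scores, iou_threshold=0.5))  # [1, 4]
```

With these hypothetical values, the loop runs twice. The first pass keeps box 1 and removes the other two dog boxes. The second pass keeps box 4 and removes the remaining person boxes.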

For our example, this loop runs twice. The images below show the output after each step.

Implementing non-max Suppression

Now that you understand how non-max suppression works, let us look at a simple implementation. Suppose we have the same image of a person and a dog used in the previous section, with six bounding boxes and an objectiveness score for each box.

Let us load the image and plot all six bounding boxes.

Output:

For this image, we are going to use the non-max suppression function nms() from the torchvision library (torchvision.ops.nms). This function requires three parameters:

boxes: bounding box coordinates in (x1, y1, x2, y2) format
scores: the objectiveness score for each bounding box
iou_threshold: the overlap (IoU) threshold; any box whose IoU with a higher-scoring kept box exceeds this value is discarded

Since the coordinates above are in (x1, y1, width, height) format, we compute x2 and y2 as follows:

x2 = x1 + width
y2 = y1 + height

Output:

tensor([1, 4])

This function returns the indices of the bounding boxes to keep, sorted in decreasing order of objectiveness score. Because I set a low IoU threshold, more boxes were suppressed and the output contains only two boxes. If you set a higher threshold, fewer boxes are suppressed and more of them are kept. In that case, you can select the top n bounding boxes, where n is the number of objects in your image.

For our example, the function has returned bounding boxes 1 and 4. Let us plot these on the image to see the final results.

Great! We now have the best bounding box for each object in the image. This is a very useful technique and is implemented in most object detection algorithms. Let us look at some of them in the next section.

Algorithms that Use Non-max Suppression

Almost all object detection algorithms use this technique to pick the best bounding boxes from their predictions. For example, the following is a screenshot of the SSD (Single Shot Detector) architecture taken from the research paper.

As you can see, SSD produces 8732 predicted bounding boxes at the final step. After these predictions, SSD applies non-max suppression to select the best bounding box for each object in the image.

Similar to SSD, YOLO (You Only Look Once) also uses non-max suppression at the final step. It predicts multiple bounding boxes to accommodate objects of different sizes and aspect ratios, and NMS then selects the best bounding box from these predictions. Recent versions such as YOLOv8 also rely on non-max suppression for post-processing.

Conclusion

To summarize, this article explored Non-Max Suppression (NMS) and its implementation for object detection using PyTorch. It introduced the fundamentals of object detection and explained why NMS is needed to handle the many bounding boxes a detector predicts. We walked through pseudocode for NMS, implemented it with the nms() function from torchvision (with OpenCV used to read the image), and looked at its role in popular algorithms such as SSD and YOLO.

NMS is an iterative process. It repeatedly selects the box with the highest objectiveness (confidence) score and removes the redundant boxes that overlap it too much, leaving a single bounding box per object.

I hope this article gave you a good understanding of the topic. In case you have any suggestions/ideas, feel free to share them in the comment section.

Frequently Asked Questions

Q1. What is non-maximum suppression?

A. Non-max suppression is a technique used in object detection to choose the best bounding box for each object by considering the objectiveness score and the overlap (IoU) between boxes.

Q2. Does YOLO use non-max suppression?

A. Yes, YOLO (You Only Look Once) uses non-maximum suppression to select accurate bounding boxes.

Q3. How does NMS work in YOLO?

A. NMS in YOLO selects the highest-scoring box, compares its overlap with the other boxes, and iteratively removes redundant boxes until none are left to process.

Q4. What are some challenges of object detection?

A. Challenges of object detection include scale, occlusion, viewpoint, deformable objects, cluttered backgrounds, limited data, real-time processing, class imbalance, and lighting conditions.

Q5. Which neural network architectures use non-max suppression?

A. Object detection algorithms like SSD (Single Shot Detector) and YOLO utilize non-max suppression to refine predicted bounding boxes.

An avid reader and blogger who loves exploring the endless world of data science and artificial intelligence. Fascinated by the limitless applications of ML and AI; eager to learn and discover the depths of data science.

Responses From Readers

RS 03 Aug, 2020

Please note several hyperlinks in the article gives error: "Page not found" Please check it out

Aishwarya Singh 04 Aug, 2020

Thanks for pointing out. I have updated the same.

Pushkar 13 Aug, 2020

What are the applications of NMS? If you could list some.

Aishwarya Singh 17 Aug, 2020

Hi Pushkar, NMS is used for the selection of best bounding box when multiple boxes are predicted for the same object. If you explore the RCNN, YOLO or SSD algorithms, you will see that the NMS is used at the last stage for final selection.