Alleviation of COVID by means of Social Distancing & Face Mask Detection Using YOLO V4

Yash Indulkar, 27 Aug, 2021

This article was published as a part of the Data Science Blogathon.

Abstract

This article covers social distancing & face mask detection in the context of the coronavirus. The spread of such a pandemic can be alleviated by social distancing and by wearing a face mask. Covid-19 had a huge impact on different sectors in many countries, and that impact caused problems for many people around the world. The small steps of wearing a face mask and following social distancing could save many lives by mitigating the spread of the virus.

YOLO stands for You Only Look Once. The algorithm is used for Object Detection as well as Object Tracking. This research uses YOLO with Object Detection to measure social distancing and to identify face masks on people’s faces, while Object Tracking counts the faces and people in the frame and keeps a record of each object in the next frame. The minimum distance to keep while adhering to social distancing is 6 feet; with this as the base for calculating distance, the model was trained and used for object detection as well as object tracking.

Of the different algorithms available, YOLO stands out from the others currently in use. Custom datasets were used to train the model to recognise face masks for detection and tracking. To evaluate the trained model, mAP (Mean Average Precision) was calculated for both use cases (Social Distancing & Face Mask Detection). It works by comparing the ground-truth bounding box with the detected box and returns a score. The higher the mAP score, the better the model is at detecting objects.

INTRODUCTION

Computer Vision is a subset of Artificial Intelligence that uses the computer’s power to extract meaningful information from the provided datasets, which can be images, videos, etc. Computer vision can be extended to many other applications depending upon the use case. Artificial Intelligence can be described as the umbrella that covers fields such as Machine Learning, Deep Learning & Computer Vision.

This research on Face Mask Detection & Social Distancing uses computer vision to understand various aspects of the images or video frames provided as input to the algorithms. The basic concept is to find the bounding boxes for each class; the classes could be anything from a Dog to a Car, depending on the training datasets.

Coronavirus had a great impact on various sectors of the world, be it Industrial, Transportation or Agriculture. It brought every sector to a halt, and strict restrictions were ordered to enforce social distancing & wearing a face mask on a priority basis. The impact of Covid-19 on different sectors can be observed in Fig 1 below.

Fig 1. Pie-Chart for Impact Distribution- Image by Author

The highest impact was on the Restaurants sector, at 20 %, followed by Real Estate at 16 %. The lowest impact of Covid-19 was on the Agriculture sector (3 %). The total number of Coronavirus cases globally can be observed in Fig 2, which plots the number of people affected by Covid over time.

Fig 2. Total Covid Cases Globally- Image by Author

The cases started around 22nd Jan 2020 and the curve rose steeply day by day, from 0 cases to around 111 million cases by 9th Feb 2021. The rise of Covid affected every country, with different figures at the individual level; such huge numbers were devastating and turned the epidemic into a pandemic.

METHODOLOGY

This section highlights the algorithm used for object detection as well as object tracking.

1. YOLO Architecture

Yolo stands for You Only Look Once. It is a state-of-the-art algorithm that works in real time and is built on deep learning to solve various Object Detection and Object Tracking problems. The architecture of Yolo can be observed in Fig 3 below.

Fig 3. YOLO Architecture- Image by Author

The figure above shows that the architecture begins with the input image layer, which takes the input that is passed to further layers; the input can be any image, depending upon the use case. After the input layer comes the DarkNet architecture. Darknet is an open-source neural network framework written in C & CUDA, and it features YOLO for object detection & object tracking.

Further, the architecture consists of a flattened layer that is densely connected to the convolutional layers, which are in turn densely connected so that data passes from each node to the other nodes in the architecture. This is passed to the output layer, which gives 4 values describing the predicted bounding box, denoted by x, y, w, h, along with the object detection score and the probability of the predicted class. YOLO is part of the One-Shot object detector family, which is accurate & fast; there is also a Two-Shot object detector family.
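
The output values described above can be made concrete with a small example. This is only a minimal sketch, assuming that (x, y) is the box centre and (w, h) the box size, all normalised to the image dimensions, and that the final score is the object detection score multiplied by the class probability. The Prediction class and the function names are hypothetical and are not taken from Darknet.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One predicted box (hypothetical container, not a Darknet type)."""
    x: float            # box centre x, assumed normalised to [0, 1]
    y: float            # box centre y, assumed normalised to [0, 1]
    w: float            # box width, assumed normalised to [0, 1]
    h: float            # box height, assumed normalised to [0, 1]
    objectness: float   # object detection score
    class_probs: list   # one probability per class


def to_corners(pred, img_w, img_h):
    """Convert a centre-format box to pixel corners (left, top, right, bottom)."""
    left = (pred.x - pred.w / 2) * img_w
    top = (pred.y - pred.h / 2) * img_h
    right = (pred.x + pred.w / 2) * img_w
    bottom = (pred.y + pred.h / 2) * img_h
    return left, top, right, bottom


def best_class(pred):
    """Return the most likely class id and its combined confidence."""
    class_id = max(range(len(pred.class_probs)), key=pred.class_probs.__getitem__)
    return class_id, pred.objectness * pred.class_probs[class_id]


# Illustrative values only.
pred = Prediction(x=0.5, y=0.5, w=0.2, h=0.4, objectness=0.9,
                  class_probs=[0.1, 0.8, 0.1])
print(to_corners(pred, img_w=100, img_h=200))
print(best_class(pred))
```

Corner coordinates are convenient for drawing boxes and for computing overlaps, while the centre point is what a distance calculation would typically use.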

Popular Two-Shot object detectors are R-CNN, Fast R-CNN, and Faster R-CNN. These algorithms give accurate results for certain use cases but are slow compared with Yolo. You Only Look Once looks at the image in a single glance and, based on that look, predicts the bounding boxes for certain classes; the classes can be anything from Dog to Car, or Gun to Tanks. This single-pass design makes Yolo stand out from the others. The different types of object detectors, based on the number of shots, can be observed in Fig 4 below.

Fig 4. Different Types of Detector- Image by Author

The figure above shows the different components of a detector. There are 4 types of components:

Input: The input to the detector can be an image or a video, depending on the use case specified in the research.

Backbone: The backbone of the object detector is a feature-extraction model, such as ResNet, DenseNet or VGG.

Neck: The neck acts as an extra set of layers between the backbone & the head, collecting feature maps from different stages of the backbone.

Head: The head is the network in charge of detecting objects and predicting their bounding boxes.

EXPERIMENTAL RESULTS

This section details the results obtained after making various observations and forming the final outputs. The project focuses on social distancing detection & face mask detection for Covid-19. Fig 5 explains the architecture for calculating the distance between objects and shows the flow of how the output is generated with Yolo Version 4.

Fig 5. YOLO Darknet Architecture- Image by Author

Fig 6 below shows the architecture for the analysis of face masks on objects; the object here is the person on whom the detection is performed with the help of custom datasets. The custom dataset is trained for 3 different categories (Good, None & Bad), and depending upon the annotations provided, the model draws bounding boxes with the respective classes. The difference between object detection and object tracking is the use of a tracker (DeepSort, in the case of Yolo), which keeps track of an object by assigning it an Id.

Fig 6. YOLO Deepsort Architecture- Image by Author
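
The idea of keeping track of an object by assigning it an Id can be illustrated with a much simpler tracker than DeepSort. The sketch below is not DeepSort (which also uses motion prediction and appearance features); it is a greedy nearest-centroid matcher, and the class name and the max_distance parameter are assumptions made for illustration.

```python
import math
from itertools import count


class CentroidTracker:
    """Toy tracker: match each detection to the nearest previous centroid."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # assumed matching radius in pixels
        self.tracks = {}                  # track id -> last known centroid
        self._ids = count(1)              # source of fresh ids

    def update(self, centroids):
        """Assign an id to every centroid in the current frame."""
        assigned = {}
        unmatched = dict(self.tracks)     # tracks not yet claimed this frame
        for c in centroids:
            best_id, best_dist = None, self.max_distance
            for track_id, prev in unmatched.items():
                d = math.dist(c, prev)
                if d <= best_dist:
                    best_id, best_dist = track_id, d
            if best_id is None:
                best_id = next(self._ids)  # nothing close enough: new object
            else:
                del unmatched[best_id]     # each track is used at most once
            assigned[best_id] = c
        self.tracks = assigned             # tracks left unmatched are dropped
        return assigned


tracker = CentroidTracker(max_distance=50.0)
print(tracker.update([(100, 100), (300, 300)]))  # two new ids
print(tracker.update([(305, 298), (104, 103)]))  # same ids, new positions
```

Because the ids persist from frame to frame, objects can be counted once rather than once per frame, which is the benefit of tracking over plain detection.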

Below are examples of the datasets used for training. Fig 7 shows the detection of a person based on the COCO dataset, which contains a large number of classes ranging from Cat to Car to Person and so on.

Fig 7. COCO Dataset Sample- Image by Author

Similarly, Fig 8 below shows the custom dataset used for Face Mask Detection. This custom dataset contains 600 images, with annotations made for every object present in the frame. A custom dataset was needed because the COCO dataset doesn’t contain classes for face mask detection.

Fig 8. Custom Dataset Sample

Based on the above figure, annotations were created for the different classes present in the frame. Fig 9 shows an annotation file that contains 2 different classes (0 & 2). The classes used for face mask detection are 0 for Good, 1 for None & 2 for Bad.

Fig 9. Annotations for Objects based on Images- Image by Author
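
To show how such an annotation file can be read, here is a minimal sketch that assumes the usual Darknet label layout: one object per line, written as a class id followed by the normalised centre x, centre y, width and height. The sample lines are invented for illustration and are not taken from the author's dataset; the function name is hypothetical.

```python
# Class ids as described in the article.
CLASS_NAMES = {0: "Good", 1: "None", 2: "Bad"}


def parse_annotation(text):
    """Parse label text into a list of {'label', 'box'} dictionaries."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        class_id = int(parts[0])
        x, y, w, h = map(float, parts[1:])
        boxes.append({"label": CLASS_NAMES[class_id], "box": (x, y, w, h)})
    return boxes


# Invented sample: one "Good" face and one "Bad" face in the same image.
sample = """0 0.50 0.40 0.20 0.30
2 0.75 0.60 0.10 0.15"""

for obj in parse_annotation(sample):
    print(obj["label"], obj["box"])
```

Normalised coordinates keep the annotations valid when the image is resized for training.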

Similarly, another annotation file was created for Person Object Detection, to create bounding boxes for the objects detected in the frame. Fig 10 below shows an annotation file that contains a single class (0 for Person). The goal for social distancing is to detect each person in a frame and measure the distance to the other detected objects. The Euclidean Distance formula is used to calculate the distance between objects.

Fig 10. Annotations of Objects based on Images- Images by Author
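
The Euclidean distance between two points (x1, y1) and (x2, y2) is sqrt((x1 - x2)^2 + (y1 - y2)^2). The sketch below applies it to the centres of person bounding boxes and reports the pairs that are too close. It assumes boxes are given as pixel corners and that a pixel threshold standing in for 6 feet is already known; in practice that threshold depends on camera calibration, and the value used here is purely hypothetical.

```python
import math
from itertools import combinations


def centroid(box):
    """Centre point of a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return ((left + right) / 2, (top + bottom) / 2)


def find_violations(boxes, min_pixels):
    """Return index pairs of boxes whose centres are closer than min_pixels."""
    centres = [centroid(b) for b in boxes]
    violations = []
    for (i, a), (j, b) in combinations(enumerate(centres), 2):
        if math.dist(a, b) < min_pixels:  # Euclidean distance in pixels
            violations.append((i, j))
    return violations


# Three invented person boxes; the first two stand close together.
people = [(0, 0, 10, 20), (30, 0, 40, 20), (200, 0, 210, 20)]
print(find_violations(people, min_pixels=50))
```

Measuring distance in image pixels ignores perspective, so people far from the camera appear closer together than they are; this is a known limitation of the simple approach.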

Below is the training graph for the custom dataset, which in this research is the face mask dataset. The model was trained for 4000 epochs. Fig 11 shows that the loss decreased until about 1200 epochs and then remained constant through the last epoch. Because the training loss was already minimized by 1200 epochs, the number of training epochs could have been set to around 2000: the more iterations used in training, the more computing power is needed.

Fig 11. Iteration Graph for Training Custom Data- Image by Author
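
The reasoning about where the loss stops improving can be checked programmatically. The following is a minimal sketch of a plateau check; the function name, the window and the tolerance are assumptions, and the loss values are made up for illustration and are not the author's training log.

```python
def plateau_start(losses, window=3, tolerance=0.02):
    """Return the first index i such that losses[i : i + window + 1]
    vary by at most `tolerance`, or None if no such index exists."""
    for i in range(len(losses) - window):
        recent = losses[i:i + window + 1]
        if max(recent) - min(recent) <= tolerance:
            return i
    return None


# Invented loss values, one per logging step.
losses = [2.0, 1.2, 0.7, 0.4, 0.31, 0.305, 0.30, 0.30, 0.30]
print(plateau_start(losses))
```

A check like this could be used to choose a shorter training schedule on a later run, saving the compute spent after the loss has flattened.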

The results for social distancing are shown in Fig 12. The results are arranged in a grid of 2 images; the left side of the image shows the output with the respective bounding boxes based on the distance calculation.

Similarly, the results for Face Mask detection are shown in Fig 13 below, with the detected objects marked by bounding boxes. The goal is to detect whether the object (Face) is wearing a Mask or not; based on that, the model draws a bounding box in a different color and displays the associated class name. The color used is Green for No Mask & Purple for Mask or None.

Fig 13. Face Mask Detection on Crowded Place

Another example of face mask detection using Darknet is shown in Fig 14 below. The Darknet implementation is based on Object Detection without tracking the objects across frames. The model detected objects wearing No Mask but still assigned some objects to Good & Bad, and one object that should have been No Mask was misclassified as Good. These False Positive results will be explained in Table II.

Fig 14. Another Example for Face Mask Detection- Image by Author

Finally, the training of the model on the provided dataset was evaluated with mAP (Mean Average Precision). mAP is the mean of the Average Precision (AP) over all the classes present in the training data, computed at a given IoU (Intersection Over Union) threshold. Table I below shows the Average Precision for each category along with the True Positive & False Positive counts. The AP was calculated at a threshold of 0.25 with 101 Recall Points.

From the above table, it can be observed that the AP was above 90 % for each category.
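
For readers unfamiliar with these metrics, the sketch below shows one common way to compute IoU and an AP averaged over 101 evenly spaced recall points. It is an illustration under stated assumptions, not the exact Darknet routine: boxes are pixel corners, detections are already sorted by descending confidence, and each detection has already been marked True Positive (IoU with a ground-truth box at or above the threshold) or False Positive. The function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (left, top, right, bottom) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(is_true_positive, num_ground_truth, points=101):
    """AP as the mean of the best precision at each sampled recall level.

    is_true_positive: one bool per detection, sorted by descending confidence.
    """
    precisions, recalls = [], []
    tp = fp = 0
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    total = 0.0
    for k in range(points):
        level = k / (points - 1)
        # Best precision among operating points that reach this recall level.
        reachable = [p for p, r in zip(precisions, recalls) if r >= level]
        total += max(reachable, default=0.0)
    return total / points


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
print(average_precision([True, True, False, True], num_ground_truth=4))
```

mAP is then the mean of the per-class AP values, for example over the Good, None and Bad classes of the face mask model.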

CONCLUSION

The aim of this research was to understand social distancing & face mask detection for Covid-19. Object detection for social distancing was based on persons, and face mask detection was based on faces; both were done using Yolo. With Yolo v4, object detection was carried out by Darknet & object tracking was carried out by Deepsort.

How well the model predicted objects was measured by calculating mAP, which showed that for a threshold of 0.25 the average precision was around 90 % & above, and for a threshold of 0.50 it was around 88 % & above. The social distancing outputs were produced on different video datasets; to increase the complexity of detection, crowded places were also taken into consideration.

The face mask tracking model showed the percentage accuracy for each object detected. This approach could be deployed in larger industries with real-time detection, which would require higher computational power.

About the Author

My name is Yash Indulkar. I completed my undergraduate studies at Thakur College of Science & Commerce (TCSC). On the theoretical side, my research areas are Convolutional Neural Networks, Bayesian Deep Learning, and Computational Linguistics; on the application side, they are Natural Language Processing, Object Detection, and Object Tracking.