Ultimate Guide: Building a Mask R-CNN Model for Detecting Car Damage (with Python codes)

Last Updated : 27 Aug, 2021

7 min read

Introduction

The applications of computer vision continue to amaze. From detecting objects in a video, to counting the number of people in a crowd, there is no challenge that computer vision seemingly cannot overcome.

One of the more intriguing applications of computer vision is identifying pixels in a scene and using them for diverse and remarkably useful purposes. We will be taking up one such application in this article, and trying to understand how it works using Python!

The aim of this post is to build a custom Mask R-CNN model that can detect the area of damage on a car (see the image example above). The rationale for such a model is that it can be used by insurance companies for faster processing of claims if users can upload pics and they can assess damage from them. This model can also be used by lenders if they are underwriting a car loan especially for a used car.

You can read an in-depth explanation of Mask R-CNN and how it works in more detail here.

Sponsored: Check out our Computer Vision using Deep Learning Course for creating Computer Vision applications

What is Mask R-CNN?
How Mask R-CNN Works
How to build a Mask R-CNN for Car Damage Detection
- Collecting Data
- Annotating the Data
- Training a Model
- Validating the Model
- Run the Model of Images and Make Predictions

What is Mask R-CNN?

Mask R-CNN is an instance segmentation model that allows us to identify pixel wise location for our class. “Instance segmentation” means segmenting individual objects within a scene, regardless of whether they are of the same type — i.e, identifying individual cars, persons, etc. Check out the below GIF of a Mask-RCNN model trained on the COCO dataset. As you can see, we can identify pixel locations for cars, persons, fruits, etc.

Mask R-CNN is different from classical object detection models like Faster R-CNN where, in addition to identifying the class and its bounding box location, it can also color pixels in the bounding box that correspond to that class. When do you think we would be need this additional detail? Some examples I can think of are:

Self-Driving Cars need to know the exact pixel location of the road; potentially of other cars as well to avoid collisions
Robots may need pixel location of objects that they want to pick up (Amazon’s drones comes to mind here)

The easiest way to try a Mask R-CNN model built on COCO classes is to use the Tensorflow Object Detection API. You can refer to this article (written by me) that has information on how to use the API and run the model on YouTube videos.

How Mask R-CNN works

Before we build a Mask R-CNN model, let’s first understand how it actually works.

A good way to think about Mask R-CNN is that it is a combination of a Faster R-CNN that does object detection (class + bounding box) and FCN (Fully Convolutional Network) that does pixel wise boundary. See figure below:

Mask RCNN is a combination of Faster RCNN and FCN

Mask R-CNN is conceptually simple: Faster R-CNN has two outputs for each candidate object, a class label and a bounding-box offset; to this we add a third branch that outputs the object mask — which is a binary mask that indicates the pixels where the object is in the bounding box. But the additional mask output is distinct from the class and box outputs, requiring extraction of much finer spatial layout of an object. To do this Mask R-CNN uses the Fully Convolution Network (FCN) described below.

FCN is a popular algorithm for doing semantic segmentation. This model uses various blocks of convolution and max pool layers to first decompress an image to 1/32th of its original size. It then makes a class prediction at this level of granularity. Finally it uses up sampling and deconvolution layers to resize the image to its original dimensions.

So, in short, we can say that Mask R-CNN combines the two networks — Faster R-CNN and FCN in one mega architecture. The loss function for the model is the total loss in doing classification, generating bounding box and generating the mask.

Mask RCNN has a couple of additional improvements that make it much more accurate than FCN. You can read more about them in their paper.

How to build a Mask R-CNN Model for Car Damage Detection

For building a custom Mask R-CNN, we will leverage the Matterport Github repository. The latest TensorFlow Object Detection repository also provides the option to build Mask R-CNN. However I would only recommend this for the strong-hearted! The versions of TensorFlow, object detection, format for mask, etc. can demand debugging of errors. I was able to successfully train a Mask R-CNN using it.

But I have seen many people struggle with all kinds of errors. So I now highly recommend the Matterport Mask R-CNN repository to anyone venturing into this domain.

Collecting Data

For this exercise, I collected 66 images (50 train and 16 validation) of damaged cars from Google. Check out some examples below.

Annotating the Data

A Mask R-CNN model requires the user to annotate the images and identify the region of damage. The annotation tool I used is the VGG Image Annotator — v 1.0.6. You can use the html version available at this link. Using this tool you can create a polygon mask as shown below:

Once you have created all the annotations, you can download the annotation and save it in a json format. You can look at my images and annotations on my repository here.

Training a model

Now we start the interesting work of actually training the model! Start by cloning the ‘Matterport Mask R-CNN’ repository— https://github.com/matterport/Mask_RCNN.

Next we will load our images and annotations.

class CustomDataset(utils.Dataset):

def load_custom(self, dataset_dir, subset):
"""Load a subset of the Balloon dataset.
dataset_dir: Root directory of the dataset.
subset: Subset to load: train or val
"""
# Add classes. We have only one class to add.
self.add_class("damage", 1, "damage")

# Train or validation dataset?
assert subset in ["train", "val"]
dataset_dir = os.path.join(dataset_dir, subset)

# We mostly care about the x and y coordinates of each region
annotations1 = json.load(open(os.path.join(dataset_dir, "via_region_data.json")))
annotations = list(annotations1.values()) # don't need the dict keys

# The VIA tool saves images in the JSON even if they don't have any
# annotations. Skip unannotated images.
annotations = [a for a in annotations if a['regions']]

# Add images
for a in annotations:
# Get the x, y coordinaets of points of the polygons that make up
# the outline of each object instance. There are stores in the
# shape_attributes (see json format above)
polygons = [r['shape_attributes'] for r in a['regions'].values()]

# load_mask() needs the image size to convert polygons to masks.
image_path = os.path.join(dataset_dir, a['filename'])
image = skimage.io.imread(image_path)
height, width = image.shape[:2]

self.add_image(
"damage", ## for a single class just add the name here
image_id=a['filename'], # use file name as a unique image id
path=image_path,
width=width, height=height,
polygons=polygons)

I have used the balloon.py file shared by Matterport and modified it to create a custom code that loads images and annotations and adds them to a CustomDataset class. Check out the entire code here. Follow the same code block and update it for any specifics for your class. Please note that this code only works for one class.

Further, you can use this notebook to visualize the mask on the given images. See an example of this below:

To train the model, we use the COCO trained model as the checkpoint to perform transfer learning. You can download this model from the Matterport repository as well.

To train the model, run the below code block:

## Train a new model starting from pre-trained COCO weights
python3 custom.py train --dataset=/path/to/datasetfolder --weights=coco

## Resume training a model that you had trained earlier
python3 custom.py train --dataset=/path/to/datasetfolder --weights=last

I am using a GPU and trained the model for 10 epochs in 20–30 minutes.

Validate your model

You can inspect the model weights using the notebook — Inspect Custom Weights. Please link your last checkpoint in this notebook. This notebook can help you perform a sanity check if your weights and biases are properly distributed. See a sample output below:

Run model on images and make predictions

Use the notebook inspect_custom_model to run model on images from test/val set and see model predictions. See a sample result below:

And there you have it! You just built a Mask R-CNN model to detect damage on a car. What an awesome way to learn deep learning.

End Notes

Mask-RCNN is the next evolution of object detection models which allow detection with better precision. A big thanks to Matterport for making their repository public and allowing us to leverage it to build custom models. This is just a small example of what we can accomplish with this wonderful model.

If you have any questions, or feedback for me on this article, please share it using the comments section below.

References

Mask RCNN Paper
Understand difference b/w instance segmentation and semantic segmentation
Very good explanation of Mask RCNN
Good blog from Matterport on training on custom dataset
Matterport Github repo
Fully Convolutional Network

About the Author

Priya Dwivedi – President, Deep Learning Analytical Solutions

Priya Dwivedi is a graduate of IIT Delhi. She has 10+ years experience working as a data scientist. She currently runs her own deep learning analytics consultancy (http://www.deeplearninganalytics.org/) that works with businesses to build and implement deep learning models for them. Please reach out to her at [email protected] if you would like to collaborate with her on a project.

Advanced Algorithm Computer Vision Deep Learning Image

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

Responses From Readers

kern

that's a cool application Priya. Thanks for sharing that, I'll need some time to digest all of that.

Srikanth

Helllo Priya, The post is really cool. I am trying to just draw a bounding box over the damaged area and possibly try to crop it out from the image and also name the part of the car it is. Is this possible ? I am actually trying to build a system that can quantitatively tell out how much a car is damaged from the provided image and also the parts that are damaged. Please do suggest me a good way to do it. Possibly any reference links available too. Thank you

Karan Purohit

Thanks for the nice blog! I ran your model. so after training when I run inspect_custom_model notebook it gives error: OSError: Unable to open file (unable to open file: name = 'mask_rcnn_damage_0010.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0) I cant find saved weights file.

sset

Thanks for article. Execution of detection or prediction on unseen dataset yields no bounding mask. However if I take sample from train/validation set for some images bounds are detected. What could be reason - is it overfitting or lack of training samples or anchors needs to be adjusted?

Pitter

Helllo Priya, The post is really cool. Thanks for the nice blog What is the difference between cnn and rcnn and mask rcnn for given problem

Sruti Bhattacharjee

Hey , can you help me personally with this project? actually i am dealing with same with my own dataset for 4 months. It will be really helpful for me if you can tell me through email. thanks.

hello nice results.. do you know how to run evaluation in balloon.py like coco.py? thanks

Dinesh 92x

Very helpful and nice article.. .. But one question.. Why validation data have JSON file. It is for the testing not for training.? right..

Ram

Perhaps you can understand easily if you start from CNN, R-CNN, Fast R-CNN, Faster R-CNN as Mask R-CNN is Faster R-CNN+FCN Pls go through below blog post covering them https://www.analyticsvidhya.com/blog/2018/10/a-step-by-step-introduction-to-the-basic-object-detection-algorithms-part-1/

Somesh Sunariwal

please make a video of this full article so understandability of concepts is easy... And please provide a YouTube link This article is very Helpfull for me. Thank you

Hi, This is a great example and thanks for working and sharing this. Can I run the model against a streaming video (of damaged car images stitched into an .mp4, for example) so that the damages get marked (ie. boxed) instantly - something similar to the 2nd video image you have shown above in this article. I'm thinking there should be a way to store/convert the model into a .pb and label.pbtxt and use that to do this live detection ? Any help on this would be greatly appreciated. Thanks again !

saisriteja

Hello Priya, The post is really good. I tried to train with my custom data. I dont have a GPU to train this... I dont know how to work with collabs and kaggle.....can anyone help me with this....

rama subba reddy

can we have different classes like dent ,scracth and other classes to classify accordingly..

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Model Deployment

Introduction to Computer Vision

Getting Started with Image Data

Introduction to CNN and Implementation

Introduction to Transfer Learning

CNN Visualization

Overview of Pretrained Models

Inception

ResNets

DenseNets

CSRNet

Introduction to Object Detection

Region Based Convolutional Neural Network

Single Stage Networks

Transformed Based Object Detection Models

Face Detection

Object Tracking

Pose Estimation

Introduction to Image Segmentation

Understanding Deep Learning Architectures for Image Segmentation

Video Classification

Introduction to Image Generation

Zero and Few Shot Learning

Ultimate Guide: Building a Mask R-CNN Model for Detecting Car Damage (with Python codes)

Introduction

Table of Contents

What is Mask R-CNN?

How Mask R-CNN works

How to build a Mask R-CNN Model for Car Damage Detection

Collecting Data

Annotating the Data

Training a model

Validate your model

Run model on images and make predictions

End Notes

References

About the Author

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Congratulations, You Did It!

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS

test_cookie

Webengage (2)

_we_us

WebKlipperAuth

LinkedIn (16)

ln_or

JSESSIONID

li_rm

AnalyticsSyncHistory

lms_analytics