Gourav Singh — Updated On April 4th, 2023
Advanced Computer Vision Deep Learning Object Detection Project Python Unstructured Data

This article was published as a part of the Data Science Blogathon

“You can have data without information but you cannot have information without data” – Daniel Keys Moran


If you are here then you might be already interested in Machine Learning or Deep Learning so I need not explain what it is? So starting with a little introduction to the topic, Object Detection is the method of detecting desired objects in images or videos, and in past few years, there were a lot of models that were introduced for the same. YoloV5 is one of those models which is considered one of the fastest and accurate. In this article, I am going to explain how you can train the YoloV5 model on your own data for both GPU and CPU-based systems. I will explain how you will have to set up both GPU or CPU as it can be trained on any of them although GPU is preferred as it would take almost three times more on CPU to train the same, if you don’t have GPU based system and don’t want to train your model on Google Colab then you can still train the model on CPU.

The steps that we are going to discuss in this tutorial are as follows:

Optional: Creating a virtual environment for training.

  1. Cloning repository and Installing dependencies for YoloV5
  2. Create annotations for a custom dataset Using the VIA tool
  3. Convert Annotations to Coco format
  4. Creating YAML file for training
  5. YoloV5 training
  6. YoloV5 Inference

Python 3.6 is recommended for the training.

Let’s start with creating a virtual environment, this step is optional, if you want to install packages in the root environment you can skip this otherwise if you want to keep this setup separate you can follow it along to create a virtual environment. Start with installing virtualenv package which helps in creating the virtual environment.

$ pip install virtualenv

once the package is installed you can go ahead and create the virtual environment for YoloV5 training:

 $ virtualenv yolov5_training_env

the output of the command would look like this:

virtual env
Source: local

after creating an environment you will have to activate the same to install the required packages.


$ source yolov5_training_env/bin/activate


$ yolov5_training_envScriptsactivate
virtual environment windows | YoloV5 Object Detection Model
Source: local


Cloning Repository & Installing Dependencies

To begin with, we need to clone the official YoloV5 repository, if you have git already installed then the following command can be used to clone the repository.

$ git clone https://github.com/ultralytics/yolov5
clone repository and install dependencies
Source: local

otherwise, you can directly download the code from the following website:

YoloV5 Code: https://github.com/ultralytics/yolov5

This yolov5 repository comes up with a “requirements.txt” file which contains all the required libraries for training the model. One thing to remember here is the torch library which would be different for both GPU and CPU environments, so we will have to remove torch and torchvision from the requirements files and install the other libraries. To install other libraries from the “requirements.txt” file following command is used.

$ pip install -r requirements.txt
install requirements | YoloV5 Object Detection Model
Source: local

after installing other libraries we will install torch and torchvision which are the backbone for training.

Torch CPU version installation:

Python package management system (pip) can be used to install torch and torchvision for CPU.

Windows & Linux:

$ pip install torch==1.7.0+cpu torchvision==0.8.0+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html


$ pip install torch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0

Torch GPU version installation: 

Conda is preferable for installing GPU-based torch and torchvision libraries as it already compiles it with CUDA. Also if facing any issue with conda installation you can use pip installation. Before installations make sure you are installing the CUDA compatible torch version which you can find here https://pytorch.org/get-started/previous-versions/.

Windows & Linux:

$ conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=10.1 -c pytorch


$ pip install torch==1.7.0+cu101 torchvision==0.8.0+cu101 torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html


$ conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 -c pytorch


$ pip install torch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0
install dependencies windows
Source: local

make sure this installation is done right otherwise there would be a lot of compatibility issues.

We need to download the pre-trained weights also in order to use the model for custom data. Weights can be downloaded from the following link https://github.com/ultralytics/yolov5/releases. Once downloaded the weights we need to create a folder “Yolov5/weights” and paste the weights file there. You can download any of the models based on your usecase (recommended: yolov5x.pt or yolov5x6.pt).

Create annotations Using the VIA tool

The heart of training any object detection model is data annotation (preparing data for training). Data Annotation is a technique where we create different geometrical shapes on the objects which we want our object detection model to detect. There are several data annotation tools available but the one which I find easy to use is VIA (VGG Image Annotator) tool. I have made some changes to the tool so that you can use it on your own objects. You can download the tool from here https://drive.google.com/file/d/1rJx0fNgnnhODM7H3GP9RQQ5QEsWdkYEd/view?usp=sharing. This tool is just a simple HTML page that provides you with multiple annotations types like COCO data format, CSV format, etc. Tool page looks like this:

create annotations | YoloV5 Object Detection Model
Source: local

Here if you see, there are 3 buttons which are for objects one, two, and three are actually customizable i.e. you can change those buttons as per your object names,  to do so you will have to open the HTML file in editable mode and find the following lines:

buttons | YoloV5 Object Detection Model
Source: local

these object names you can have as your custom objects like car, person, etc. Once you have your tool ready you need to click on the attribute button you see on the left side panel and create a name attribute that would look something like this:

attribute button
Source: local

once you have done setting up, you can load the images using the “Add Files” button and can start annotations by selecting the object and clicking on the button with the object name accordingly.

load images | YoloV5 Object Detection Model
Source: local

Once all the annotations are done you can export them in CSV format at your desired location.

Convert CSV annotations to COCO format

YoloV5 model requires data to be in coco format:

coco format
Source: local

To convert CSV annotations to COCO format you can use the following code chunk:

Start with importing dependencies to create COCO dataset.

import os
import numpy as np 
import pandas as pd
import shutil as sh
from PIL import Image
from tqdm.auto import tqdm

Once all the dependencies are imported we need to create some required columns for the COCO dataset.

data_path = '/Users/gouravbais//Final_Cleaned_Dataset/'
df = pd.read_csv(data_path+'train/via_region_export.csv')
## create x, y, w, h columns 
x, y, w, h = [], [], [], []
for row in df['region_shape_attributes']:
    row = row.replace('{}', '').replace('}', '')
    row = row.split(',')
## calculating x, y, width and height coordinates
df['x'], df['y'], df['w'], df['h'] = x, y, w, h
## creating a column name image_id having images names as id 
df['image_id'] = [name.split('.')[0] for name in df['filename']]
## creating two columns for storing x and y center values
df['x_center'] = df['x'] + df['w']/2
df['y_center'] = df['y'] + df['h']/2
## define number of classes 
labels = df['region_attributes'].unique()
labels_to_dict = dict(zip(lables, range(0, len(labels))))
print('Lables Directory:', labels_to_dict)
df['classes'] = df['region_attributes']
df.replace({'classes':labels_to_dict}, inplace=True)
df = df[['image_id','x', 'y', 'w', 'h','x_center','y_center','classes']]
## set index of images
index = list(set(df.image_id))

and finally creating the dataset by copying images and creating text files.

source = 'train'
if True:
    for fold in [0]:
        val_index = index[len(index)*fold//5:len(index)*(fold+1)//5]
        for name,mini in tqdm(df.groupby('image_id')):
            if name in val_index:
                path2save = 'val2017/'
                path2save = 'train2017/'
            if not os.path.exists('convertor/fold{}/labels/'.format(fold)+path2save):
            with open('convertor/fold{}/labels/'.format(fold)+path2save+name+".txt", 'w+') as f:
                row = mini[['classes','x_center','y_center','w','h']].astype(float).values
                imagename = data_path+"/{}/{}.jpg".format(source,name)
                check_image_width_height = Image.open(imagename)
                img_width, img_height = check_image_width_height.size
                for r in (row):
                    r[1] = r[1]/img_width
                    r[2] = r[2]/img_height
                    r[3] = r[3]/img_width
                    r[4] = r[4]/img_height
                row = row.astype(str)
                for j in range(len(row)):
                    print(row[j], 'n')
                    row[j][0] = str(int(float(row[j][0])))
                    text = ' '.join(row[j])
            if not os.path.exists('convertor/fold{}/images/{}'.format(fold,path2save)):

The following code would generate a folder named “converter/fold0” which contains Images and Labels folders that you need to copy, and paste inside cloned YoloV5 repository “YoloV5/coco/” folder.

Creating YAML file for training

Once dependencies are installed and data is ready now we need to configure a YAML file to train the model. For that, you open any text editor and create a file “coco.yml” inside the folder “YoloV5/data/”. The content of the file would be like this:

# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: /Users/gouravbais/yolov5/coco/images/train2017
val: /Users/gouravbais/yolov5/coco/images/val2017
# number of classes
nc: 3


# class names
names: [‘object1’, ‘object2’, ‘object3’]

please keep the classes in the same order that you get while converting the CSV dataset to COCO dataset.

Yolov5 training

Setup of the YAML file leaves nothing much to do afterwards, now we just need to write the command and provide it with the data and weights to train the model on our custom dataset. You need to traverse to the YoloV5 folder through terminal or anaconda prompt and run the following command:

$ python train.py --img 416 --batch 12 --epochs 50 --data ./data/coco.yml --weights ./weights/yolov5x.pt

Command explanation:

train.py: python file containing the training code.

img: image size defaulted to 640, depending on the GPU or CPU memory you can change it.

batch: batch size which is again directly dependent on your memory.

data: the path of your YAML file.

weights: the path of weights file that we have downloaded at step 1.

Once you run the mentioned command the training of the model would start. The training file that we are using has the code that checks for GPU if it’s available then training takes place on GPU otherwise training continues on CPU. We need not explicitly code to use GPU or CPU.

model training | YoloV5 Object Detection Model
Source: local

Once training is completed you can find the trained weights in the “YoloV5/runs/train” folder, which would have two weights file “best.pt” and “last.pt” you can use either of them based on your training (“best.pt” recommended).  You can copy the file and paste it inside the “YoloV5/weights” folder. One thing that you will have to observe here is that if the loss is still decreasing after 50 epochs you can run it for more epochs depending on your data.

YoloV5 Inferencing

In conclusion, what you need is a Deep Learning model that works on your testing data. So to test your model on testing data you will have to use the “YoloV5/detect.py” script present at the same location as “train.py”. Command to test the model on your data is as follows:

$ python detect.py --img 416 --source ./detect/test_data --weights ./weights/best.pt --conf-thres 0.4

Command explanation:

detect.py: python file for inferencing.

img: image size which must be the same as training image size.

source: the path of testing data or testing image file.

weights: the path of trained weights file.

conf-thres: confidence threshold above which object must be detected.

Running the above command would create a folder “YoloV5/runs/detect” which would have all the resultant files on which you have run the model.

Now you are ready to apply this knowledge of Object Detection to your use case.

Thanks for reading this article do like if you have learned something new, feel free to comment See you next time !!! ❤️ 

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

About the Author

Gourav Singh

Applied Machine Learning Engineer skilled in Computer Vision/Deep Learning Pipeline Development, creating machine learning models, retraining systems and transforming data science prototypes to production-grade solutions. Consistently optimizes and improves real-time systems by evaluating strategies and testing on real world scenarios.

Our Top Authors

Download Analytics Vidhya App for the Latest blog/Article

Leave a Reply Your email address will not be published. Required fields are marked *