Hand landmarks detection on an image using Mediapipe

Aman Preet Gulati 01 Mar, 2022

6 min read

This article was published as a part of the Data Science Blogathon.

Overview

In this article, we will be making hands landmarks detection model with the profound library i.e. mediapipe as the base library, and for other computer vision preprocessing CV2 libraries. There are many use cases in the market for this problem statement whether it’s for business-related virtual reality or in the gaming section for the real-time experience.

Industry use cases

Smart homes: This is one of the modern use cases of computer vision and people are leveraged to live a more comfortable life and that’s why it’s no longer the niche area it is spreading to the common households as well.
Smart TVs: We have often seen this use case wherewith your hand moments one can change the volume, change the channels, and whatnot.
Games: For the real experience, this technology is getting more into interactive gaming.

Let’s build our hand detection model

Import the Libraries

Here we will import all the libraries that will be required in the whole pipeline.

import cv2
import numpy as np
import mediapipe as mp
import matplotlib.pyplot as plt

Initializing the hand’s landmarks detection model using Mediapipe

Whenever we talk about the detection whether it is an object, person, animal, or as in our case hands detection then the very first step is to initialize the model with valid parameters no matter what detection technique we are following it can either be Mediapipe or Yolo but initializing the model is important, following the same principle we will be following all the given steps:

# First step is to initialize the Hands class an store it in a variable
mp_hands = mp.solutions.hands

# Now second step is to set the hands function which will hold the landmarks points
hands = mp_hands.Hands(static_image_mode=True, max_num_hands=2, min_detection_confidence=0.3)

# Last step is to set up the drawing function of hands landmarks on the image
mp_drawing = mp.solutions.drawing_utils

Code-breakdown:

Firstly initializing the class of hands by mp.solutions.hands with a variable.
Then using the same variable setting up the function for hands by mp.solutions.hands.Hands().

By far we understood the structure of hands model initialization now let’s deep dive into the arguments used in hands the function.

static_image_mode: This argument takes up the boolean value as its valid value i.e. it can be either True or False. the by default condition remains the False condition which remains for the video streaming when I say video streaming then that means it results in the lower latency of processing i.e. it keeps on focusing on particular hands and localizes the same hands until it loses the track of that particular hand which can be beneficial when we have to detect the hands in live stream or videos but according to our requirement we have to detect the landmarks on image hence we will set the value to True.

max_num_hands: This argument will indicate the maximum number of hands that the model will detect at one instance. By default, the value is 2 which also make sense that at least we will want a pair of hand to be detected though we can change that for sure.

min_detection_confidence: This argument provides us the flexibility that how much rigidity we want from our detection model and in that case it provides the threshold value of the level of confidence. The ideal range of minimum detection confidence is [0.0,1.0] and by default, it remains 0.5 which means that if the confidence level drops below 50% then the hands will not be detected at all in the output image.

At last, we will be using the mp.solutions.drawing_utils which will be responsible to draw all the hand’s landmarks points on the output image which were detected by our Hands function.

Read an Image

Here we will be first use the cv2.imread() to read the image on which hands detection is to be performed and matplotlib library to show that particular input image.

# Reading the sample image on which we will perform the detection
sample_img = cv2.imread('media/sample.jpg')

# Here we are specifing the size of the figure i.e. 10 -height; 10- width.
plt.figure(figsize = [10, 10])

# Here we will display the sample image as the output.
plt.title("Sample Image");plt.axis('off');plt.imshow(sample_img[:,:,::-1]);plt.show()

Output:

Perform Hands Landmarks Detection

So as we have initialized our hand detection model now our next step will be to process the hand landmarks detection on the input image and draw all the 21 landmarks points on that image with the above-initialized model for doing that we will be going through the following steps.

results = hands.process(cv2.cvtColor(sample_img, cv2.COLOR_BGR2RGB))

if results.multi_hand_landmarks:
    
   for hand_no, hand_landmarks in enumerate(results.multi_hand_landmarks):
        print(f'HAND NUMBER: {hand_no+1}')
        print('-----------------------')
        
        for i in range(2):
            print(f'{mp_hands.HandLandmark(i).name}:')
            print(f'{hand_landmarks.landmark[mp_hands.HandLandmark(i).value]}')

Output:

Code-breakdown:

In the very first step, we are using the process function from the Mediapipe library to store the hand landmarks detection results in the variable along with that we have converted the image from the BGR format to the RGB format.
While coming to the next step we will first check for some validation that whether the points are detected or not i.e. the variable should have some results in it.
If yes, then we will loop through all the points that were detected in the image which has the hand’s landmarks points.
Now in the other loop, we can see that there are only 2 iterations because we only want to show 2 landmarks of the hands.
At the last, we will print out all the landmarks point that is detected and filtered out according to the requirements.

From the above processing, we have encountered that all the landmarks which were detected are normalized to the common scales but now for the user end those scaled points are not relevant for that reason we will bring back those landmarks to the original state.

image_height, image_width, _ = sample_img.shape

if results.multi_hand_landmarks:

    for hand_no, hand_landmarks in enumerate(results.multi_hand_landmarks):
        
        print(f'HAND NUMBER: {hand_no+1}')
        print('-----------------------')
        
        for i in range(2):    
            print(f'{mp_hands.HandLandmark(i).name}:') 
            print(f'x: {hand_landmarks.landmark[mp_hands.HandLandmark(i).value].x * image_width}')
            print(f'y: {hand_landmarks.landmark[mp_hands.HandLandmark(i).value].y * image_height}')
            print(f'z: {hand_landmarks.landmark[mp_hands.HandLandmark(i).value].z * image_width}n')

Output:

Code-breakdown:

There is only one extra step that we need to perform here i.e. we will attain the original width and height of the image from the sample image that we defined and then all the steps will be the same as we have done previously only difference in the output will be that now the landmarks points are not scaled specifically.

Draw the landmarks on the image

As we have got the hands landmarks from the above preprocessing now it’s time to execute our final step which is to draw the points on the image so that we can visually see how our hand landmarks detection model is performing.

img_copy = sample_img.copy()

if results.multi_hand_landmarks:

    for hand_no, hand_landmarks in enumerate(results.multi_hand_landmarks):
        
        mp_drawing.draw_landmarks(image = img_copy, landmark_list = hand_landmarks,
                                  connections = mp_hands.HAND_CONNECTIONS)
    fig = plt.figure(figsize = [10, 10])

    plt.title("Resultant Image");plt.axis('off');plt.imshow(img_copy[:,:,::-1]);plt.show()

Output:

Code-breakdown:

First, we will create a copy of the original image, this step is done for safety purposes as we don’t want to lose the originality of the image.
Then we will take care of the validation thing which we did previously as well.
Then we will loop through each landmark of hands.
Finally, with the help of mp_drawing.draw_landmarks function, we will draw the landmarks on the image.
It’s time to plot the image using matplotlib so first, we will give the figure size (here, width-10 and height-10), then at last plot, the image using imshow function after converting the BGR format into RGB format because for the end-user RGB format makes more sense.

Conclusion

In the complete pipeline, we first have initialise the model then we read the image to have a look at our input image later for the preprocessing we have scaled down the landmarks point yet those points are not relevant for the user to end so for that we have reverted it to the original state at the last we will draw the landmarks points on the image.

Endnotes

Here’s the repo link to this article. Hope you liked my article on Hands landmarks detection using Mediapipe. If you have any opinions or questions, then comment below.

Read on AV Blog about various predictions using Machine Learning.

About Me

Greeting to everyone, I’m currently working in TCS and previously, I worked as a Data Science Analyst in Zorba Consulting India. Along with full-time work, I’ve got an immense interest in the same field, i.e. Data Science, along with its other subsets of Artificial Intelligence such as Computer Vision, Machine Learning, and Deep learning; feel free to collaborate with me on any project on the domains mentioned above (LinkedIn).

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.