A Computer Vision Approach to Hand Gesture Recognition
Soldiers communicate with each other through gestures, but sometimes those gestures are not visible because of obstructions or poor lighting. In such cases, an instrument is needed to record a gesture and send it to fellow soldiers. There are two options for gesture recognition: Computer Vision, or sensors attached to the hands.
The first option is not viable here, because recognition through Computer Vision requires proper lighting. We therefore chose the second option: sensors attached to the hands. We present a system that recognizes the gestures given in this link.
Table of Contents
- Constructing the System
- Algorithm for Static Gesture Recognition
- Algorithm for Dynamic Gesture Classification
Constructing the System
The given gestures involve motions of the fingers, wrist, and elbow. To detect changes at these joints, we used flex sensors, which measure how much each joint is bent. To account for dynamic gestures, an Inertial Measurement Unit (IMU-MPU-9250) was used. The parameters taken from the IMU are linear acceleration, angular velocity (gyroscope readings), and angles in all three axes. An Arduino* Mega receives the signals from the sensors and sends them to the processor.
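The article does not describe the format in which the Arduino sends readings to the processor. The following is a minimal sketch, assuming a hypothetical format in which each serial line holds seven comma-separated flex readings followed by three accelerometer and three gyroscope values. The count of seven flex sensors (five fingers, wrist, elbow) is also an assumption. In practice the line could come from the pyserial package (`serial.Serial(...).readline()`), decoded from bytes first.

```python
from typing import List, NamedTuple

# Hypothetical serial format: 7 flex values, then ax, ay, az, then gx, gy, gz.
NUM_FLEX = 7  # assumption: 5 fingers + wrist + elbow


class RawReading(NamedTuple):
    flex: List[float]   # one value per flex sensor
    accel: List[float]  # linear acceleration (x, y, z)
    gyro: List[float]   # angular velocity (x, y, z)


def parse_line(line: str) -> RawReading:
    """Split one comma-separated sensor line into its components."""
    values = [float(v) for v in line.strip().split(",")]
    expected = NUM_FLEX + 6
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    return RawReading(
        flex=values[:NUM_FLEX],
        accel=values[NUM_FLEX:NUM_FLEX + 3],
        gyro=values[NUM_FLEX + 3:],
    )


reading = parse_line("512,480,300,610,700,450,520,0.01,-0.02,0.98,1.5,-0.3,0.0\n")
print(reading.accel)  # [0.01, -0.02, 0.98]
```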
A flex sensor is a strip whose resistance increases with the amount it is bent (strained). When wired into a voltage divider, it therefore produces a voltage that varies with the strain. An IMU (MPU-6050) outputs linear acceleration and angular velocity (gyroscope readings) along all three axes (x, y, z).
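To make the voltage-divider idea concrete, here is a sketch of turning a raw Arduino reading into a flex resistance and a rough bend angle. The wiring (flex sensor between VCC and the analog pin, a fixed resistor from the pin to ground), the 47 kΩ resistor, and the two calibration resistances are all hypothetical values, and the linear resistance-to-angle mapping is an assumption. Only the 5 V supply and the 10-bit ADC range reflect the Arduino Mega's defaults.

```python
VCC = 5.0              # Arduino Mega supply voltage (volts)
ADC_MAX = 1023         # the Mega's analog inputs are 10-bit (0..1023)
R_DIV = 47_000.0       # hypothetical fixed divider resistor (ohms)
R_STRAIGHT = 37_300.0  # hypothetical calibration: resistance when flat (ohms)
R_BENT_90 = 90_000.0   # hypothetical calibration: resistance at 90 degrees (ohms)


def adc_to_voltage(adc: int) -> float:
    """Convert a raw 10-bit ADC reading to volts."""
    return adc * VCC / ADC_MAX


def flex_resistance(v_out: float) -> float:
    """Solve v_out = VCC * R_DIV / (R_flex + R_DIV) for R_flex.

    Assumes the flex sensor sits between VCC and the ADC pin and R_DIV
    between the pin and ground.
    """
    if v_out <= 0.0:
        raise ValueError("v_out must be positive")
    return R_DIV * (VCC / v_out - 1.0)


def bend_angle(r_flex: float) -> float:
    """Linearly interpolate a bend angle (degrees) between two calibration points."""
    return (r_flex - R_STRAIGHT) / (R_BENT_90 - R_STRAIGHT) * 90.0


print(flex_resistance(2.5))  # 47000.0: equal resistances split VCC in half
```

In a real glove each sensor would need its own calibration points, since flex sensors vary from unit to unit.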
The gestures can be classified into two sub-classes:
- Static Gestures
- Dynamic Gestures
The two subclasses use different feature sets:
- For static gestures, we used the flex sensor values and the angles for all three axes as features.
- For dynamic gestures, we used the flex sensor values, linear acceleration, angular velocity (gyroscope readings), and the angles for all three axes.
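The two feature sets above can be sketched as simple concatenations. The assumption of seven flex sensors (five fingers, wrist, elbow) is not stated in the article, but it is consistent with the 800-dimensional vector used for dynamic gestures: 7 + 3 + 3 + 3 = 16 values per time point, and 16 × 50 = 800. The feature order shown is likewise hypothetical.

```python
from typing import List, Sequence

NUM_FLEX = 7  # assumption: 5 fingers + wrist + elbow


def static_features(flex: Sequence[float], angles: Sequence[float]) -> List[float]:
    """Static gestures: flex sensor values followed by the three angles."""
    return [*flex, *angles]


def dynamic_features(flex: Sequence[float], accel: Sequence[float],
                     gyro: Sequence[float], angles: Sequence[float]) -> List[float]:
    """Dynamic gestures (one time point): flex, linear acceleration,
    angular velocity, then angles."""
    return [*flex, *accel, *gyro, *angles]


flex = [0.0] * NUM_FLEX
print(len(static_features(flex, [0, 0, 0])))                         # 10
print(len(dynamic_features(flex, [0, 0, 0], [0, 0, 0], [0, 0, 0])))  # 16
```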
Algorithm for Static Gesture Recognition
First, the angles have to be calculated from the acceleration values using these formulae.
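The formulae themselves are not reproduced in this text. As an assumption, the sketch below uses a widely used set of accelerometer tilt formulas, where each axis's angle is found from gravity alone, for example θx = atan(ax / √(ay² + az²)). These hold only while the hand is nearly still, so that measured acceleration is dominated by gravity. `math.atan2` is used to avoid dividing by zero.

```python
import math
from typing import Tuple


def tilt_angles(ax: float, ay: float, az: float) -> Tuple[float, float, float]:
    """Angle (degrees) of each accelerometer axis, estimated from gravity.

    Assumes the sensor is nearly static, so (ax, ay, az) is mostly gravity.
    """
    theta_x = math.degrees(math.atan2(ax, math.hypot(ay, az)))
    theta_y = math.degrees(math.atan2(ay, math.hypot(ax, az)))
    theta_z = math.degrees(math.atan2(math.hypot(ax, ay), az))
    return theta_x, theta_y, theta_z


print(tilt_angles(0.0, 0.0, 1.0))  # (0.0, 0.0, 0.0): sensor lying flat
```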
The angle values are noisy, so they are smoothed with a Kalman filter. The flex sensor values and the filtered angles are then fed into a pre-trained Support Vector Machine (SVM) with a Radial Basis Function (Gaussian) kernel, which outputs the recognized gesture.
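The article does not specify the Kalman filter's state model or noise parameters. A minimal sketch, assuming the simplest form (a one-dimensional filter per angle with a constant-value state model), looks like this. The noise variances `q` and `r` are hypothetical tuning values.

```python
class ScalarKalman:
    """Minimal 1-D Kalman filter with a constant-value state model.

    Smooths one slowly varying signal, such as a single angle. q and r are
    hypothetical tuning values, not taken from the original system.
    """

    def __init__(self, q: float = 1e-3, r: float = 0.1,
                 x0: float = 0.0, p0: float = 1.0) -> None:
        self.q = q   # process-noise variance (how fast the true angle may drift)
        self.r = r   # measurement-noise variance of the raw angle
        self.x = x0  # current estimate
        self.p = p0  # variance of the current estimate

    def update(self, z: float) -> float:
        # Predict: the state is assumed unchanged, but uncertainty grows by q.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x


# Usage: one filter per angle; here a reading alternating +/-1 degree around 30.
kf = ScalarKalman(x0=30.0)
smoothed = [kf.update(30.0 + (1.0 if i % 2 else -1.0)) for i in range(200)]
```

A larger `r` relative to `q` produces smoother output at the cost of slower reaction to genuine changes in angle.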
Figure 1: Principal Component Analysis of the dataset using all the features. Each colored cluster represents a particular gesture. Because accelerations are also included, the clusters are quite elongated.
Figure 2: Principal Component Analysis of the dataset using only the flex sensor values and angles. Again, each colored cluster represents a particular gesture, and these clusters are classifiable.
Algorithm for Dynamic Gesture Classification
The angles, linear accelerations, and angular velocities are filtered using a Kalman filter. The values are stored in a temporary file, with each line representing one time point. Every value is then normalized column-wise, and 50 time points are sampled from the recording. Finally, these are flattened into a single 800-dimensional vector (50 time points × 16 values per time point).
The resulting vector is fed into an SVM with a Radial Basis Function (Gaussian) kernel. Because some gestures, such as ‘Column Formation’, ‘Vehicle’, ‘Ammunition’, and ‘Rally-Point’, are similar to each other, we grouped such similar gestures into one class. If the first SVM assigns a sample to one of these groups, the sample is then fed into a second SVM trained only to distinguish the gestures within that group.
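The dynamic pipeline described above can be sketched in plain Python. Several details are assumptions because the article does not state them: normalization is taken to be a z-score per column within one recording, the 50 time points are chosen at evenly spaced indices, and the group label `SIMILAR_GROUP` is hypothetical. The stub classifiers stand in for trained RBF-kernel SVMs. With scikit-learn, each stage could instead wrap a fitted `sklearn.svm.SVC(kernel="rbf")` as `lambda x: clf.predict([x])[0]`.

```python
import math
from typing import Callable, Dict, List, Sequence

Row = Sequence[float]
Classifier = Callable[[Sequence[float]], str]


def normalize_columns(rows: Sequence[Row]) -> List[List[float]]:
    """Z-score each column (one sensor channel) over all time points of a recording."""
    n = len(rows)
    cols = range(len(rows[0]))
    means = [sum(r[c] for r in rows) / n for c in cols]
    stds = [math.sqrt(sum((r[c] - means[c]) ** 2 for r in rows) / n) for c in cols]
    # A constant channel has zero spread; map it to 0 instead of dividing by zero.
    return [[(r[c] - means[c]) / stds[c] if stds[c] > 0 else 0.0 for c in cols]
            for r in rows]


def sample_time_points(rows: Sequence[Row], n_points: int = 50) -> List[Row]:
    """Pick n_points evenly spaced time points (first and last always included)."""
    last = len(rows) - 1
    return [rows[round(i * last / (n_points - 1))] for i in range(n_points)]


def to_feature_vector(rows: Sequence[Row], n_points: int = 50) -> List[float]:
    """Normalize, sample, and flatten: 50 points x 16 channels -> 800 values."""
    sampled = sample_time_points(normalize_columns(rows), n_points)
    return [v for row in sampled for v in row]


def two_stage_classify(x: Sequence[float], first_stage: Classifier,
                       group_classifiers: Dict[str, Classifier]) -> str:
    """If the first stage predicts a group label, refine it with that group's classifier."""
    label = first_stage(x)
    if label in group_classifiers:
        return group_classifiers[label](x)
    return label


# Usage with a synthetic 120-step recording of 16 channels.
recording = [[math.sin(0.1 * t + c) for c in range(16)] for t in range(120)]
vector = to_feature_vector(recording)
print(len(vector))  # 800


def first_stage(x: Sequence[float]) -> str:
    # Hypothetical stand-in for the first SVM.
    return "SIMILAR_GROUP" if x[0] > 0 else "Door"


def similar_group_stage(x: Sequence[float]) -> str:
    # Hypothetical stand-in for the SVM trained only on the similar group.
    return "Vehicle" if x[1] > 0 else "Ammunition"


groups = {"SIMILAR_GROUP": similar_group_stage}
print(two_stage_classify([1.0, -1.0], first_stage, groups))  # Ammunition
```

Routing only ambiguous predictions to a specialized second model lets that model concentrate its capacity on the small differences between similar gestures.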
Figure 3: Two sample graphs of x-axis acceleration for the gesture ‘Door’.
Salient features of the system:
- It does not hinder the motion of the hands.
- It is lightweight.
- It recognizes 27 of 28 static gestures and 14 of 15 dynamic gestures.
- It can be improved by gathering more data and using a neural network. To that end, a mechanism to record new data and store it immediately has been built, which also makes room for recognizing more gestures.
- Its size could be reduced considerably by using a custom signal-processing chip.
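The article does not describe how new recordings are stored. One simple way to record and store samples immediately is sketched below, assuming a hypothetical CSV layout with one labeled sample per row (label first, then feature values). The file name and sample values are illustrative only.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, Sequence, Tuple, Union

PathLike = Union[str, Path]


def append_sample(path: PathLike, label: str, features: Sequence[float]) -> None:
    """Append one labeled sample to a CSV dataset (label first, then features)."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([label, *features])


def load_samples(path: PathLike) -> List[Tuple[str, List[float]]]:
    """Read every (label, features) pair back from the dataset file."""
    with open(path, newline="") as f:
        return [(row[0], [float(v) for v in row[1:]]) for row in csv.reader(f)]


# Usage: record two samples into a temporary file and read them back.
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "gestures.csv"  # hypothetical file name
    append_sample(dataset, "Door", [0.5, -1.25])
    append_sample(dataset, "Rally-Point", [2.0, 0.0])
    samples = load_samples(dataset)

print(samples)  # [('Door', [0.5, -1.25]), ('Rally-Point', [2.0, 0.0])]
```

Appending in place means each new sample is on disk as soon as it is recorded, so the dataset can grow between retraining runs.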
*Because we were asked to show the output on a screen, we did not use a Raspberry Pi Zero (single-board computer) for processing. It could be used for this, however, and we have verified that the algorithm runs fast enough on it.
** We generated our own data for training and testing.
***For detailed documentation and code, visit my GitHub.