Gunjan Agarwal — November 3, 2021
Advanced Computer Vision Image Image Analysis Python

This article was published as a part of the Data Science Blogathon.

An End to End guide for model training and deployment for facial emotion detection using the webcam.

In our previous article, we have explored emotion detection in the text, which is quite helpful for several use cases, you can read the article here. While emotion detection using tet is the quite useful industry is now focusing on one more area which is Facial Emotion Detection. Emotion Detection using images is quite useful for identification like driver’s drowsiness detection, students behavior detection, etc.

In this article, we will cover this interesting application of computer vision. As we all know nowadays computer vision is getting advanced. major tech giants are building their models to become more like humans, to do so machines must be capable of detecting your emotions and treating you accordingly.

This article demonstrates to you how to build a model using Tensorflow, which can tell you emotion using your picture or live webcam feed.

Checkpoints that we will de discussing in this article are:

  • Getting Data
  • Preparing data
  • Image Augmentation
  • Build model and train
  • use the webcam for detection

Getting Started with Facial Emotion Detection

So let’s dive straight into the implementation part of Facial Emotion Detection.

Getting Data

We will be using the dataset fer-2013 which is publically available on Kaggle. it has 48*48 pixels gray-scale images of faces along with their emotion labels.

This dataset contains 7 Emotions :- (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral)

Start by importing pandas and some essential libraries and then loading the dataset.

import matplotlib.pyplot as plt
import numpy as np
import scipy
import pandas as pddf = pd.read_csv('../input/facial-expression-recognitionferchallenge/fer2013/fer2013/fer2013.csv')
df.head()
Getting data Facial Emotion Detection
Source: Local

This dataset contains 3 columns, emotionpixels and Usage. Emotion column contains integer encoded emotions and pixels column
contains pixels in the form of a string seperated by spaces, and usage
tells if data is made for training or testing purpose.

Preparing Data

You see data is not in the right format. we need to pre-process the data. Here X_train, X_test contains pixels, and y_testy_train contains emotions.

X_train = []
y_train = []
X_test = []
y_test = []
for index, row in df.iterrows():
    k = row['pixels'].split(" ")
    if row['Usage'] == 'Training':
        X_train.append(np.array(k))
        y_train.append(row['emotion'])
    elif row['Usage'] == 'PublicTest':
        X_test.append(np.array(k))
        y_test.append(row['emotion'])
preparing data Facial Emotion Detection
Source: Local

At this stage X_train, X_test contains pixel’s number is in the form of a string, converting it into numbers is easy, we just need to typecast.

X_train = np.array(X_train, dtype = 'uint8')
y_train = np.array(y_train, dtype = 'uint8')
X_test = np.array(X_test, dtype = 'uint8')
y_test = np.array(y_test, dtype = 'uint8')

y_test, y_train contains 1D integer encoded labels, we need to connect them into categorical data for efficient training.

import keras
from keras.utils import to_categorical
y_train= to_categorical(y_train, num_classes=7)
y_test = to_categorical(y_test, num_classes=7)

num_classes = 7 shows that we have 7 classes to classify.

Reshaping Data

You need to convert the data in the form of a 4d tensor (row_num, width, height, channel) for training purposes.

X_train = X_train.reshape(X_train.shape[0], 48, 48, 1)
X_test = X_test.reshape(X_test.shape[0], 48, 48, 1)

Here 1 tells us that training data is in grayscale form, at this stage, we have successfully preprocessed our data into X_train, X_test, y_trainy_test.

Image Augmentation for Facial Emotion Detection

Image data augmentation is used to improve the performance and ability of the model to generalize. It’s always a good practice to apply some
data augmentation before passing it to the model, which can be done using ImageDataGenetrator provided by Keras. 

from keras.preprocessing.image import ImageDataGenerator 
datagen = ImageDataGenerator( 
    rescale=1./255,
    rotation_range = 10,
    horizontal_flip = True,
    width_shift_range=0.1,
    height_shift_range=0.1,
    fill_mode = 'nearest')
testgen = ImageDataGenerator(rescale=1./255)
datagen.fit(X_train)
batch_size = 64
  • rescale: It normalizes the pixel value by dividing it by 255.
  • horizontal_flip: It flips the image horizontally.
  • fill_mode: It fills the image if not available after some cropping.
  • rotation_range: It rotates the image by 0–90 degrees.

On testing data, we will only apply rescaling(normalization).

Fitting the generator to our data

We will use batch_size of 64 and after fitting our data to our image generator, data will be generated in the batch size of 64. Using a data generator is the best way to train a large amount of data.

train_flow = datagen.flow(X_train, y_train, batch_size=batch_size) 
test_flow = testgen.flow(X_test, y_test, batch_size=batch_size)

train_flow contains our X_train and y_train while test_flow contains our X_test and y_test.

Building Facial Emotion Detection Model using CNN

Designing the CNN model for emotion detection using functional API. We are creating blocks using Conv2D layer, Batch-Normalization, Max-Pooling2D, Dropout, Flatten, and then stacking them together and at the end-use Dense Layer for output, you can read more on how to design CNN models. 

Building the model using functional API gives more flexibility.

from keras.utils import plot_model
from keras.models import Model
from keras.layers import Input, Dense, Flatten, Dropout, BatchNormalization
from keras.layers.convolutional import Conv2D
from keras.layers.pooling import MaxPooling2D
from keras.layers.merge import concatenate
from keras.optimizers import Adam, SGD
from keras.regularizers import l1, l2
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix

FER_model takes input size and returns model for training. Now let’s define the architecture of the model.

def FER_Model(input_shape=(48,48,1)):
    # first input model
    visible = Input(shape=input_shape, name='input')
    num_classes = 7
    #the 1-st block
    conv1_1 = Conv2D(64, kernel_size=3, activation='relu', padding='same', name = 'conv1_1')(visible)
    conv1_1 = BatchNormalization()(conv1_1)
    conv1_2 = Conv2D(64, kernel_size=3, activation='relu', padding='same', name = 'conv1_2')(conv1_1)
    conv1_2 = BatchNormalization()(conv1_2)
    pool1_1 = MaxPooling2D(pool_size=(2,2), name = 'pool1_1')(conv1_2)
    drop1_1 = Dropout(0.3, name = 'drop1_1')(pool1_1)#the 2-nd block
    conv2_1 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name = 'conv2_1')(drop1_1)
    conv2_1 = BatchNormalization()(conv2_1)
    conv2_2 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name = 'conv2_2')(conv2_1)
    conv2_2 = BatchNormalization()(conv2_2)
    conv2_3 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name = 'conv2_3')(conv2_2)
    conv2_2 = BatchNormalization()(conv2_3)
    pool2_1 = MaxPooling2D(pool_size=(2,2), name = 'pool2_1')(conv2_3)
    drop2_1 = Dropout(0.3, name = 'drop2_1')(pool2_1)#the 3-rd block
    conv3_1 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv3_1')(drop2_1)
    conv3_1 = BatchNormalization()(conv3_1)
    conv3_2 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv3_2')(conv3_1)
    conv3_2 = BatchNormalization()(conv3_2)
    conv3_3 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv3_3')(conv3_2)
    conv3_3 = BatchNormalization()(conv3_3)
    conv3_4 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv3_4')(conv3_3)
    conv3_4 = BatchNormalization()(conv3_4)
    pool3_1 = MaxPooling2D(pool_size=(2,2), name = 'pool3_1')(conv3_4)
    drop3_1 = Dropout(0.3, name = 'drop3_1')(pool3_1)#the 4-th block
    conv4_1 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv4_1')(drop3_1)
    conv4_1 = BatchNormalization()(conv4_1)
    conv4_2 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv4_2')(conv4_1)
    conv4_2 = BatchNormalization()(conv4_2)
    conv4_3 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv4_3')(conv4_2)
    conv4_3 = BatchNormalization()(conv4_3)
    conv4_4 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name = 'conv4_4')(conv4_3)
    conv4_4 = BatchNormalization()(conv4_4)
    pool4_1 = MaxPooling2D(pool_size=(2,2), name = 'pool4_1')(conv4_4)
    drop4_1 = Dropout(0.3, name = 'drop4_1')(pool4_1)
    
    #the 5-th block
    conv5_1 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name = 'conv5_1')(drop4_1)
    conv5_1 = BatchNormalization()(conv5_1)
    conv5_2 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name = 'conv5_2')(conv5_1)
    conv5_2 = BatchNormalization()(conv5_2)
    conv5_3 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name = 'conv5_3')(conv5_2)
    conv5_3 = BatchNormalization()(conv5_3)
    conv5_4 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name = 'conv5_4')(conv5_3)
    conv5_3 = BatchNormalization()(conv5_3)
    pool5_1 = MaxPooling2D(pool_size=(2,2), name = 'pool5_1')(conv5_4)
    drop5_1 = Dropout(0.3, name = 'drop5_1')(pool5_1)#Flatten and output
    flatten = Flatten(name = 'flatten')(drop5_1)
    ouput = Dense(num_classes, activation='softmax', name = 'output')(flatten)# create model 
    model = Model(inputs =visible, outputs = ouput)
    # summary layers
    print(model.summary())
    
    return model

Compiling the Facial Emotion Detection Model

Compiling model using Adam optimizer keeping lr= 0.001, if the model’s accuracy doesn’t improve after some epochs learning rate decreases by decay factor.

model = FER_Model()
opt = Adam(lr=0.0001, decay=1e-6)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

Training the Facial Emotion Detection Model

To train the model you need to write the following line of code.

num_epochs = 100  
history = model.fit_generator(train_flow, 
                    steps_per_epoch=len(X_train) / batch_size, 
                    epochs=num_epochs,  
                    verbose=1,  
                    validation_data=test_flow,

                    validation_steps=len(X_test) / batch_size)

  • steps_per_epoch = TotalTrainingSamples / TrainingBatchSize
  • validation_steps = TotalvalidationSamples / ValidationBatchSize

Training takes at least 20 minutes for 100 epochs.

Source: Local

Save the Model

Saving our model’s architecture into JSON and model’s weight into .h5.

model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
model.save_weights("model.h5")
print("Saved model to disk")

Download the saved model and weights in a directory.

Testing the Model using Webcam Feed

In this part, we will test our model in real-time using face detection.

Loading the saved model

Let’s start by loading the trained model architecture and weights so that it can be used further to make predictions.

from tensorflow.keras.models import model_from_json
model = model_from_json(open("model_arch.json", "r").read())
model.load_weights('model.h5')

Loading Har-Cascade for Face Detection

We are using Haar-cascade for the detection position of faces and after getting position we will crop the faces.

haarcascade_frontalface_default can be downloaded using the link.

import cv2
face_haar_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

Read Frames and apply Preprocessing using OpenCV

Use OpenCV to read frames and for image processing.

cap=cv2.VideoCapture(0)while cap.isOpened():
    res,frame=cap.read()height, width , channel = frame.shapegray_image= cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_haar_cascade.detectMultiScale(gray_image )
    try:
        for (x,y, w, h) in faces:
            cv2.rectangle(frame, pt1 = (x,y),pt2 = (x+w, y+h), color = (255,0,0),thickness =  2)
            roi_gray = gray_image[y-5:y+h+5,x-5:x+w+5]
            roi_gray=cv2.resize(roi_gray,(48,48))
            image_pixels = img_to_array(roi_gray)
            image_pixels = np.expand_dims(image_pixels, axis = 0)
            image_pixels /= 255
            predictions = model.predict(image_pixels)
            max_index = np.argmax(predictions[0])
            emotion_detection = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
            emotion_prediction = emotion_detection[max_index]
  • here emotion_prediction returns the label of emotion.
  • normalize test images by dividing them by 255.
  • np.expand_dims convert a 3D matrix into a 4D tensor.
  • (x,y,w,h) are the coordinates of faces in the input frame.
  • haar_cascade takes only grayscale images.

Adding Overlay

Adding an overlay on the output frame and displaying the prediction with confidence gives a better look.

cap=cv2.VideoCapture(1)while cap.isOpened():
    res,frame=cap.read()height, width , channel = frame.shape#---------------------------------------------------------------------------
    # Creating an Overlay window to write prediction and cofidencesub_img = frame[0:int(height/6),0:int(width)]black_rect = np.ones(sub_img.shape, dtype=np.uint8)*0
    res = cv2.addWeighted(sub_img, 0.77, black_rect,0.23, 0)
    FONT = cv2.FONT_HERSHEY_SIMPLEX
    FONT_SCALE = 0.8
    FONT_THICKNESS = 2
    lable_color = (10, 10, 255)
    lable = "Emotion Detection made by Abhishek"
    lable_dimension = cv2.getTextSize(lable,FONT ,FONT_SCALE,FONT_THICKNESS)[0]
    textX = int((res.shape[1] - lable_dimension[0]) / 2)
    textY = int((res.shape[0] + lable_dimension[1]) / 2)
    cv2.putText(res, lable, (textX,textY), FONT, FONT_SCALE, (0,0,0), FONT_THICKNESS)# prediction part --------------------------------------------------------------------------gray_image= cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_haar_cascade.detectMultiScale(gray_image )
    try:
        for (x,y, w, h) in faces:
            cv2.rectangle(frame, pt1 = (x,y),pt2 = (x+w, y+h), color = (255,0,0),thickness =  2)
            roi_gray = gray_image[y-5:y+h+5,x-5:x+w+5]
            roi_gray=cv2.resize(roi_gray,(48,48))
            image_pixels = img_to_array(roi_gray)
            image_pixels = np.expand_dims(image_pixels, axis = 0)
            image_pixels /= 255
            predictions = model.predict(image_pixels)
            max_index = np.argmax(predictions[0])
            emotion_detection = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
            emotion_prediction = emotion_detection[max_index]
            cv2.putText(res, "Sentiment: {}".format(emotion_prediction), (0,textY+22+5), FONT,0.7, lable_color,2)
            lable_violation = 'Confidence: {}'.format(str(np.round(np.max(predictions[0])*100,1))+ "%")
            violation_text_dimension = cv2.getTextSize(lable_violation,FONT,FONT_SCALE,FONT_THICKNESS )[0]
            violation_x_axis = int(res.shape[1]- violation_text_dimension[0])
            cv2.putText(res, lable_violation, (violation_x_axis,textY+22+5), FONT,0.7, lable_color,2)
    except :
        pass
    frame[0:int(height/6),0:int(width)] = res
    cv2.imshow('frame', frame)if cv2.waitKey(1) & 0xFF == ord('q'):
        breakcap.release()
cv2.destroyAllWindows

Now run it !!!

Facial Emotion Detection video
Source: Local

Conclusion

In this article, you have seen how to preprocess data, design a network that is capable of classifying the emotions, and then use Opencv
for the detection of the faces and then pass it for prediction.

you can improve accuracy further by :

  • using pre-trained models like VGG-16 or Resnet etc.
  • Using Stacked Model
  • performing some fine-tuning

download source codes from here.

Thanks for reading the article, please share if you liked this article!

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.

Leave a Reply Your email address will not be published. Required fields are marked *