This article was published as a part of the Data Science Blogathon.
An end-to-end guide to model training and deployment for facial emotion detection using the webcam.
In our previous article, we explored emotion detection in text, which is helpful for several use cases; you can read the article here. While emotion detection from text is quite useful, the industry is now focusing on one more area: facial emotion detection. Emotion detection from images is useful for applications such as driver drowsiness detection, student behavior detection, etc.
In this article, we will cover this interesting application of computer vision. Computer vision is advancing quickly, and major tech companies are building models that behave more like humans. To do so, machines must be capable of detecting your emotions and treating you accordingly.
This article demonstrates how to build a model using TensorFlow that can tell your emotion from a picture or a live webcam feed.
Checkpoints that we will be discussing in this article are:
So let’s dive straight into the implementation part of Facial Emotion Detection.
We will be using the FER-2013 dataset, which is publicly available on Kaggle. It has 48×48-pixel grayscale images of faces along with their emotion labels.
This dataset contains 7 emotions: 0 = Angry, 1 = Disgust, 2 = Fear, 3 = Happy, 4 = Sad, 5 = Surprise, 6 = Neutral.
Start by importing pandas and some essential libraries and then loading the dataset.
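A minimal sketch of the loading step, assuming the Kaggle file is a CSV with the three columns described in this article. The helper name `load_fer2013` and the file name `fer2013.csv` are assumptions for illustration; a tiny in-memory stand-in is used so the snippet runs without the real download.

```python
import io

import numpy as np  # used by the preprocessing steps that follow
import pandas as pd


def load_fer2013(source):
    """Read the FER-2013 CSV from a path or a file-like object (hypothetical helper)."""
    df = pd.read_csv(source)
    expected = {"emotion", "pixels", "Usage"}
    missing = expected - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


# Tiny stand-in with the same three columns (4 pixels per row instead of 48*48),
# so the sketch runs without the real file.
sample_csv = io.StringIO(
    "emotion,pixels,Usage\n"
    "0,70 80 82 72,Training\n"
    "3,151 150 147 155,PublicTest\n"
)
df = load_fer2013(sample_csv)
print(df.head())
print(df["Usage"].value_counts())

# With the real dataset (the file name is an assumption):
# df = load_fer2013("fer2013.csv")
```

Checking the column names up front gives a clear error early if a different export of the dataset is used.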
This dataset contains 3 columns: emotion, pixels, and Usage. The emotion column contains integer-encoded emotions, the pixels column contains the pixel values as a string separated by spaces, and the Usage column tells whether a row is meant for training or testing.
As you can see, the data is not in the right format, so we need to pre-process it. Here X_train and X_test contain the pixels, and y_train and y_test contain the emotions.
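To make the pixels format concrete, here is a small sketch that turns one space-separated string into a 48×48 image array. The helper `pixels_to_image` and the synthetic row are hypothetical and only illustrate the layout; they are not part of the dataset or of the article's pipeline.

```python
import numpy as np


def pixels_to_image(pixel_string, size=48):
    """Convert a space-separated pixel string into a (size, size) uint8 array."""
    values = np.array([int(v) for v in pixel_string.split()], dtype=np.uint8)
    if values.size != size * size:
        raise ValueError(f"expected {size * size} values, got {values.size}")
    return values.reshape(size, size)


# Synthetic stand-in for one 'pixels' cell: 2304 values in the range 0..255
fake_row = " ".join(str(i % 256) for i in range(48 * 48))
image = pixels_to_image(fake_row)
print(image.shape, image.dtype)
```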
X_train = []
y_train = []
X_test = []
y_test = []
for index, row in df.iterrows():
    k = row['pixels'].split(" ")
    if row['Usage'] == 'Training':
        X_train.append(np.array(k))
        y_train.append(row['emotion'])
    elif row['Usage'] == 'PublicTest':
        X_test.append(np.array(k))
        y_test.append(row['emotion'])
At this stage, X_train and X_test hold the pixel values as strings. Converting them into numbers is easy: we just need to typecast.
X_train = np.array(X_train, dtype='uint8')
y_train = np.array(y_train, dtype='uint8')
X_test = np.array(X_test, dtype='uint8')
y_test = np.array(y_test, dtype='uint8')
y_train and y_test contain 1D integer-encoded labels; we need to convert them into categorical (one-hot) data, which is the format the categorical cross-entropy loss expects.
import keras
from keras.utils import to_categorical

y_train = to_categorical(y_train, num_classes=7)
y_test = to_categorical(y_test, num_classes=7)
num_classes = 7 shows that we have 7 classes to classify.
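As an illustration of what the categorical form looks like, the following NumPy-only sketch builds the same kind of one-hot rows. It is an equivalent written for clarity, not the Keras implementation, and `one_hot` is a hypothetical helper name.

```python
import numpy as np


def one_hot(labels, num_classes=7):
    """Return one row per label with a 1 at the label's index and 0 elsewhere."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype="float32")[labels]


y = one_hot([0, 3, 6])  # Angry, Happy, Neutral
print(y)
```

Taking `argmax` along each row recovers the original integer label, which is how the prediction is decoded later in the article.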
You need to convert the data into a 4D tensor (row_num, height, width, channel) for training purposes.
X_train = X_train.reshape(X_train.shape[0], 48, 48, 1)
X_test = X_test.reshape(X_test.shape[0], 48, 48, 1)
Here 1 tells us that the training data is in grayscale form. At this stage, we have successfully preprocessed our data into X_train, X_test, y_train, and y_test.
Image data augmentation is used to improve the performance of the model and its ability to generalize. It is good practice to apply some data augmentation before passing the data to the model, which can be done using ImageDataGenerator provided by Keras.
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=10,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1,
    fill_mode='nearest')
testgen = ImageDataGenerator(rescale=1./255)
datagen.fit(X_train)
batch_size = 64
On testing data, we will only apply rescaling(normalization).
We will use a batch_size of 64; after fitting our data to the image generator, data will be generated in batches of 64. Using a data generator is a convenient way to train on a large amount of data, because batches are produced on the fly instead of holding every augmented image in memory.
train_flow = datagen.flow(X_train, y_train, batch_size=batch_size)
test_flow = testgen.flow(X_test, y_test, batch_size=batch_size)
train_flow contains our X_train and y_train while test_flow contains our X_test and y_test.
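To make the two simplest transformations concrete, here is a NumPy-only sketch of rescaling and horizontal flipping on a batch shaped like X_train. It is an illustration under stated assumptions, not how ImageDataGenerator is implemented: the helper names are hypothetical, the batch is random stand-in data, and the real generator applies flips randomly per image rather than to the whole batch.

```python
import numpy as np


def rescale(batch, divisor=255.0):
    """Map uint8 pixel values from [0, 255] to floats in [0, 1] (same effect as rescale=1./255)."""
    return batch.astype("float32") / divisor


def horizontal_flip(batch):
    """Mirror every image left to right; batch shape is (N, H, W, C)."""
    return batch[:, :, ::-1, :]


# Random stand-in for a small batch of 48x48 grayscale images
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 48, 48, 1), dtype=np.uint8)

scaled = rescale(batch)
flipped = horizontal_flip(scaled)
print(scaled.min(), scaled.max())
print(flipped.shape)
```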
We design the CNN model for emotion detection using the functional API. We create blocks using Conv2D, BatchNormalization, MaxPooling2D, and Dropout layers, stack them together, then flatten the result and use a Dense layer at the end for the output. You can read more on how to design CNN models.
Building the model using functional API gives more flexibility.
from keras.utils import plot_model
from keras.models import Model
from keras.layers import Input, Dense, Flatten, Dropout, BatchNormalization
# Conv2D, MaxPooling2D and concatenate are imported from keras.layers directly
from keras.layers import Conv2D, MaxPooling2D, concatenate
from keras.optimizers import Adam, SGD
from keras.regularizers import l1, l2
from matplotlib import pyplot as plt
from sklearn.metrics import confusion_matrix
FER_Model takes the input shape and returns a model for training. Now let’s define the architecture of the model.
def FER_Model(input_shape=(48, 48, 1)):
    # input layer
    visible = Input(shape=input_shape, name='input')
    num_classes = 7

    # the 1st block
    conv1_1 = Conv2D(64, kernel_size=3, activation='relu', padding='same', name='conv1_1')(visible)
    conv1_1 = BatchNormalization()(conv1_1)
    conv1_2 = Conv2D(64, kernel_size=3, activation='relu', padding='same', name='conv1_2')(conv1_1)
    conv1_2 = BatchNormalization()(conv1_2)
    pool1_1 = MaxPooling2D(pool_size=(2, 2), name='pool1_1')(conv1_2)
    drop1_1 = Dropout(0.3, name='drop1_1')(pool1_1)

    # the 2nd block
    conv2_1 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name='conv2_1')(drop1_1)
    conv2_1 = BatchNormalization()(conv2_1)
    conv2_2 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name='conv2_2')(conv2_1)
    conv2_2 = BatchNormalization()(conv2_2)
    conv2_3 = Conv2D(128, kernel_size=3, activation='relu', padding='same', name='conv2_3')(conv2_2)
    # assign the normalized output back to conv2_3 so the pooling layer uses it
    conv2_3 = BatchNormalization()(conv2_3)
    pool2_1 = MaxPooling2D(pool_size=(2, 2), name='pool2_1')(conv2_3)
    drop2_1 = Dropout(0.3, name='drop2_1')(pool2_1)

    # the 3rd block
    conv3_1 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv3_1')(drop2_1)
    conv3_1 = BatchNormalization()(conv3_1)
    conv3_2 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv3_2')(conv3_1)
    conv3_2 = BatchNormalization()(conv3_2)
    conv3_3 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv3_3')(conv3_2)
    conv3_3 = BatchNormalization()(conv3_3)
    conv3_4 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv3_4')(conv3_3)
    conv3_4 = BatchNormalization()(conv3_4)
    pool3_1 = MaxPooling2D(pool_size=(2, 2), name='pool3_1')(conv3_4)
    drop3_1 = Dropout(0.3, name='drop3_1')(pool3_1)

    # the 4th block
    conv4_1 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv4_1')(drop3_1)
    conv4_1 = BatchNormalization()(conv4_1)
    conv4_2 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv4_2')(conv4_1)
    conv4_2 = BatchNormalization()(conv4_2)
    conv4_3 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv4_3')(conv4_2)
    conv4_3 = BatchNormalization()(conv4_3)
    conv4_4 = Conv2D(256, kernel_size=3, activation='relu', padding='same', name='conv4_4')(conv4_3)
    conv4_4 = BatchNormalization()(conv4_4)
    pool4_1 = MaxPooling2D(pool_size=(2, 2), name='pool4_1')(conv4_4)
    drop4_1 = Dropout(0.3, name='drop4_1')(pool4_1)

    # the 5th block
    conv5_1 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name='conv5_1')(drop4_1)
    conv5_1 = BatchNormalization()(conv5_1)
    conv5_2 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name='conv5_2')(conv5_1)
    conv5_2 = BatchNormalization()(conv5_2)
    conv5_3 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name='conv5_3')(conv5_2)
    conv5_3 = BatchNormalization()(conv5_3)
    conv5_4 = Conv2D(512, kernel_size=3, activation='relu', padding='same', name='conv5_4')(conv5_3)
    # normalize conv5_4 (not conv5_3 a second time) before pooling
    conv5_4 = BatchNormalization()(conv5_4)
    pool5_1 = MaxPooling2D(pool_size=(2, 2), name='pool5_1')(conv5_4)
    drop5_1 = Dropout(0.3, name='drop5_1')(pool5_1)

    # flatten and output
    flatten = Flatten(name='flatten')(drop5_1)
    output = Dense(num_classes, activation='softmax', name='output')(flatten)

    # create model
    model = Model(inputs=visible, outputs=output)

    # summary of layers (summary() prints the table itself)
    model.summary()
    return model
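A quick way to sanity-check the architecture above is to trace the spatial size through the five blocks. This sketch assumes what the listing shows: 'same'-padded convolutions keep the size, and each 2×2 max-pooling layer (with its default stride and 'valid' padding) halves it, rounding down. The function name is hypothetical.

```python
def trace_spatial_size(size=48, num_blocks=5, pool=2):
    """Return the feature-map width/height after each pooling layer."""
    sizes = [size]
    for _ in range(num_blocks):
        size = size // pool  # 2x2 pooling with stride 2 floors odd sizes
        sizes.append(size)
    return sizes


sizes = trace_spatial_size()
print(sizes)  # [48, 24, 12, 6, 3, 1]

# The last block has 512 filters, so Flatten yields 1 * 1 * 512 values
flat_features = sizes[-1] ** 2 * 512
print(flat_features)  # 512
```

With a 48×48 input, five pooling layers is the maximum this design allows: the feature map is already 1×1 before the Flatten layer.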
We compile the model using the Adam optimizer with a learning rate of 0.0001, the value set in the code. The decay argument gradually lowers the learning rate as training progresses, on every update rather than only when accuracy stops improving.
model = FER_Model()
# lr and decay are the older Keras argument names; newer releases call the first one learning_rate
opt = Adam(lr=0.0001, decay=1e-6)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
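For intuition about the decay argument, the sketch below evaluates the time-based decay rule used by the older Keras optimizers, where the rate after t updates is the initial rate divided by (1 + decay × t). The steps-per-epoch figure is an assumption (about 28,709 training images at batch size 64), and the function name is hypothetical.

```python
def decayed_lr(initial_lr, decay, iterations):
    """Time-based decay: the rate shrinks as the number of updates grows."""
    return initial_lr / (1.0 + decay * iterations)


initial_lr = 1e-4       # value passed to Adam in the article
decay = 1e-6
steps_per_epoch = 449   # assumption: ~28,709 training images / batch size 64

for epoch in (1, 10, 100):
    iterations = epoch * steps_per_epoch
    print(epoch, decayed_lr(initial_lr, decay, iterations))
```

Under these assumptions the rate falls by only about 4% over 100 epochs, so the decay here is a gentle schedule rather than an aggressive one.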
To train the model you need to write the following line of code.
num_epochs = 100
history = model.fit_generator(train_flow,
                              steps_per_epoch=len(X_train) / batch_size,
                              epochs=num_epochs,
                              verbose=1,
                              validation_data=test_flow,
                              validation_steps=len(X_test) / batch_size)
Here steps_per_epoch = TotalTrainingSamples / TrainingBatchSize and validation_steps = TotalvalidationSamples / ValidationBatchSize.
Training takes at least 20 minutes for 100 epochs.
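As a worked example of the two ratios, the sketch below rounds them up to whole steps so the last partial batch is still visited. The sample counts are an assumption based on the usual FER-2013 split (28,709 Training rows and 3,589 PublicTest rows); check `len(X_train)` and `len(X_test)` on your own copy. The helper name is hypothetical.

```python
import math


def steps_for(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


batch_size = 64
train_samples = 28709  # assumption: 'Training' rows in FER-2013
val_samples = 3589     # assumption: 'PublicTest' rows in FER-2013

print(steps_for(train_samples, batch_size))  # 449
print(steps_for(val_samples, batch_size))    # 57
```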
We save the model’s architecture to a JSON file and the model’s weights to an .h5 file.
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
model.save_weights("model.h5")
print("Saved model to disk")
Download the saved model and weights to a directory.
In this part, we will test our model in real-time using face detection.
Let’s start by loading the trained model architecture and weights so that it can be used further to make predictions.
from tensorflow.keras.models import model_from_json

# the architecture was saved as model.json in the previous step
with open("model.json", "r") as json_file:
    model = model_from_json(json_file.read())
model.load_weights('model.h5')
We use a Haar cascade to detect the position of faces; after getting the position, we crop the faces.
haarcascade_frontalface_default can be downloaded using the link.
import cv2

face_haar_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
Use OpenCV to read frames and for image processing.
import numpy as np
from tensorflow.keras.preprocessing.image import img_to_array

cap = cv2.VideoCapture(0)
while cap.isOpened():
    res, frame = cap.read()
    if not res:
        break  # no frame could be read from the camera
    height, width, channel = frame.shape
    gray_image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_haar_cascade.detectMultiScale(gray_image)
    try:
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, pt1=(x, y), pt2=(x+w, y+h), color=(255, 0, 0), thickness=2)
            # crop the face with a 5-pixel margin and resize it to the model's input size
            roi_gray = gray_image[y-5:y+h+5, x-5:x+w+5]
            roi_gray = cv2.resize(roi_gray, (48, 48))
            image_pixels = img_to_array(roi_gray)
            image_pixels = np.expand_dims(image_pixels, axis=0)
            image_pixels /= 255
            predictions = model.predict(image_pixels)
            max_index = np.argmax(predictions[0])
            emotion_detection = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
            emotion_prediction = emotion_detection[max_index]
    except Exception:
        # skip frames where the crop or the prediction fails
        pass
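The last steps of the loop above turn the model's softmax output into a label. The sketch below isolates that decoding step with a made-up probability vector, so it runs without a trained model; `decode_prediction` is a hypothetical helper, and the class order follows the dataset's integer encoding.

```python
import numpy as np

EMOTIONS = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')


def decode_prediction(probabilities):
    """Return the most likely emotion and its probability as a percentage."""
    probabilities = np.asarray(probabilities, dtype=float)
    index = int(np.argmax(probabilities))
    return EMOTIONS[index], round(float(probabilities[index]) * 100, 1)


# Made-up softmax output for a single face (shape (1, 7), like model.predict)
fake_output = np.array([[0.05, 0.01, 0.04, 0.70, 0.10, 0.05, 0.05]])
label, confidence = decode_prediction(fake_output[0])
print(label, confidence)
```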
The frame is converted to grayscale because the model takes only grayscale images. Adding an overlay on the output frame and displaying the prediction with its confidence gives a better look.
import numpy as np
from tensorflow.keras.preprocessing.image import img_to_array

cap = cv2.VideoCapture(0)  # 0 is usually the default webcam; use 1 for a second camera
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break  # no frame could be read from the camera
    height, width, channel = frame.shape

    # ---------------------------------------------------------------------------
    # Creating an overlay window to write the prediction and confidence
    sub_img = frame[0:int(height/6), 0:int(width)]
    black_rect = np.zeros(sub_img.shape, dtype=np.uint8)
    res = cv2.addWeighted(sub_img, 0.77, black_rect, 0.23, 0)
    FONT = cv2.FONT_HERSHEY_SIMPLEX
    FONT_SCALE = 0.8
    FONT_THICKNESS = 2
    label_color = (10, 10, 255)
    label = "Emotion Detection made by Abhishek"
    label_dimension = cv2.getTextSize(label, FONT, FONT_SCALE, FONT_THICKNESS)[0]
    textX = int((res.shape[1] - label_dimension[0]) / 2)
    textY = int((res.shape[0] + label_dimension[1]) / 2)
    cv2.putText(res, label, (textX, textY), FONT, FONT_SCALE, (0, 0, 0), FONT_THICKNESS)

    # prediction part -----------------------------------------------------------
    gray_image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_haar_cascade.detectMultiScale(gray_image)
    try:
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, pt1=(x, y), pt2=(x+w, y+h), color=(255, 0, 0), thickness=2)
            roi_gray = gray_image[y-5:y+h+5, x-5:x+w+5]
            roi_gray = cv2.resize(roi_gray, (48, 48))
            image_pixels = img_to_array(roi_gray)
            image_pixels = np.expand_dims(image_pixels, axis=0)
            image_pixels /= 255
            predictions = model.predict(image_pixels)
            max_index = np.argmax(predictions[0])
            emotion_detection = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
            emotion_prediction = emotion_detection[max_index]
            cv2.putText(res, "Sentiment: {}".format(emotion_prediction), (0, textY+22+5), FONT, 0.7, label_color, 2)
            label_violation = 'Confidence: {}'.format(str(np.round(np.max(predictions[0])*100, 1)) + "%")
            violation_text_dimension = cv2.getTextSize(label_violation, FONT, FONT_SCALE, FONT_THICKNESS)[0]
            violation_x_axis = int(res.shape[1] - violation_text_dimension[0])
            cv2.putText(res, label_violation, (violation_x_axis, textY+22+5), FONT, 0.7, label_color, 2)
    except Exception:
        # skip frames where the crop or the prediction fails
        pass

    frame[0:int(height/6), 0:int(width)] = res
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
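One detail worth noting in the crop step: when a face sits within 5 pixels of the top or left edge, `y-5` or `x-5` becomes negative, and a negative start index in a NumPy slice counts from the end of the array, which typically produces an empty crop that the loop then skips. A possible alternative, sketched here with a hypothetical `crop_face` helper and a blank stand-in frame, is to clamp the margins to the frame bounds.

```python
import numpy as np


def crop_face(gray_image, x, y, w, h, margin=5):
    """Crop a face box plus a margin, clamped to the image bounds."""
    height, width = gray_image.shape[:2]
    top = max(y - margin, 0)
    left = max(x - margin, 0)
    bottom = min(y + h + margin, height)
    right = min(x + w + margin, width)
    return gray_image[top:bottom, left:right]


# Blank stand-in for a 100x120 grayscale frame
frame = np.zeros((100, 120), dtype=np.uint8)

near_corner = crop_face(frame, x=2, y=3, w=40, h=40)
near_edge = crop_face(frame, x=100, y=80, w=40, h=40)
print(near_corner.shape, near_edge.shape)
```

The clamped crop can then be passed to `cv2.resize` as before; whether a partially visible face should be classified at all is a design choice left to the reader.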
Now run it !!!
In this article, you have seen how to preprocess data, design a network capable of classifying emotions, and then use OpenCV to detect faces and pass them to the model for prediction.
You can improve accuracy further by:
Download the source code from here.
Thanks for reading the article, please share if you liked this article!