Deep Learning to Create your Emoji

Shikha Gupta 25 Nov, 2021 • 6 min read

This article was published as a part of the Data Science Blogathon.

Overview of Deep Learning for Emojis

Nowadays, we use emojis and avatars to express our moods and feelings. They act as nonverbal cues and have become a crucial part of emotion recognition, online chatting, brand sentiment, product reviews, and much more. Data science research into emoji-driven storytelling is growing steadily.

Detecting human emotions from images has become very popular, largely due to technical advances in computer vision and deep learning. In this deep learning project, we will classify human facial expressions and then map each expression to its corresponding emoji or avatar. If you are not familiar with deep learning, you can click here.

About the Dataset

The pictorial dataset we are going to use for this project is FER2013 (Facial Expression Recognition 2013). It contains 48x48-pixel grayscale face images. The faces are centred and occupy roughly the same amount of space in every image. Below are the facial expression categories present in our dataset:

  • 0:angry

  • 1:disgust

  • 2:fear

  • 3:happy

  • 4:sad

  • 5:surprise

  • 6:neutral

Dataset: Facial Expression Recognition Dataset

 

[Image: sample 48x48 grayscale faces from the FER2013 dataset]

Approach: Firstly, we build a deep learning model that classifies facial expressions from pictures. Then we map the classified emotion to an avatar or an emoji.

CNN to Recognize Facial Emotion

Now we will build a convolutional neural network (CNN) architecture and feed the FER2013 dataset to the model so that it can recognize emotion from images. We build the CNN model step by step using Keras layers. You can see each layer in the below diagram.

[Diagram: CNN architecture used to recognize facial emotions]

To build the network we use four Conv2D layers, one Flatten layer, and two Dense layers. The final layer uses the softmax activation to produce the class probabilities.
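As a quick illustration of what softmax does, here is a minimal NumPy sketch (not part of the project code) that converts the raw scores of the final layer into probabilities summing to one:

import numpy as np

def softmax(z):
    # subtract the maximum for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1, 0.5, 0.3, 0.2, 1.5])  # hypothetical raw scores for the 7 classes
print(softmax(scores))  # seven probabilities that sum to 1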

Prerequisites: Download the FER2013 dataset from the provided link and extract it into a folder named data with separate train and test directories.
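Based on the generator code below, the extracted folders should roughly look like this, with one sub-folder per emotion class:

data/
    train/
        angry/  disgust/  fear/  happy/  neutral/  sad/  surprise/
    test/
        angry/  disgust/  fear/  happy/  neutral/  sad/  surprise/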

Write the below Python code in your Jupyter notebook and save it as train.py:

Import the required libraries

import os
import numpy as np
import cv2
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.optimizers import Adam
from keras.layers import MaxPooling2D
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import plot_model

Initialize the training and validation generators:

train_dir = 'data/train'
val_dir = 'data/test'
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)
#training generator for CNN
train_generator = train_datagen.flow_from_directory(
       train_dir,
       target_size=(48,48),
       batch_size=64,
       color_mode="grayscale",
       class_mode='categorical')
#validation generator for CNN
validation_generator = val_datagen.flow_from_directory(
       val_dir,
       target_size=(48,48),
       batch_size=64,
       color_mode="grayscale",
       class_mode='categorical')

Output

Found 28709 images belonging to 7 classes.

Found 7178 images belonging to 7 classes.
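To confirm how Keras assigns an index to each class folder, you can inspect the generator; the mapping is alphabetical and should match the emotion dictionary used later:

print(train_generator.class_indices)
# expected: {'angry': 0, 'disgust': 1, 'fear': 2, 'happy': 3, 'neutral': 4, 'sad': 5, 'surprise': 6}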

To display the number of training images per class:

for i in os.listdir("data/train/"):
    print(str(len(os.listdir("data/train/" + i))) + " " + i + " images")

Output

3995 angry images

436 disgust images

4097 fear images

7215 happy images

4965 neutral images

4830 sad images

3171 surprise images

To display the number of test images per class:

for i in os.listdir("data/test/"):
    print(str(len(os.listdir("data/test/" + i))) + " " + i + " images")

Output

958 angry images

111 disgust images

1024 fear images

1774 happy images

1233 neutral images

1247 sad images

831 surprise images

Build the convolution network architecture:

emotion_model = Sequential()
emotion_model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(48,48,1)))  # output = (48-3+0)/1 + 1 = 46
emotion_model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))  # output = (46-3+0)/1 + 1 = 44
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))  # pooling halves the spatial size: 22x22x64
emotion_model.add(Dropout(0.25))  # randomly drop 25% of the activations to reduce overfitting
emotion_model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))  # (22-3+0)/1 + 1 = 20
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))  # 10
emotion_model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))  # (10-3+0)/1 + 1 = 8
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))  # output = 4
emotion_model.add(Dropout(0.25))  # shape unchanged
emotion_model.add(Flatten())  # flatten the multidimensional output into a vector: 4*4*128 = 2048
emotion_model.add(Dense(1024, activation='relu'))  # fully connected hidden layer with 1024 neurons
emotion_model.add(Dropout(0.5))
emotion_model.add(Dense(7, activation='softmax'))  # output layer with 7 neurons, one per emotion class
plot_model(emotion_model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)  # save the architecture diagram as model_plot.png
emotion_model.summary()

Output

 

[Model summary listing each layer of the CNN]
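The spatial sizes in the code comments follow the standard convolution output formula, output = (W - K + 2P)/S + 1; here is a tiny, purely illustrative helper to verify them:

def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    # output = (W - K + 2P) / S + 1
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(conv_output_size(48, 3))  # 46, matching the first Conv2D layer
print(conv_output_size(46, 3))  # 44, matching the second Conv2D layer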

Compile and train the model

emotion_model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.0001, decay=1e-6), metrics=['accuracy'])
emotion_model_info = emotion_model.fit_generator(  # train on the training generator, validating on the validation generator
       train_generator,
       steps_per_epoch=28709 // 64,
       epochs=50,
       validation_data=validation_generator,
       validation_steps=7178 // 64)

Output

 

[Training log showing the loss and accuracy for each epoch]
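Optionally, the History object returned by fit_generator stores the per-epoch metrics, so you can plot the training curves yourself (a small sketch, assuming matplotlib is installed; older Keras versions use the keys 'acc'/'val_acc' instead of 'accuracy'/'val_accuracy'):

import matplotlib.pyplot as plt

plt.plot(emotion_model_info.history['accuracy'], label='train accuracy')
plt.plot(emotion_model_info.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()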

Save the model weights:

emotion_model.save_weights('model.h5')  # save the trained weights to disk
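As an aside, Keras can also save the full model (architecture plus weights) in one file, which would let gui.py load it directly instead of rebuilding the layers; this is optional and not how the original project does it:

emotion_model.save('emotion_model_full.h5')  # hypothetical filename: saves architecture + weights
# later, e.g. in gui.py:
# from keras.models import load_model
# emotion_model = load_model('emotion_model_full.h5')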

To detect the bounding boxes of faces in the webcam feed and predict the emotion in each, we use OpenCV's Haar cascade XML:

cv2.ocl.setUseOpenCL(False)
# emotion dictionary: maps the model's output index to an emotion label
em_dict = {0: "Angry", 1: "Disgusted", 2: "Fearful", 3: "Happy", 4: "Neutral", 5: "Sad", 6: "Surprised"}
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # bounding box initialization
    bounding_box = cv2.CascadeClassifier('/home/shikha/.local/lib/python3.6/site-packages/cv2/data/haarcascade_frontalface_default.xml')
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # detect multiple faces and frame each one separately
    n_faces = bounding_box.detectMultiScale(gray_frame, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in n_faces:
        cv2.rectangle(frame, (x, y-50), (x+w, y+h+10), (255, 0, 0), 2)
        roi_frame = gray_frame[y:y + h, x:x + w]
        crop_img = np.expand_dims(np.expand_dims(cv2.resize(roi_frame, (48, 48)), -1), 0)
        emotion_prediction = emotion_model.predict(crop_img)
        maxindex = int(np.argmax(emotion_prediction))
        cv2.putText(frame, em_dict[maxindex], (x+20, y-60), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
    cv2.imshow('Video', cv2.resize(frame, (1200, 860), interpolation=cv2.INTER_CUBIC))
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
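The Haar cascade path above is specific to the author's machine; if your installation differs, opencv-python exposes the folder containing the same XML via cv2.data.haarcascades:

bounding_box = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')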

Code for GUI and mapping with emojis

Firstly, create a folder named emojis and save a cartoon image for each of the seven facial expressions present in the dataset, as laid out below.
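Going by the filenames referenced in the emoji dictionary further down, the folder should contain:

emojis/
    angry.png
    disgusted.png
    fearful.png
    happy.png
    neutral.png
    sad.png
    surprised.png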

Create a Python file named gui.py with the code below and run it.

  1. Import the Libraries

import tkinter as tk
from tkinter import *
import cv2
from PIL import Image
from PIL import ImageTk
import os
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.optimizers import Adam
from keras.layers import MaxPooling2D
from keras.preprocessing.image import ImageDataGenerator
  2. Model Creation: Add the same Keras layers used in train.py to recreate the deep learning model, then load the saved weights (model.h5).

emotion_model = Sequential()  # rebuild the same architecture that was used for training
emotion_model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(48,48,1)))
emotion_model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))
emotion_model.add(Dropout(0.25))
emotion_model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))
emotion_model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))
emotion_model.add(Dropout(0.25))
emotion_model.add(Flatten())
emotion_model.add(Dense(1024, activation='relu'))
emotion_model.add(Dropout(0.5))
emotion_model.add(Dense(7, activation='softmax'))
emotion_model.load_weights('model.h5')  # load the weights saved by train.py
cv2.ocl.setUseOpenCL(False)
  3. Mapping of facial emotions to avatars

# emotion dictionary: the seven emotions present in the dataset
em_dict = {0: "   Angry   ", 1: "Disgusted", 2: "  Fearful  ", 3: "   Happy   ", 4: "  Neutral  ", 5: "    Sad    ", 6: "Surprised"}
# emoji dictionary: one avatar image for every emotion in the dataset
emoji_dist = {0: "./emojis/angry.png", 1: "./emojis/disgusted.png", 2: "./emojis/fearful.png", 3: "./emojis/happy.png", 4: "./emojis/neutral.png", 5: "./emojis/sad.png", 6: "./emojis/surprised.png"}
global last_frame1
last_frame1 = np.zeros((480, 640, 3), dtype=np.uint8)
global cap1
show_text = [0]
def show_vid():    # open the camera and capture the video stream
    cap1 = cv2.VideoCapture(0)      # start capturing
    if not cap1.isOpened():  # if the camera cannot be opened
        print("cant open the camera1")
    flag1, frame1 = cap1.read()
    frame1 = cv2.resize(frame1, (600, 500))  # resize the image frame
    bound_box = cv2.CascadeClassifier('/home/shikha/.local/lib/python3.6/site-packages/cv2/data/haarcascade_frontalface_default.xml')  # detects faces in the video and bounds them with rectangular boxes
    gray_frame = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)  # convert the frame to grayscale
    n_faces = bound_box.detectMultiScale(gray_frame, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in n_faces:  # for each detected face in the frame
        cv2.rectangle(frame1, (x, y-50), (x+w, y+h+10), (255, 0, 0), 2)
        roi_frame = gray_frame[y:y + h, x:x + w]
        crop_img = np.expand_dims(np.expand_dims(cv2.resize(roi_frame, (48, 48)), -1), 0)  # crop the face region and resize it for the model
        prediction = emotion_model.predict(crop_img)  # predict the emotion from the cropped face
        maxindex = int(np.argmax(prediction))
        cv2.putText(frame1, em_dict[maxindex], (x+20, y-60), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
        show_text[0] = maxindex  # remember the detected emotion for the emoji panel
    if flag1 is None:  # if the webcam is disabled
        print("Major error!")
    elif flag1:
        global last_frame1
        last_frame1 = frame1.copy()
        pic = cv2.cvtColor(last_frame1, cv2.COLOR_BGR2RGB)  # convert BGR to RGB for display
        img = Image.fromarray(pic)
        imgtk = ImageTk.PhotoImage(image=img)
        lmain.imgtk = imgtk
        lmain.configure(image=imgtk)
        lmain.after(10, show_vid)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        exit()
def show_vid2():
    frame2 = cv2.imread(emoji_dist[show_text[0]])  # load the emoji image for the detected emotion
    pic2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2RGB)
    img2 = Image.fromarray(pic2)
    imgtk2 = ImageTk.PhotoImage(image=img2)
    lmain2.imgtk2 = imgtk2
    lmain3.configure(text=em_dict[show_text[0]], font=('arial', 45, 'bold'))  # show the emotion label next to the emoji
    lmain2.configure(image=imgtk2)
    lmain2.after(10, show_vid2)
if __name__ == '__main__':
   root=tk.Tk()  
   img = ImageTk.PhotoImage(Image.open("logo.png"))
   heading = Label(root,image=img,bg='black')
   heading.pack()
   heading2=Label(root,text="Photo to Emoji",pady=20, font=('arial',45,'bold'),bg='black',fg='#CDCDCD')#to label the output                                
   heading2.pack()
   lmain = tk.Label(master=root,padx=50,bd=10)
   lmain2 = tk.Label(master=root,bd=10)
   lmain3=tk.Label(master=root,bd=10,fg="#CDCDCD",bg='black')
   lmain.pack(side=LEFT)
   lmain.place(x=50,y=250)
   lmain3.pack()
   lmain3.place(x=960,y=250)
   lmain2.pack(side=RIGHT)
   lmain2.place(x=900,y=350)
   root.title("Photo To Emoji")           
   root.geometry("1400x900+100+10")
   root['bg']='black'
   exitbutton = Button(root, text='Quit',fg="red",command=root.destroy,font=('arial',25,'bold')).pack(side = BOTTOM)
   show_vid()#function calling to record video
   show_vid2()#function calling to generate emoji from recorded video
   root.mainloop()

Output


Summary

This project is based on the Keras deep learning library. To recognize facial emotions, we built a convolutional neural network and trained it on the FER2013 dataset. Finally, we mapped each facial emotion to its corresponding emoji or avatar.

To detect the bounding boxes of faces in the webcam feed, we used OpenCV's Haar cascade XML, and then served these face crops to the trained model for classification.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.
