Building a Face Detection Model from Video using Deep Learning (Python Implementation)

JalFaizy Shaikh 06 May, 2019
8 min read


“Computer vision and machine learning have really started to take off, but for most people, the whole idea of what a computer is seeing when it’s looking at an image is relatively obscure.” – Mike Kreiger

The wonderful field of Computer Vision has soared into a league of its own in recent years. There is an impressive number of applications already in wide use around the world – and we are just getting started!

One of my favorite things in this field is the idea of our community embracing the concept of open source. Even the big tech giants are willing to share new breakthroughs and innovations with everyone so that the techniques do not remain a “thing of the rich”.

One such technology is face detection, which offers a plethora of potential applications in real-world use cases (if used correctly and ethically). In this article, I will show you how to build a capable face detection algorithm using open source tools. Here is a demo to get you excited and set the stage for what will follow:

So, are you ready? Read on then!

Note: If you want to understand the intricacies of computer vision, this course – Computer Vision using Deep Learning – is the perfect place to start.


Table of Contents

  • Promising Applications of Face Detection
  • Setting up the System – Hardware/Software Requirements
    • Hardware Setup
    • Software Setup
  • Diving into the Python Implementation
    • Simple Walkthrough
    • Face Detection Use Case


Promising Applications of Face Detection

Let me pull up some awesome examples of applications where face detection techniques are being popularly used. I’m sure you must have come across these use cases at some point and not realized what technique was being used behind the scenes!

For instance, Facebook replaced manual image tagging with automatically generated tag suggestions for each picture uploaded to the platform. Facebook uses a face detection algorithm to analyze the pixels of the faces in an image and compare them with those of relevant users. We’ll learn how to build a face detection model ourselves, but before we get into the technical details of that, let’s discuss some other use cases.

We are becoming used to unlocking our phones with the latest ‘face unlock’ feature. This is a very small example of how a face detection technique is being used to maintain the security of personal data. The same can be implemented on a larger scale, enabling cameras to capture images and detect faces.

There are a few other lesser-known applications of face detection in advertising, healthcare, banking, etc. At most companies, and even at many conferences, you are required to carry an ID card in order to get entry. But what if we could figure out a way so that you don’t need to carry any ID card to get access? Face detection helps make this process smooth and easy. The person just looks at the camera, and the system automatically decides whether they should be allowed to enter or not.

Another interesting application of face detection could be to count the number of people attending an event (like a conference or concert). Instead of manually counting the attendees, we install a camera which can capture the images of the attendees and give us the total head count. This can help to automate the process and save a ton of manual effort. Pretty useful, isn’t it?

You can come up with many more applications like these – feel free to share them in the comments section below.

In this article, I will focus on the practical application of face detection and only gloss over how the underlying algorithms actually work. If you want to know more about them, you can go through this article.


Setting up the System – Hardware/Software Requirements

Now that you know the potential applications you can build with face detection techniques, let’s see how we can implement this using the open source tools available to us. That’s the advantage we have with our community – the willingness to share and open source code is unparalleled across any industry.

For this article specifically, here’s what I have used and recommend using:

  • A webcam (Logitech C920) to build a real-time face detector on a Lenovo E470 ThinkPad laptop (Core i5 7th Gen). Instead of the setup I am using, you can also use your laptop’s built-in camera, or a CCTV camera, on any appropriate system for real-time video analysis
  • Using a GPU for faster video processing is always a bonus
  • On the software side, we have used Ubuntu 18.04 OS with all the prerequisite software installed

Let’s explore these points in a bit more detail to ensure everything is set up properly before we build our face detection model.


Step 1: Hardware Setup

The first thing you have to do is check that the webcam is set up correctly. A simple trick in Ubuntu is to see whether the device has been registered by the OS. You can follow the steps given below:

  1. Before connecting the webcam to the laptop, check which video devices are already connected by opening a terminal and typing ls /dev/video*. This will print the video devices that are already connected to the system.
  2. Connect the webcam and run the command again. If the webcam has connected successfully, the command will show a new device.
  3. You can also use any webcam software to check that the webcam is working correctly. On Ubuntu, you can use “Cheese” for this.
    Here we can see that the webcam is set up correctly. And that’s it for the hardware side!
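
If you would rather do the before/after comparison from Python than by eye, here is a minimal sketch of the same check. The function names are my own (hypothetical), and it assumes a Linux system where cameras show up as /dev/video* nodes, as on the Ubuntu setup used here.

```python
import glob


def list_video_devices(pattern="/dev/video*"):
    """Return the device nodes matching the pattern, sorted by name."""
    return sorted(glob.glob(pattern))


def newly_connected(before, after):
    """Return the devices present in `after` but not in `before`."""
    return sorted(set(after) - set(before))
```

Call list_video_devices() once before plugging in the webcam and once after, then pass both lists to newly_connected(). Whatever it returns is a candidate path to hand to OpenCV later.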


Step 2: Software Setup

Step 2.1: Install Python

The code in this article is built using Python version 3.5. Although there are multiple ways to install Python, I would recommend using Anaconda – the most popular Python distribution for data science. Here is a link to install Anaconda in your system.

Step 2.2: Install OpenCV

OpenCV (Open Source Computer Vision) is a library aimed at building computer vision applications. It has numerous pre-written functions for image processing tasks. To install OpenCV, do a pip install of the library:

pip3 install opencv-python

Step 2.3: Install face_recognition API

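Finally, we will use face_recognition, dubbed the world’s simplest facial recognition API for Python. It depends on dlib, which pip typically builds from source, so CMake and a C++ compiler need to be available on your system. To install:

pip install dlib
pip install face_recognition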
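
Before moving on, it can save some head-scratching to confirm that all three libraries are importable. The sketch below is one way to do that using only the standard library. The helper name is hypothetical, and note that the opencv-python package is imported under the name cv2.

```python
import importlib.util


def missing_packages(names=("cv2", "dlib", "face_recognition")):
    """Return the module names from `names` that Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

If missing_packages() returns an empty list, the installation step worked. Any name it does return still needs to be installed in the environment you are running.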


Let’s Dive into the Implementation

Now that you have set up your system, it’s finally time to dive into the actual implementation. First, we will quickly build our program, then break it down to understand what we did.


Simple Walkthrough

First, create a file and then copy the code given below:

# import libraries
import cv2
import face_recognition

# Get a reference to webcam 
video_capture = cv2.VideoCapture("/dev/video1")

# Initialize variables
face_locations = []

while True:
    # Grab a single frame of video
    ret, frame = video_capture.read()
    # Stop if no frame could be read from the webcam
    if not ret:
        break

    # Convert the image from BGR color (which OpenCV uses) to RGB color (which face_recognition uses)
    # (cvtColor returns a contiguous array, which newer dlib versions require)
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Find all the faces in the current frame of video
    face_locations = face_recognition.face_locations(rgb_frame)

    # Display the results
    for top, right, bottom, left in face_locations:
        # Draw a box around the face
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)

    # Display the resulting image
    cv2.imshow('Video', frame)

    # Hit 'q' on the keyboard to quit!
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release handle to the webcam
video_capture.release()
cv2.destroyAllWindows()

Then, run this Python file from a terminal by typing python3 followed by the name you gave the file.


If everything works correctly, a new window will pop up with real-time face detection running.

To summarize, this is what our above code did:

  1. First, we defined the hardware on which the video analysis will be done
  2. From this, we captured the video in real-time, frame by frame
  3. Then we processed each frame and extracted the locations of all the faces in the image
  4. Finally, we rendered these frames in video form, along with the face locations

Simple, isn’t it? If you want to go into more granular details, I have included the comments in each code section. You can always go back and review what we have done.
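
One detail that is easy to trip over is the order of the values in each face location: face_recognition returns (top, right, bottom, left), while cv2.rectangle wants two (x, y) corner points. The sketch below makes that conversion explicit. It also shows how a location could be scaled back up if you chose to run detection on a shrunken copy of the frame to speed things up, which is an optional tweak and not something the code above does. Both helper names are hypothetical.

```python
def box_corners(location):
    """Convert (top, right, bottom, left) into the two (x, y) corner
    points that cv2.rectangle expects: top-left and bottom-right."""
    top, right, bottom, left = location
    return (left, top), (right, bottom)


def scale_location(location, factor):
    """Scale a location found on a resized frame back to full size.

    `factor` is the inverse of the resize ratio, e.g. 4 if detection
    ran on a frame shrunk to a quarter of its width and height.
    """
    return tuple(int(round(value * factor)) for value in location)
```

For example, a face found at (50, 200, 150, 100) would be drawn with the corners (100, 50) and (200, 150).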


Face Detection Use Case

The fun doesn’t stop there! Another cool thing we can do – build a complete use case around the above code. And you don’t need to start from scratch. We can make just a few small changes to the code and we’re good to go.

Suppose, for example, you want to build an automated camera-based system to track where the speaker is in real time. Based on the speaker’s position, the system rotates the camera so that the speaker is always in the middle of the video.

How do we go about this? The first step is to build a system which identifies the person(s) in the video, and focuses on the location of the speaker.
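
This article stops at locating the speaker, but to give a feel for the camera-rotation part, here is a rough sketch of how a face location could be turned into a pan decision. Everything in it is an assumption on my side: the function names, the normalised offset and the dead_zone threshold are illustrative, and a real rig would need its own motor control on top.

```python
def horizontal_offset(location, frame_width):
    """Return how far the face centre is from the frame centre.

    The result is a fraction of half the frame width:
    -1.0 is the far left edge, 0.0 is centred, 1.0 is the far right edge.
    """
    top, right, bottom, left = location
    face_centre_x = (left + right) / 2
    frame_centre_x = frame_width / 2
    return (face_centre_x - frame_centre_x) / frame_centre_x


def pan_direction(offset, dead_zone=0.1):
    """Decide which way to turn; small offsets are ignored to avoid jitter."""
    if offset > dead_zone:
        return "right"
    if offset < -dead_zone:
        return "left"
    return "hold"
```

With a 1280-pixel-wide frame, a face spanning x = 960 to x = 1280 gives an offset of 0.75, so the camera would pan right.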

Let’s see how we can implement this. For this article, I have taken a video from YouTube which shows a speaker talking during the DataHack Summit 2017 conference.

First, we import the necessary libraries:

import cv2
import face_recognition

Then, read the video and get the length:

input_movie = cv2.VideoCapture("sample_video.mp4")
length = int(input_movie.get(cv2.CAP_PROP_FRAME_COUNT))

After that, we create an output file with the required resolution and frame rate which is similar to the input file.
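
A minimal sketch of this step, under a few assumptions of mine: the output file name output.avi, the XVID codec, the 25 fps fallback and both function names are placeholders, not values taken from the original setup. The idea is simply to read the width, height and frame rate from the input video and hand them to cv2.VideoWriter.

```python
def writer_settings(width, height, fps, fallback_fps=25.0):
    """Return (fps, (width, height)) in the form cv2.VideoWriter expects.

    Some files report a frame rate of 0, so an assumed fallback is used then.
    """
    if fps <= 0:
        fps = fallback_fps
    return float(fps), (int(width), int(height))


def open_output_like(input_movie, path="output.avi"):
    """Create a cv2.VideoWriter matching the input's size and frame rate."""
    import cv2  # imported here so writer_settings works without OpenCV

    fps, size = writer_settings(
        input_movie.get(cv2.CAP_PROP_FRAME_WIDTH),
        input_movie.get(cv2.CAP_PROP_FRAME_HEIGHT),
        input_movie.get(cv2.CAP_PROP_FPS),
    )
    fourcc = cv2.VideoWriter_fourcc(*"XVID")  # codec choice is an assumption
    return cv2.VideoWriter(path, fourcc, fps, size)
```

You could then create the writer with output_movie = open_output_like(input_movie) and call output_movie.write(frame) for every processed frame. The frame size passed to the writer has to match the frames you write, otherwise OpenCV tends to produce an unplayable file.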

Load a sample image of the speaker to identify him in the video:

image = face_recognition.load_image_file("sample_image.jpeg")
face_encoding = face_recognition.face_encodings(image)[0]

known_faces = [face_encoding]
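
It helps to know what will happen with this encoding later. Each face is represented as a 128-number vector, and as far as I understand the library, compare_faces treats two faces as a match when the Euclidean distance between their vectors is at most the tolerance (0.6 by default; the loop in this article passes 0.50 to be stricter). The sketch below illustrates that idea in plain Python with toy vectors. It is not the library's own code, and the function names are mine.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_match(known_encoding, candidate_encoding, tolerance=0.6):
    """Treat the faces as the same person if their encodings are close enough."""
    return euclidean_distance(known_encoding, candidate_encoding) <= tolerance
```

Lowering the tolerance reduces false matches at the cost of missing the speaker in frames where the face is turned or poorly lit, so it is worth experimenting with on your own video. Also keep in mind that face_encodings returns an empty list when no face is found in the sample image, in which case indexing it with [0] raises an IndexError.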

With all this in place, we run a loop that will do the following:

  • Extract a frame from the video
  • Find all the faces and identify them
  • Create a new video to combine the original frame with the location of the face of the speaker annotated

Let’s see the code for this:

# Initialize variables
face_locations = []
face_encodings = []
face_names = []
frame_number = 0

while True:
    # Grab a single frame of video
    ret, frame = input_movie.read()
    frame_number += 1

    # Quit when the input video file ends
    if not ret:
        break

    # Convert the image from BGR color (which OpenCV uses) to RGB color (which face_recognition uses)
    # (cvtColor returns a contiguous array, which newer dlib versions require)
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Find all the faces and face encodings in the current frame of video
    face_locations = face_recognition.face_locations(rgb_frame, model="cnn")
    face_encodings = face_recognition.face_encodings(rgb_frame, face_locations)

    face_names = []
    for face_encoding in face_encodings:
        # See if the face is a match for the known face(s)
        match = face_recognition.compare_faces(known_faces, face_encoding, tolerance=0.50)

        name = None
        if match[0]:
            name = "Phani Srikant"

        face_names.append(name)

    # Label the results
    for (top, right, bottom, left), name in zip(face_locations, face_names):
        # Skip faces that did not match the known speaker
        if not name:
            continue

        # Draw a box around the face
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 2)

        # Draw a label with a name below the face
        cv2.rectangle(frame, (left, bottom - 25), (right, bottom), (0, 0, 255), cv2.FILLED)
        font = cv2.FONT_HERSHEY_DUPLEX
        cv2.putText(frame, name, (left + 6, bottom - 6), font, 0.5, (255, 255, 255), 1)

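    # Write the resulting image to the output video file
    # (output_movie is assumed to be the cv2.VideoWriter for the output file created earlier)
    print("Writing frame {} / {}".format(frame_number, length))
    output_movie.write(frame)

# All done!
input_movie.release()
cv2.destroyAllWindows()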

The code would then give you an output like this:

What a terrific thing face detection truly is. 🙂



Now, it’s time to take the plunge and actually play with some other real datasets. So are you ready to take on the challenge? Accelerate your deep learning journey with the following Practice Problems:

Practice Problem: Identify the Apparels Identify the type of apparel for given images
Practice Problem: Identify the Digits Identify the digit in given images
Practice Problem: Face Counting Challenge Predict the headcount given a group selfie/photo


Congratulations! You now know how to build a face detection system for a number of potential use cases. Deep learning is such a fascinating field and I’m so excited to see where we go next.

In this article, we learned how you can leverage open source tools to build real-time face detection systems that have real-world usefulness. I encourage you to build plenty of such applications and try this on your own. Trust me, there’s a lot to learn and it’s just so much fun!

As always, feel free to reach out if you have any queries/suggestions in the comment section below!

Faizan is a Data Science enthusiast and a Deep learning rookie. A recent Comp. Sc. undergrad, he aims to utilize his skills to push the boundaries of AI research.

Responses From Readers


Taofeek olalekan 11 Dec, 2018

Nice, A great work Weldon, will try it out, but is it a must to set up the hardware implementation on Ubuntu why not Windows because of Window users. And can I have your contact Info?

Taofeek olalekan 11 Dec, 2018

Nice, Awesome job. is it possible to set up the Hardware on Windows OS against Window users? Also, can I have your contact info? Thanks for sharing

Joaquin 12 Dec, 2018

Great tutorial, but can you provide the sample_image.jpg? I dont know how should it be composed. Regards!

Komal 13 Dec, 2018

Hi, I am very keen to try this. However, I would like to know if I can do it on Windows. I have Windows 10 as well as Windows 7. Please let me know how do I proceed for Windows.

Sachin Kalsi 26 Dec, 2018

Faizan, Thanks for writing a very good article. Inspired by this article I have written a web app. Please have a look at it. But I couldn't able to host the app in heroku since dlib library was not installing properly

Shivam Srivastava 06 Mar, 2019

Can you please tell me the code how can i display "Light is not proper" message if during face recognition the light is not proper or image is blurred?

palbha 26 Mar, 2019

Hi All , this ia an amazing post , but there is issue downloading face_Reocgnition in python 43.7 basically issue is with dlib and that is the reason its not working , Can anyone suggest how can we download it , also is there any way we can run it on cloud without downloading library everytime we run the code .

Umair Raza 15 Jun, 2019

If we have to write only those frames in the video in which that particular person is present then what will be the changing in the code? Because this code saves the whole input video and just draw the rectangle and name around the face of that person and i don't want to add the other frames. kindly help me its urgent

anil 18 Jun, 2019

Hi great tutorial i would like to know if same can be done using CCTV camera, if so can you please share the steps for the same.

Anna 19 Oct, 2021