18 All-Time Classic Open Source Computer Vision Projects for Beginners

Shipra Saxena 08 Sep, 2020 • 12 min read

Overview

Open source computer vision projects are a great segue to landing a role in the deep learning industry
Start working on these 18 popular and all-time classic open source computer vision projects

Introduction

Computer vision applications are ubiquitous right now. I honestly can’t remember the last time I went through an entire day without encountering or interacting with at least one computer vision use case (hello facial recognition on my phone!).

But here’s the thing – people who want to learn computer vision tend to get stuck in the theoretical concepts. And that’s the worst path you can take! To truly learn and master computer vision, we need to combine theory with practical experience.

And that’s where open source computer vision projects come in. You don’t need to spend a dime to practice your computer vision skills – you can do it sitting right where you are right now!

So in this article, I have compiled a list of open source computer vision projects, organized by application area. There’s a LOT to go through and this is quite a comprehensive list, so let’s dig in!

If you are completely new to computer vision and deep learning and prefer learning in video form, check this out:

Computer Vision using Deep Learning 2.0

The 18 Open Source Computer Vision Projects are Divided into these Categories:

Image Classification
Face Recognition
Neural Style Transfer Using GANs
Scene Text Detection
Object Detection With DETR
Semantic Segmentation
Road Lane Detection in Autonomous Vehicles
Image Captioning
Human Pose Estimation Projects
Emotion Recognition through Facial Expressions

Open-Source Computer Vision Projects for Image Classification

Image classification is a fundamental task in computer vision. Here, the goal is to classify an image by assigning a specific label to it. It’s easy for us humans to comprehend and classify the images we see. But the case is very different for a machine. It is an onerous task for a machine to differentiate between a car and an elephant.
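
To make "assigning a label" concrete, here is a minimal sketch of the last step of a classifier: turning raw class scores into probabilities with softmax and picking the most likely label. The scores and the two labels are made up for illustration; in a real project they would come from a trained network.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical scores a model might output for a single image.
labels = ["car", "elephant"]
label, prob = classify([2.0, 0.5], labels)
print(label, round(prob, 3))
```

Everything before this step (the layers that produce the scores) is what you train on a dataset such as the ones below.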

Here are two of the most prominent open-source projects for image classification:

CIFAR-10

The CIFAR-10 dataset is a collection of images that are commonly used to train machine learning and computer vision algorithms. It is one of the most popular datasets for machine learning research. It contains 60,000 32×32 colour images in 10 different classes. The classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.

ImageNet

The ImageNet dataset is a large visual database for use in computer vision research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured and in at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories!

As a beginner, you can start by building a neural network from scratch using Keras or PyTorch. For better results and a deeper level of learning, I advise using transfer learning with pre-trained models like VGG-16, ResNet-50, GoogLeNet, etc.

I recommend going through the below article to know more about image classification:

Top 4 Pre-Trained Models for Image Classification with Python Code

I’d also suggest going through the below papers for a better understanding of image classification:

Open-Source Computer Vision Projects for Face Recognition

Face recognition is one of the prominent applications of computer vision. It’s used for security, surveillance, and unlocking your devices. It is the task of identifying the faces in an image or video against a pre-existing database. We can use deep learning methods to learn the features of the faces and recognize them.

It is a multi-stage process, consisting of the following steps:

Face Detection: It is the first step and involves locating one or more faces present in the input image or video.
Face Alignment: Alignment is normalizing the input faces to be geometrically consistent with the database.
Feature Extraction: Later, features are extracted that can be used in the recognition task.
Feature Recognition: The input features are matched against the database.

The following open-source datasets will give you good exposure to face recognition:

MegaFace

MegaFace is a large-scale public face recognition training dataset that serves as one of the most important benchmarks for commercial face recognition problems. It includes 4,753,320 faces of 672,057 identities.

Labeled Faces in the Wild

Labeled Faces in the Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition. It has 13,233 images of 5,749 people that were detected and collected from the web. Also, 1,680 of the people pictured have two or more distinct photos in the dataset.

In addition, to take the project to an advanced stage, you can use pre-trained models like FaceNet.

FaceNet is a deep learning model that provides unified embeddings for face recognition, verification, and clustering tasks. The network maps each face image to a point in Euclidean space such that the distance between images of the same person is small and the distance between images of different people is large.
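
Once faces are mapped to embeddings, recognition reduces to a nearest-neighbour search. Below is a minimal sketch of that matching step. The 3-dimensional vectors, the names, and the threshold of 1.0 are all assumptions for illustration; real embeddings are much longer and the threshold has to be tuned on your own data.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(query, database, threshold=1.0):
    """Return (name, distance) of the closest stored identity.

    If even the closest identity is farther than `threshold`,
    the face is treated as unknown and the name is None.
    """
    best_name, best_dist = None, float("inf")
    for name, embedding in database.items():
        dist = euclidean(query, embedding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist
    return best_name, best_dist

# Toy database of hypothetical embeddings.
database = {"alice": [0.1, 0.9, 0.2], "bob": [0.8, 0.1, 0.5]}
name, dist = identify([0.15, 0.85, 0.25], database)
print(name, round(dist, 3))
```

The same distance can be used for verification (is this the same person as that one?) by comparing two embeddings directly against the threshold.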

You can easily use pre-trained FaceNet models available for Keras and PyTorch to build your own face recognition system.

There are more state-of-the-art face recognition models available that you can experiment with. DeepFace is a deep CNN-based network developed by Facebook researchers. It was a major milestone in the use of deep learning for face recognition.

To better understand the development in face recognition technology in the last 30 years, I’d encourage you to read an interesting paper titled:

Deep Face Recognition: A Survey

Open-Source Computer Vision Projects for Neural Style Transfer Using GANs

Neural style transfer is a computer vision technique that recreates the content of one image in the style of another image. It can be implemented with a Generative Adversarial Network (GAN), among other approaches. Here, we take two images – a content image and a style reference image – and blend them together such that the output image looks like the content image painted in the style of the reference image.

This is implemented by optimizing the output image so that its content statistics match those of the content image and its style statistics match those of the style reference image.
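
One common way to express these "statistics" is to compare feature maps directly for content and compare Gram matrices (channel-to-channel correlations) for style. The sketch below follows that optimization-based formulation rather than a GAN. The feature maps are plain nested lists standing in for the activations of a pre-trained CNN, and the function names and loss weights are hypothetical.

```python
def gram_matrix(features):
    """Channel-by-channel correlations of a feature map.

    `features` is a list of channels, each a flat list of activations.
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]

def mse(a, b):
    """Mean squared error between two equally shaped 2-D lists."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b)
             for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def style_transfer_loss(output, content, style,
                        content_weight=1.0, style_weight=100.0):
    """Weighted sum of content loss and style loss (weights are assumptions)."""
    content_loss = mse(output, content)
    style_loss = mse(gram_matrix(output), gram_matrix(style))
    return content_weight * content_loss + style_weight * style_loss

# Tiny 2-channel feature maps with 2 activations each.
content = [[1.0, 2.0], [3.0, 4.0]]
style = [[0.0, 1.0], [1.0, 0.0]]
print(style_transfer_loss(content, content, style))
```

In a full implementation, the output image is updated by gradient descent to reduce this loss, which is what a deep learning framework automates for you.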

Here is a list of some awesome datasets to practice on:

COCO dataset

COCO is a large-scale object detection, segmentation, and captioning dataset. The images in the dataset are everyday objects captured from everyday scenes. Further, it provides multi-object labeling, segmentation mask annotations, image captioning, and key-point detection with a total of 81 categories, making it a very versatile and multi-purpose dataset.

ImageNet

We’ve already mentioned this above – ImageNet is incredibly flexible!

In case you are wondering how to implement the style transfer model, here is a TensorFlow tutorial that can help you out. Also, I will suggest you read the following papers if you want to dig deeper into the technology:

Open-Source Computer Vision Projects for Scene Text Detection

Detecting text in any given scene is another very interesting problem. Scene text is the text that appears on the images captured by a camera in an outdoor environment. For example, number plates of cars on roads, billboards on the roadside, etc.

The text in scene images varies in shape, font, color, and position. Non-uniform illumination and focus make recognizing scene text even harder.

The following popular datasets will help you build your skills in scene text detection:

SVHN

The Street View House Numbers (SVHN) dataset is one of the most popular open source datasets out there. It has been used in neural networks created by Google to read house numbers and match them to their geolocations. This is a great benchmark dataset to play with, learn from, and train models that accurately identify street numbers. This dataset contains over 600k labeled real-world images of house numbers taken from Google Street View.

SceneText Dataset

The scene text dataset comprises 3,000 images captured in different environments, including outdoor and indoor scenes under different lighting conditions. Images were captured using either a high-resolution digital camera or a low-resolution mobile phone camera. Moreover, all images have been resized to 640×480.

Further, reading scene text is a two-step process consisting of text detection in the image and text recognition. For text detection, I found a state-of-the-art deep learning method, EAST (Efficient Accurate Scene Text Detector). It can find horizontal and rotated bounding boxes. You can use it in combination with any text recognition method.
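
To hand a rotated box from the detection step to the recognition step, you usually need its four corner points so the text region can be cropped. Here is a small sketch of that geometry. It assumes the box is described by a center, a size, and a rotation angle; this parameterization is my assumption for illustration, and the raw output format of a detector such as EAST may differ.

```python
import math

def rotated_box_corners(cx, cy, width, height, angle_deg):
    """Return the four corners of a box rotated by angle_deg about its center."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2, height / 2
    # Corner offsets of the unrotated box, relative to its center.
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    corners = []
    for dx, dy in offsets:
        # Standard 2-D rotation of each offset, then shift back to the center.
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners

# A 4x2 box centered at (10, 10), rotated by 30 degrees.
for corner in rotated_box_corners(10, 10, 4, 2, 30):
    print(tuple(round(v, 2) for v in corner))
```

With an angle of 0 the function returns an ordinary axis-aligned box, so the same code path can serve both horizontal and rotated detections.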

Here are some other interesting papers on scene text detection:

Open-Source Computer Vision Projects for Object Detection with DETR

Object detection is the task of predicting a bounding box, along with the proper label, for each object of interest present in the image.
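
A predicted bounding box is usually judged by how much it overlaps the ground-truth box, measured as intersection over union (IoU). The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max) tuples; other datasets store them as (x, y, width, height), so check the format before reusing it.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Coordinates of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU of 1.0 means a perfect match and 0.0 means no overlap at all; detection benchmarks typically count a prediction as correct only above some IoU cutoff.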

A few months back, Facebook open-sourced its object detection framework, DEtection TRansformer (DETR). DETR is an efficient and innovative solution to object detection problems. It streamlines the training pipeline by viewing object detection as a direct set prediction problem. Further, it adopts an encoder-decoder architecture based on transformers.
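
The "set prediction" idea means each ground-truth object is paired with exactly one prediction, and the pairing with the lowest total cost is the one used for training. The sketch below illustrates this with a brute-force search over all pairings and a simple L1 distance between boxes. Both are simplifications of mine: DETR itself uses the Hungarian algorithm and a cost that also includes class probabilities.

```python
from itertools import permutations

def l1_cost(pred_box, target_box):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - t) for p, t in zip(pred_box, target_box))

def match_predictions(preds, targets):
    """Find the one-to-one assignment with the lowest total cost.

    Returns a list of (target_index, prediction_index) pairs and the cost.
    Assumes there are at least as many predictions as targets; predictions
    left unmatched would be treated as "no object".
    """
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(l1_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost

# Three hypothetical predictions, two ground-truth boxes.
preds = [(0.9, 0.9, 1.0, 1.0), (0.1, 0.1, 0.2, 0.2), (0.5, 0.5, 0.6, 0.6)]
targets = [(0.1, 0.1, 0.2, 0.2), (0.9, 0.9, 1.0, 1.0)]
pairs, cost = match_predictions(preds, targets)
print(pairs, cost)
```

Brute force is fine for a handful of boxes, but the number of pairings grows factorially, which is why a polynomial-time matching algorithm is used in practice.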

To know more about DETR, here is the paper and Colab notebook.

Diversify your portfolio by working on the following open-sourced datasets for object detection:

Open Images

Open Images is a dataset of ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives. The dataset is split into a training set (9,011,219 images), a validation set (41,620 images), and a test set (125,436 images).

MS-COCO

MS-COCO is a large-scale dataset popularly used for object detection problems. It consists of 330K images with 80 object categories, 5 captions per image, and 250,000 people with key points.

You can read the following resources to learn more about Object Detection:

Open-Source Computer Vision Projects for Semantic Segmentation

When we talk about complete scene understanding in computer vision technology, semantic segmentation comes into the picture. It is the task of classifying all the pixels in an image into relevant classes of the objects.
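
Because every pixel gets a class, a segmentation result is just a grid of class ids with the same shape as the image. A widely used way to score such a grid against the ground truth is mean intersection over union (mIoU), sketched below with tiny hand-made label grids. The grids and the function name are illustrative only.

```python
def mean_iou(pred, target, num_classes):
    """Mean per-class IoU between two label grids (lists of rows of class ids).

    Classes that appear in neither grid are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                if p == c and t == c:
                    inter += 1
                if p == c or t == c:
                    union += 1
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# 2x2 "images": class 0 could be road, class 1 could be car.
pred = [[0, 0],
        [1, 1]]
target = [[0, 1],
          [1, 1]]
print(round(mean_iou(pred, target, num_classes=2), 3))
```

Averaging per class, rather than counting correct pixels overall, keeps small classes from being drowned out by large ones such as road or sky.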

Below is the list of open-source datasets to practice this topic:

CamVid

This database is one of the first semantically segmented datasets to be released. It is often used in (real-time) semantic segmentation research. The dataset contains:
- 367 training pairs
- 101 validation pairs
- 233 test pairs

Cityscapes

This dataset is a processed subsample of the original Cityscapes dataset. It has still images from the original videos, and the semantic segmentation labels are shown in images alongside the original image. This is one of the best datasets around for semantic segmentation tasks. It has 2,975 training image files and 500 validation image files, each of 256×512 pixels.

To read further about semantic segmentation, I will recommend the following article:

Semantic Segmentation: Introduction to the Deep Learning Technique Behind Google Pixel’s Camera!

Here are some papers available with code for semantic segmentation:

Open-Source Computer Vision Projects for Road Lane Detection in Autonomous Vehicles

An autonomous car is a vehicle capable of sensing its environment and operating without human involvement. Such vehicles create and maintain a map of their surroundings based on a variety of sensors fitted in different parts of the vehicle.

These vehicles have radar sensors that monitor the position of nearby vehicles. Video cameras detect traffic lights, read road signs, and track other vehicles, while Lidar (light detection and ranging) sensors bounce pulses of light off the car’s surroundings to measure distances, detect road edges, and identify lane markings.

Lane detection is an important part of these vehicles. In road transport, a lane is part of a carriageway that is designated to be used by a single line of vehicles to control and guide drivers and reduce traffic conflicts.

It is an exciting project to add to your data science resume. The following dataset is available to experiment with:

TuSimple

This dataset was part of the TuSimple Lane Detection Challenge. Each video clip is 1 second long and contains 20 frames, with the last frame annotated. The training set has 3,626 video clips (and so 3,626 annotated frames), and the test set has 2,782 video clips.

In case you are looking for a tutorial on developing the project, check the article below:

Hands-On Tutorial on Real-Time Lane Detection using OpenCV (Self-Driving Car Project!)

Open-Source Computer Vision Projects for Image Captioning

Have you ever wished for some technology that could caption your social media images because neither you nor your friends are able to come up with a cool caption? Deep Learning for image captioning comes to your rescue.

Image captioning is the process of generating a textual description for an image. It is a combined task of computer vision and natural language processing (NLP).

Computer vision methods aid in understanding and extracting features from the input images. NLP then converts these features into a textual description with the words in the correct order.
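
The word-by-word generation step can be sketched as greedy decoding: start from a start token and keep choosing the most probable next word until an end token appears. In the toy version below, a hand-written lookup table stands in for the trained language model. This is a strong simplification: a real captioning model conditions each word on the image features and on all previous words, not just the last one.

```python
def greedy_caption(next_word_probs, max_len=10):
    """Build a caption by repeatedly picking the most probable next word."""
    caption = []
    word = "<start>"
    for _ in range(max_len):
        # Unknown words fall through to the end token so decoding stops.
        candidates = next_word_probs.get(word, {"<end>": 1.0})
        word = max(candidates, key=candidates.get)
        if word == "<end>":
            break
        caption.append(word)
    return " ".join(caption)

# Hypothetical next-word probabilities for one image.
toy_model = {
    "<start>": {"a": 0.7, "the": 0.3},
    "a": {"dog": 0.6, "cat": 0.4},
    "dog": {"runs": 0.8, "<end>": 0.2},
    "runs": {"<end>": 0.9, "fast": 0.1},
}
print(greedy_caption(toy_model))
```

Greedy decoding is the simplest choice; beam search, which keeps several candidate captions alive at each step, is a common upgrade once the basic pipeline works.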

The following are some useful datasets to get your hands dirty with image captioning:

COCO Caption

COCO is a large-scale object detection, segmentation, and captioning dataset. It consists of 330K images (>200K labeled) with 1.5 million object instances, 80 object categories, and 5 captions per image.

Flickr 30k dataset

It is an image caption corpus consisting of 158,915 crowd-sourced captions describing 31,783 images. This is an extension of the Flickr 8k dataset. The new images and captions focus on people doing everyday activities and events.

If you are looking for the implementation of the project, I will suggest you look at the following article:

Automatic Image Captioning using Deep Learning (CNN and LSTM) in PyTorch

Also, I suggest you go through this prominent paper on Image Captioning.

Open-Source Computer Vision Projects for Human Pose Estimation

Human pose estimation is an interesting application of computer vision. You must have heard about PoseNet, which is an open-source model for human pose estimation. In brief, pose estimation is a computer vision technique to infer the pose of a person or object present in an image or video.

Before discussing how pose estimation works, let us first understand the ‘human pose skeleton’. It is the set of coordinates that define the pose of a person. A pair of connected coordinates forms a limb. Pose estimation is then performed by identifying, locating, and tracking the key points of the human pose skeleton in an image or video.
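
A pose skeleton maps naturally onto two small data structures: a dictionary of named key points and a list of key-point pairs that form limbs. The sketch below uses made-up pixel coordinates for one arm and computes the length of each limb. The key-point names are illustrative and do not follow any particular dataset's naming scheme.

```python
import math

# Hypothetical (x, y) pixel coordinates for three key points of one person.
keypoints = {
    "left_shoulder": (120, 80),
    "left_elbow": (130, 140),
    "left_wrist": (125, 200),
}

# Each limb connects a pair of key points.
limbs = [("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist")]

def limb_lengths(keypoints, limbs):
    """Return the pixel length of every limb whose two key points were found."""
    lengths = {}
    for start, end in limbs:
        if start in keypoints and end in keypoints:
            (x1, y1), (x2, y2) = keypoints[start], keypoints[end]
            lengths[(start, end)] = math.hypot(x2 - x1, y2 - y1)
    return lengths

for limb, length in limb_lengths(keypoints, limbs).items():
    print(limb, round(length, 1))
```

Limbs with a missing key point are skipped, which mirrors a common real-world situation where part of the body is hidden from the camera.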

The following are some datasets if you want to develop a pose estimation model:

MPII

The MPII Human Pose dataset is a state-of-the-art benchmark for the evaluation of articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. Overall, the dataset covers 410 human activities and each image has an activity label.

HumanEva

The HumanEva-I dataset contains 7 calibrated video sequences that are synchronized with 3D body poses. The database contains 4 subjects performing 6 common actions (e.g. walking, jogging, gesturing, etc.) that are split into training, validation, and testing sets.

I found DeepPose by Google to be a very interesting research paper that uses deep learning models for pose estimation. In addition, you can go through the many research papers available on pose estimation to understand it better.

Open-Source Computer Vision Projects for Emotion Recognition through Facial Expressions

Facial expressions play a vital role in the process of non-verbal communication, as well as for identifying a person. They are very important in recognizing a person’s emotions. Consequently, information on facial expressions is often used in automatic systems of emotion recognition.

Emotion recognition is a challenging task because emotions may vary depending on the environment, appearance, culture, and facial reaction, which leads to ambiguous data.

A facial expression recognition system is a multistage process consisting of face image processing, feature extraction, and classification.

Below is a dataset you can practice on:

Real-world Affective Faces Database

Real-world Affective Faces Database (RAF-DB) is a large-scale facial expression database with around 30K highly diverse facial images. It consists of 29,672 real-world images, with a 7-dimensional expression distribution vector for each image.

You can read these resources to increase your understanding further:

End Notes

To conclude, in this article we discussed 18 open source computer vision projects, spread across 10 application areas, that you can implement as a beginner. This is not an exhaustive list. So if you feel we missed something, feel free to add it in the comments below!

Also, here are some useful CV resources to help you explore the world of deep learning and computer vision:

There is a big difference between the data science we learn in courses and self-practice and the data science we work on in the industry. I’d recommend going through these crystal clear free courses to understand everything about analytics, machine learning, and artificial intelligence:

I hope you find the discussion useful. Now it’s your turn to start implementing computer vision projects on your own.