Google’s Amazing Self-Supervised Computer Vision Model can Track Objects in any Video!

Pranav Dar Last Updated : 28 Jun, 2018

Overview

The Google AI team has developed a convolutional neural network that can track objects in videos, without requiring any labelled data
The training data included videos from the publicly available Kinetics dataset
This model can do multiple things – colorization, pose estimation, and of course object tracking

Introduction

Self-supervised learning has a great deal of untapped potential in deep learning. Where supervised learning requires large amounts of labelled data to produce an accurate solution, self-supervised learning needs only a sliver of that labelled data (if any at all), which is what makes it such a challenging line of work.

But self-supervised learning has been garnering attention recently, especially in computer vision (a field that notoriously requires more labelled data than most to produce good results). And now the Google AI team has developed a model that can track objects in videos without requiring any labelled data at all.

The team has designed a convolutional neural network that adds color to grayscale videos. While doing this, the network learns by itself to visually track objects in the video. The team admits in a blog post that the model was never trained with the singular aim of tracking, but it managed to learn without supervision and can follow multiple objects and remain robust without requiring ANY labelled training data!

The researchers used videos from the public Kinetics dataset to train the model. Keep in mind that all these videos are in color, so every frame except the very first one in each video was first converted to grayscale. The convolutional network was then trained to predict the original colors in all the remaining frames. The collection of images that accompanies the post illustrates this technique well.
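The data preparation described above can be sketched in a few lines of NumPy. This is only an illustration: the luma weights, the function names and the random clip are assumptions, and the article does not say which grayscale conversion the team actually used.

```python
import numpy as np

# ITU-R BT.601 luma weights, a common choice for RGB-to-grayscale conversion.
# Assumption: the source does not say which conversion the team used.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def to_grayscale(frame_rgb):
    """Collapse an (H, W, 3) RGB frame to an (H, W) luminance image."""
    return frame_rgb @ LUMA_WEIGHTS


def make_training_clip(video_rgb):
    """Split a (T, H, W, 3) clip into model inputs and colorization targets.

    Returns the first frame in color, the remaining frames in grayscale,
    and the original colors of those remaining frames (the training target).
    """
    reference_color = video_rgb[0]
    target_gray = np.stack([to_grayscale(frame) for frame in video_rgb[1:]])
    target_color = video_rgb[1:]
    return reference_color, target_gray, target_color


# Hypothetical stand-in for a video: 4 random frames of 8x8 RGB values in [0, 1].
rng = np.random.default_rng(0)
clip = rng.random((4, 8, 8, 3))
reference_color, target_gray, target_color = make_training_clip(clip)
print(reference_color.shape, target_gray.shape, target_color.shape)
```

The network would see `reference_color` and `target_gray` as input and be scored against `target_color`, so no human annotation is needed at any point.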

You might be wondering why they decolored the videos in the first place. There is a good chance that a video contains multiple objects with the same color, so by converting it to grayscale and then adding color again, the team was able to teach the machine to track specific objects.
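One way to picture why re-colorizing encourages tracking: if each grayscale pixel gets its color by pointing back at similar-looking pixels in the colored first frame, then getting the color right means finding the same object again. The sketch below assumes a softmax-over-similarity pointer, which is my reading of the approach rather than the team's code. The embeddings are hand-made stand-ins for what the network would learn, and all names are hypothetical.

```python
import numpy as np


def pointer_weights(ref_emb, tgt_emb):
    """Softmax similarity between target and reference pixel embeddings.

    ref_emb has shape (N_ref, D) and tgt_emb has shape (N_tgt, D).
    Returns an (N_tgt, N_ref) matrix whose rows each sum to 1.
    """
    logits = tgt_emb @ ref_emb.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def copy_colors(ref_emb, ref_colors, tgt_emb):
    """Color each target pixel as a weighted average of reference colors."""
    return pointer_weights(ref_emb, tgt_emb) @ ref_colors


# Two reference pixels with clearly different embeddings: one red, one blue.
ref_emb = np.array([[10.0, 0.0], [0.0, 10.0]])
ref_colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

# In the later frame the two objects have swapped places.
tgt_emb = np.array([[0.0, 10.0], [10.0, 0.0]])

predicted = copy_colors(ref_emb, ref_colors, tgt_emb)
print(np.round(predicted, 3))
```

Here the first target pixel comes out blue and the second red, because each one points at the reference pixel it resembles rather than at the pixel in the same position.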

An important part of designing and using deep learning models is their interpretability, which isn’t easy given their complexity. According to their blog post, they used “a standard trick to visualize the embeddings learned by the model by projecting them down to three dimensions using Principal Component Analysis (PCA) and plotting it as an RGB movie”.
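The PCA trick from the quote is easy to try on any per-pixel embeddings. The sketch below is a minimal version, assuming the embeddings arrive as a (T, H, W, D) array with D of at least 3. The function name and the random input are made up for illustration, and the min-max rescaling is one reasonable way to turn the three components into color channels, not necessarily the one the team used.

```python
import numpy as np


def embeddings_to_rgb(embeddings):
    """Project (..., D) embeddings to (..., 3) values in [0, 1] using PCA.

    PCA is fitted on all pixels of all frames at once so that a given
    color means the same thing in every frame of the resulting movie.
    """
    d = embeddings.shape[-1]
    flat = embeddings.reshape(-1, d)
    centered = flat - flat.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:3].T
    # Rescale each component to [0, 1] so it can be shown as a color channel.
    lo = projected.min(axis=0)
    hi = projected.max(axis=0)
    scaled = (projected - lo) / np.maximum(hi - lo, 1e-12)
    return scaled.reshape(embeddings.shape[:-1] + (3,))


# Hypothetical embeddings: 2 frames of 4x5 pixels, 16 dimensions per pixel.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(2, 4, 5, 16))
rgb_movie = embeddings_to_rgb(fake_embeddings)
print(rgb_movie.shape)
```

Pixels with similar embeddings end up with similar colors, so an object that the model represents consistently should keep roughly the same color as it moves.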

Another finding was that the model can even track human poses. The image below shows the poses of different people being tracked (this was tested on the JHMDB dataset).
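The article does not spell out how poses are tracked, so the following is a sketch under a stated assumption: keypoints marked in an initial frame are carried forward by the same kind of similarity lookup a colorization model would use for colors. Every name here is hypothetical and the embeddings are hand-made, one distinct vector per pixel.

```python
import numpy as np


def propagate_labels(ref_emb, ref_labels, tgt_emb):
    """Carry per-pixel label scores from a reference frame to a target frame.

    ref_emb: (N_ref, D), ref_labels: (N_ref, K), tgt_emb: (N_tgt, D).
    Returns (N_tgt, K) scores, one column per keypoint.
    """
    logits = tgt_emb @ ref_emb.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ ref_labels


def keypoint_locations(label_scores, height, width):
    """Return the (row, col) of the highest-scoring pixel for each keypoint."""
    best = label_scores.argmax(axis=0)
    return np.stack(np.unravel_index(best, (height, width)), axis=1)


# A 2x2 reference frame, flattened to 4 pixels with distinct embeddings.
ref_emb = 10.0 * np.eye(4)

# Two keypoints in the reference frame: "head" at pixel 0, "hand" at pixel 3.
ref_labels = np.zeros((4, 2))
ref_labels[0, 0] = 1.0
ref_labels[3, 1] = 1.0

# In the target frame the content has moved: pixel 1 now looks like old
# pixel 0, and pixel 2 now looks like old pixel 3.
tgt_emb = ref_emb[[1, 0, 3, 2]]

scores = propagate_labels(ref_emb, ref_labels, tgt_emb)
locations = keypoint_locations(scores, height=2, width=2)
print(locations)
```

The head is found at row 0, column 1 and the hand at row 1, column 0, which is where the matching content moved to in the target frame.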

You can read about this technique in more detail in Google’s research paper here.

Our take on this

If you read the paper (and you really should!) you’ll see that the results of this model don’t outperform high-end supervised models. But since this is just the starting point for self-supervised video tracking, I think we can expect that gap to shrink significantly soon.

I especially liked that the model does multiple things: colorization, pose estimation, and of course object tracking. It turns out that the model's tracking failures are correlated with failures to colorize the video, which pinpoints where the team needs to focus next. This is definitely something we should keep an eye on, as the potential uses of this technique are vast.

Pranav Dar

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.
