Aishwarya Singh — Published On June 7, 2018 and Last Modified On May 10th, 2019


  • Google has updated the YouTube-8M dataset
  • The dataset consists of 6.1M video URLs, labeled with a vocabulary of 3,862 visual entities
  • The video-level dataset is about 18 GB in size, while the frame-level features take up approximately 1.3 TB



This is turning into a week of major open source dataset releases in computer vision! We recently saw Berkeley unveil their self-driving dataset, and now Google has announced an update to its popular ‘YouTube-8M’ dataset.

YouTube-8M is a video dataset that consists of millions of YouTube video IDs. It comes with high-quality machine-generated annotations drawn from a large vocabulary of visual entities, along with audio-visual features from billions of frames and audio segments. In short, it is a great resource for anyone learning computer vision or already working in the field.

The dataset is designed to fit on a single hard disk, which makes it possible to train a baseline model in less than a day on a single GPU! The idea was to create a large-scale dataset for exploring complex audio-visual models, which usually take weeks to train.

The major improvements in the new edition are better-quality annotations and machine-generated labels. These are obtained by combining audio-visual content with the title, description and other metadata. The updated version contains 6.1 million URLs, labeled with a vocabulary of 3,862 visual entities. Each video is annotated with one or more labels (an average of 3 labels per video).

The team has published starter code for this enormous dataset on their GitHub page. Alongside the code, you will also find Python scripts for comparing models using the standard evaluation metrics.
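The announcement does not spell out which metrics the scripts compute, so the following is only an illustration of how two models might be compared on a multi-label video task. It sketches a simple "hit at 1" score (the share of videos whose top-scoring class is one of the true labels). The function name `hit_at_one`, the class names and the toy scores are all made up for this example and are not taken from the starter code.

```python
def hit_at_one(scores, labels):
    """Fraction of videos whose top-scoring class is among the true labels.

    scores: list of dicts mapping class name -> predicted score, one per video.
    labels: list of sets of true class names, one per video.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        return 0.0
    hits = 0
    for video_scores, true_labels in zip(scores, labels):
        # Pick the class with the highest predicted score for this video.
        top_class = max(video_scores, key=video_scores.get)
        if top_class in true_labels:
            hits += 1
    return hits / len(scores)


# Toy ground truth for three videos (hypothetical labels).
labels = [{"Guitar", "Concert"}, {"Car"}, {"Cooking", "Food"}]

# Toy predictions from two hypothetical models.
model_a = [
    {"Guitar": 0.9, "Car": 0.1, "Food": 0.2},
    {"Guitar": 0.2, "Car": 0.7, "Food": 0.1},
    {"Guitar": 0.5, "Car": 0.1, "Food": 0.4},
]
model_b = [
    {"Guitar": 0.3, "Car": 0.6, "Food": 0.1},
    {"Guitar": 0.1, "Car": 0.8, "Food": 0.1},
    {"Guitar": 0.2, "Car": 0.5, "Food": 0.3},
]

print(f"model_a hit@1: {hit_at_one(model_a, labels):.2f}")
print(f"model_b hit@1: {hit_at_one(model_b, labels):.2f}")
```

On this toy data, model_a gets two of the three videos right and model_b only one, so model_a would rank higher on this metric.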

The developers recommend downloading a fraction of the dataset to start with, and then downloading more as you go along. If you prefer to download the entire dataset in one go, that’s also possible but will require a lot of internet bandwidth (not to mention space on your machine). The video-level training set comes to about 18 GB. The frame-level features will take up approximately 1.3 TB of space, so make sure you are all set before you begin downloading!
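To get a feel for how much room a partial download might need, here is a rough sketch built on the two sizes quoted above. It assumes the download size scales linearly with the fraction of the dataset you fetch and that 1 TB = 1000 GB; both are simplifying assumptions, and the function name is made up for illustration.

```python
# Sizes quoted in the article; everything else here is an assumption.
VIDEO_LEVEL_GB = 18
FRAME_LEVEL_GB = 1.3 * 1000  # 1.3 TB, taking 1 TB = 1000 GB


def estimated_download_gb(fraction):
    """Rough size of a partial download, assuming linear scaling."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    return {
        "video_level_gb": VIDEO_LEVEL_GB * fraction,
        "frame_level_gb": FRAME_LEVEL_GB * fraction,
    }


# Example: how much space would 1% of the dataset need?
estimate = estimated_download_gb(0.01)
print(f"video-level: ~{estimate['video_level_gb']:.2f} GB")
print(f"frame-level: ~{estimate['frame_level_gb']:.1f} GB")
```

Under these assumptions, a 1% sample of the frame-level features would come to roughly 13 GB, which is far more manageable than the full 1.3 TB.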

Get the instructions to download this awesome labelled dataset here.


Our take on this

From the original version of the dataset that included 8.2M videos with 1.8 labels/video, this updated version has 6.1M videos with 3.0 labels/video. The dataset size may be reduced but don’t let that put you off – there is a major improvement in the quality of labels.
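A quick back-of-the-envelope check, using only the figures quoted above, suggests that the total number of labels has actually grown even though the video count has shrunk. The variable and function names below are just for illustration.

```python
# Figures quoted in the article (videos in millions, average labels per video).
versions = {
    "original": {"videos_m": 8.2, "labels_per_video": 1.8},
    "updated": {"videos_m": 6.1, "labels_per_video": 3.0},
}


def total_labels_m(version):
    """Approximate total label count, in millions."""
    return version["videos_m"] * version["labels_per_video"]


totals = {name: round(total_labels_m(v), 2) for name, v in versions.items()}
for name, total in totals.items():
    print(f"{name}: ~{total}M labels")
```

That works out to roughly 14.8M labels in the original version against 18.3M in the update, so there are more annotations to learn from despite the smaller number of videos.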

The number of classes has also been reduced from 4,800 to 3,862! On top of that, the latest version has about 2.6B audio-visual features, whereas the previous one had 1.9B visual-only features. With this update, Google is hoping to help researchers working on large-scale video understanding. If you are into deep learning, or are getting started in this field, I recommend getting your hands on this dataset and practising your newly acquired skills!




About the Author

Aishwarya Singh

An avid reader and blogger who loves exploring the endless world of data science and artificial intelligence. Fascinated by the limitless applications of ML and AI; eager to learn and discover the depths of data science.
