Google Has Released the Latest Open Images Dataset! Every Data Scientist Should Work with This

Pranav Dar | Last Updated: 13 May, 2019

3 min read

Overview

Open Images is a massive dataset of just over 9 million images
All images come with labels, and the bounding boxes were drawn manually by professional annotators
The dataset is divided into training (9 million+ images), validation (41k+ images), and test (125k+ images) sets
Google has also announced an object detection challenge for data scientists

Introduction

As a data scientist, finding large datasets to work with is a challenge. Most organizations treasure their data and prefer not to release it to the community. Google is one of the few that have consistently open-sourced much of their research, both to accelerate progress in the field and to help budding data scientists.

This week, Google released version 4 (V4) of its popular Open Images dataset, free and available for anyone to download and work with.

Google first released Open Images back in 2016. The dataset consists of roughly 9 million images, all of which have already been labelled. According to Google’s site, “The training set of V4 contains 14.6M bounding boxes for 600 object classes on 1.74M images, making it the largest existing dataset with object location annotations”.

These bounding boxes were drawn manually by professional annotators to ensure accuracy and consistency. The images cover a diverse range of subjects, and there are 8.4 objects per image on average. As the icing on the cake, the data is also annotated with image-level labels that span thousands of classes!
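As a quick sanity check, the 8.4 figure is consistent with Google’s quoted numbers: dividing the 14.6M training boxes by the 1.74M box-annotated images gives roughly 8.4 boxes per image. This assumes the average refers to bounding boxes per box-annotated image, and both inputs are the rounded figures from the quote, so treat the result as approximate.

```python
# Rounded figures from Google's V4 announcement quoted above
num_boxes = 14_600_000      # bounding boxes in the V4 training set
num_box_images = 1_740_000  # training images carrying at least one box

# Assumption: "objects per image" means boxes per box-annotated image
avg_boxes = num_boxes / num_box_images
print(f"{avg_boxes:.1f} boxes per annotated image")
```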

The Open Images dataset is pre-split into training, validation and test sets. The training set contains 9,011,219 images, the validation set has 41,260 images and the test set has 125,436 images. All of these images come with labels, so you can get down to building a model as quickly as possible.
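For a sense of proportion, here’s a small Python sketch that totals the three splits and prints each one’s share of the dataset. The counts are exactly the ones listed above; nothing else is assumed.

```python
# Open Images V4 split sizes, as listed in the paragraph above
splits = {
    "train": 9_011_219,
    "validation": 41_260,
    "test": 125_436,
}

total = sum(splits.values())
print(f"total: {total:,} images")
for name, count in splits.items():
    # Percentage of the whole dataset held by each split
    print(f"{name:>10}: {count:>9,} ({100 * count / total:.2f}%)")
```

The total comes to 9,177,915 images, with the training split holding roughly 98% of them; validation and test together make up under 2%.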

Along with this release, Google has announced the ‘Open Images Challenge 2018’, an object detection challenge scheduled to be held at the European Conference on Computer Vision (ECCV) 2018. It covers a far broader range of object classes than any previous challenge and will have two tracks:

Object Class Detection: predicting a tight bounding box around every instance of the 500 object classes
Visual Relationship Detection: detecting pairs of objects in particular relations, e.g. “woman playing guitar”. To support this, the release includes a large number of images annotated with multiple objects
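Detection entries are typically scored by how well a predicted box overlaps a ground-truth box, measured as intersection-over-union (IoU). Below is a minimal Python sketch, assuming boxes are given as (x_min, y_min, x_max, y_max) with coordinates normalized to [0, 1]; the box layout and the example coordinates are illustrative assumptions, not taken from the challenge rules.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max), normalized to [0, 1]
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty)
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp at zero so non-overlapping boxes give zero intersection
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: a prediction shifted slightly from the ground truth
truth = (0.10, 0.10, 0.50, 0.50)
pred = (0.20, 0.20, 0.60, 0.60)
print(f"IoU = {iou(truth, pred):.3f}")
```

A prediction usually counts as a correct detection only when its IoU with a ground-truth box clears a threshold. 0.5 is a common choice (as in PASCAL VOC), but the challenge’s own evaluation rules are the authority on the exact value.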

The deadline for submitting results is 1 September 2018. The evaluation metric for the object detection track will be mean Average Precision (mAP) over its 500 classes.
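To make the metric concrete, here is a simplified Python sketch of mAP. It assumes each detection has already been matched against the ground truth (for example with an IoU threshold) and flagged as a true or false positive. It then computes all-point interpolated Average Precision per class and averages over classes. The official Open Images evaluation adds dataset-specific rules (for example for group-of boxes and the class hierarchy), so this is not the challenge’s scorer, and the class names and detections in the example are made up.

```python
from typing import Dict, List, Tuple

# One detection for a class: (confidence score, matched a ground-truth box?)
Detection = Tuple[float, bool]


def average_precision(dets: List[Detection], num_gt: int) -> float:
    """All-point interpolated AP for a single class."""
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident
    ranked = sorted(dets, key=lambda d: d[0], reverse=True)

    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))

    # Pad the curve so it runs from recall 0 to recall 1
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]

    # Precision envelope: make precision non-increasing as recall grows
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Area under the envelope: sum rectangles wherever recall increases
    ap = 0.0
    for i in range(1, len(mrec)):
        ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap


def mean_average_precision(
    per_class: Dict[str, Tuple[List[Detection], int]]
) -> float:
    """Unweighted mean of per-class AP values."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical toy results: class -> (detections, number of ground-truth boxes)
results = {
    "cat": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "dog": ([(0.6, False), (0.4, True)], 1),
}
print(f"mAP = {mean_average_precision(results):.3f}")
```

In this toy example the "cat" class scores an AP of about 0.83 and "dog" 0.50, giving an mAP of about 0.667. Because mAP is an unweighted mean, a rare class counts just as much as a common one.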

V4 is the fourth version of the dataset the team has released since 2016. You can download it from Google’s Open Images page.

Our take on this

This is a treasure trove for data scientists! Anyone interested in deep learning, image classification or object detection can download and work with this dataset. The fact that Google has labelled these images at this scale is a testament to its team and to the power of its resources. The training set, with its massive size, is expected to stimulate research into more complex detection models, and the hope is that this release will help improve on current state-of-the-art models.

Their open challenge is already generating a huge buzz in the ML community and we are expecting to see some serious competition. We will be sure to cover any major projects that come up in this challenge.

Whether you’re a newcomer to image processing or have been working in this field for a while, this dataset is perfect for you. Use the comments section below to tell us how you plan on using it!

Subscribe to AVBytes here to get regular data science, machine learning and AI updates in your inbox!

Pranav Dar

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.

AVbytes

Responses From Readers

Aditya Malte

This is a breakthrough!! However, I am unable to download data of a specific category (eg. cat images) to my computer from the given link. Any suggestions?

Pranav Dar

Hi Aditya, I don't think that is available anywhere on their site. You have to download the entire dataset (or the train/test/validation splits separately). I'll look into it more and give you an update in case I come across this particular feature.

ddflower

Is there a place that describes the 500 class labels, i.e. what types of objects they are? Thanks!

Pulkit Sharma

Hi, you can download the CSV file containing the description of each class from here.