
Google has Released the Latest Open Images Dataset! Every Data Scientist should Work with this

Overview

  • Open Images is a massive dataset which contains roughly 9 million images
  • All images come with labels that were prepared manually by professional annotators
  • The dataset is divided into training (9 million+ images), validation (41k+ images), and test (125k+ images) sets
  • Google has also announced an object detection challenge for data scientists

 

Introduction

As a data scientist, finding large datasets to work with is a challenge. Most organizations treasure their data and prefer not to release it to the community. But Google has been one of the few that have consistently open-sourced a lot of their research in order to speed up studies and help budding data scientists.

This week, they have released version 4 of their popular Open Images dataset – free and available for anyone to download and work with.

Open Images is a massive dataset of images which was released by Google back in 2016. The dataset consists of 9 million images that have already been labelled by the team. According to their site, “The training set of V4 contains 14.6M bounding boxes for 600 object classes on 1.74M images, making it the largest existing dataset with object location annotations”.

These annotations have been drawn manually by professional annotators in order to ensure accuracy and consistency. The subject matter in the images is diverse in nature. There are 8.4 objects per image on average in this dataset. To add the icing on the cake, the data is annotated with image-level labels that span thousands of classes!
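As a quick sanity check, the 8.4 objects-per-image figure can be reproduced from the numbers in Google's quote above. This is a minimal sketch; it assumes the average refers to the 1.74M box-annotated training images, which the article does not state explicitly.

```python
# Figures quoted above for the V4 training set's box annotations.
bounding_boxes = 14_600_000  # "14.6M bounding boxes"
boxed_images = 1_740_000     # "on 1.74M images"

# Assumption: the quoted average is boxes divided by box-annotated images.
objects_per_image = bounding_boxes / boxed_images
print(f"{objects_per_image:.1f} boxed objects per image on average")
```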

The Open Images dataset is pre-split into the training, validation and test sets. The training set contains 9,011,219 images, the validation set has 41,260 images and the test set has 125,436 images. All of these images come with proper labels to help you get down to building a model as quickly as possible.
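To get a feel for how lopsided this split is, the short sketch below adds up the three counts quoted above and prints each split's share of the total. The variable names are just for illustration.

```python
# Split sizes as quoted in the article.
splits = {"train": 9_011_219, "validation": 41_260, "test": 125_436}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name:<10} {count:>9,} images  {shares[name]:6.2%}")
print(f"{'total':<10} {total:>9,} images")
```

Nearly all of the images sit in the training set, so the validation and test sets are small in relative terms even though each holds tens of thousands of images.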

Along with this dataset release, Google has announced the ‘Open Images Challenge 2018’. This is scheduled to be held at the European Conference on Computer Vision and will be an object detection challenge. This latest competition is offering a far broader range of object classes than any previous challenge. It will have two tracks:

  • Object Class Detection: predicting a tight bounding box around all instances of the 500 classes
  • Visual Relationship Detection: detecting pairs of objects in particular relations, e.g. “woman playing guitar”. This is done by adding a large number of images with multiple object annotations

The deadline for submission of results is 1st September, 2018. The evaluation metric for this challenge will be mean Average Precision (mAP) over the given 500 classes.
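If you have not worked with mAP before, the sketch below shows the general idea: match predicted boxes to ground-truth boxes by intersection over union (IoU), compute an average precision per class, then average across classes. It is a simplified illustration, not the challenge's official evaluation code. The function names, the 0.5 IoU threshold, the non-interpolated form of average precision, and the single-image setup are all assumptions made for this example, and the official protocol may differ in its details.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],  # (confidence, box) pairs for one class
    ground_truth: List[Box],
    iou_threshold: float = 0.5,  # assumed value, not taken from the article
) -> float:
    """Non-interpolated average precision for a single class."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    true_positives = 0
    ap = 0.0
    # Visit detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    for rank, (_, box) in enumerate(ranked, start=1):
        # Find the best-overlapping ground-truth box that is still unmatched.
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if not matched[idx] and overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True
            true_positives += 1
            # Each true positive raises recall by 1 / len(ground_truth);
            # weight the precision at this rank by that recall step.
            ap += (true_positives / rank) / len(ground_truth)
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """Plain mean of the per-class average precisions."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Toy example: two ground-truth boxes, three detections (one is a false positive).
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (1, 1, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 31))]
print(f"AP for this class: {average_precision(dets, gt_boxes):.3f}")
```

In the challenge this per-class score would be computed for each of the 500 classes and then averaged, which is why a model that does well on common classes but poorly on rare ones can still end up with a low mAP.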

This is the fourth update the team has released in the last 2 years. You can download the dataset from Google’s page here.

 

Our take on this

This is a treasure trove for data scientists! Anyone interested in deep learning and image classification can download and work on this dataset. The fact that Google has worked on labelling the images is a testament to their team and to the power of their resources. The training set, with its massive size, is expected to stimulate research on more complex detection models. The hope is that this release will help in improving current state-of-the-art models.

Their open challenge is already generating a huge buzz in the ML community and we are expecting to see some serious competition. We will be sure to cover any major projects that come up in this challenge.

Whether you’re a newcomer to image processing or have been working in this field for a while, this dataset is perfect for you. Use the comments section below to tell us how you plan on using this!

 


 


4 Comments

  • Aditya Malte says:

    This is a breakthrough!!
    However, I am unable to download data of a specific category (e.g. cat images) to my computer from the given link.
    Any suggestions?

    • Pranav Dar says:

      Hi Aditya,

      I don’t think that is available anywhere on their site. You have to download the entire dataset (or the train/test/validation splits separately). I’ll look into it more and give you an update in case I come across this particular feature.

  • ddflower says:

    Is there a place that describes the 500 class labels, i.e. what types of objects they cover? Thanks!
