
Facebook Open Sources Dataset on NLP and Navigation Every Data Scientist Should Download

Overview

  • Researchers from the University of Montreal and Facebook’s AI department (FAIR) have curated and open sourced the “Talk the Walk” dataset
  • The dataset consists of three elements: maps of certain parts of New York, 360-degree images of these locations, and sample conversations between humans guiding each other (10k dialogues)
  • They have also released baseline models to help you get started on this challenge

 

Introduction

Existing Natural Language Processing (NLP) techniques focus mostly on transcribing what humans say, rather than understanding what is being said. Even with the release of advanced chatbot technologies like Google Duplex and Microsoft’s Xiaoice, genuine language understanding has eluded researchers so far.

This has prompted a group of researchers from the University of Montreal and Facebook’s AI department (FAIR) to curate a dataset called “Talk the Walk”, which aims to teach machines to understand language the way a human does. The researchers have open sourced the dataset and opened up the challenge to the wider machine learning community.

The dataset is essentially made up of three elements:

  • Maps of certain parts of New York
  • 360-degree images of locations on the map captured through camera sensors
  • The NLP task – a sample of conversations between people guiding each other to specific locations (10k dialogues)
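
To make these three elements concrete, here is a minimal Python sketch of how one sample might be held in memory. Every class, field and file name in it is hypothetical and chosen only for illustration; it does not reflect the dataset's actual file format or the researchers' code. The toy helper simply checks which map landmarks are mentioned in a conversation, a very rough stand-in for linking language to the map.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # hypothetical (row, column) position on the map


@dataclass
class Utterance:
    speaker: str  # e.g. "tourist" or "guide"
    text: str


@dataclass
class Sample:
    # Element 1: the map, stored here as landmark names per cell (assumed layout).
    landmarks: Dict[Cell, List[str]]
    # Element 2: one 360-degree image per cell (file names are made up).
    panoramas: Dict[Cell, str]
    # Element 3: the conversation between the two people.
    dialogue: List[Utterance] = field(default_factory=list)


def landmarks_mentioned(sample: Sample) -> List[str]:
    """Return, in sorted order, the map landmarks whose names appear in the dialogue."""
    spoken = " ".join(u.text.lower() for u in sample.dialogue)
    names = {name for cell in sample.landmarks.values() for name in cell}
    return sorted(name for name in names if name.lower() in spoken)


example = Sample(
    landmarks={(0, 0): ["bank"], (0, 1): ["coffee shop"], (1, 1): ["theater"]},
    panoramas={(0, 0): "pano_0_0.jpg", (0, 1): "pano_0_1.jpg", (1, 1): "pano_1_1.jpg"},
    dialogue=[
        Utterance("tourist", "I can see a coffee shop on my left."),
        Utterance("guide", "Good, walk past it until you reach the bank."),
    ],
)

print(landmarks_mentioned(example))  # prints ['bank', 'coffee shop']
```

Plain substring matching like this is far too naive for the real task, but it shows how the map, the images and the dialogue relate to one another in a single record.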

The idea behind this research is to get two agents talking to each other: a “tourist” and a “guide”. The “tourist” has access to the 360-degree images of the locations but not the map, and the “guide” has access to the map but not the images. Can you distinguish which is the human and which is the machine in this case? Below is a sample screenshot from the dataset:

The researchers have also released baseline results from the experiments they ran. Watch the video below, released by Facebook, which illustrates their approach to the problem:

Talk the Walk: AI "tourist" and "guide" demonstration

Posted by Facebook Engineering on Tuesday, July 10, 2018

Below are a few resources to get you started on this challenge:

 

Our take on this

This is one of the most difficult challenges you’ll see anywhere. It combines so many machine learning tasks that it can become daunting. One of the authors of the research paper himself admitted that breakthroughs in this study might be a few years away. But when they do happen, they have the potential to be a game changer in the NLP as well as navigational guidance domains.

But don’t let that deter you! Download the dataset and explore everything it contains. If you don’t understand something, use the comments section below to ask. Play around with parts of the data and publish your findings and analysis online. You never know where inspiration might strike.

 


2 Comments

  • Ateeque Shaikh says:

    Data can be accessed only through python and not R. Can you post a guide wherein the data can be accessed through R?

    • Pranav Dar says:

      Hi Ateeque,

      The only way to download it right now is through the Python command mentioned on their GitHub repository. We will need to figure out the data source first before we can use R to scrape the data. Will keep you updated if I figure something out.

