Facebook Open Sources Dataset on NLP and Navigation Every Data Scientist Should Download


  • Researchers from the University of Montreal and Facebook’s AI department (FAIR) have curated and open sourced the “Talk the Walk” dataset
  • The dataset consists of three elements: maps of certain parts of New York, 360-degree images of these locations, and sample conversations between humans guiding each other (10k dialogues)
  • They have also released baseline models to help you get started on this challenge



Existing natural language processing (NLP) techniques focus mostly on transcribing what humans say, rather than understanding what is being said. Even with the release of advanced chatbot technologies like Google Duplex and Microsoft’s Xiaoice, genuine language understanding is a challenge that has eluded researchers so far.

This has prompted a group of researchers from the University of Montreal and Facebook’s AI department (FAIR) to curate a dataset called “Talk the Walk” that aims to teach the machine to understand language in the same way that a human does. The researchers have of course open sourced the dataset and opened up the challenge to the wider machine learning community.

The dataset is essentially made up of three elements:

  • Maps of certain parts of New York
  • 360-degree images of locations on the map captured through camera sensors
  • The NLP task – a sample of conversations between people guiding each other to specific locations (10k dialogues)

The idea behind this research is to get two agents talking to each other: a “tourist” and a “guide”. The “tourist” has access to the 360-degree images of the locations but not the map, and the “guide” has access to the map but not the images. Can you tell which is the human and which is the machine in this case?

[Screenshot: sample from the dataset]
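
To make the split in information between the two roles concrete, here is a minimal sketch in Python. It is a toy illustration only: the `Tourist` and `Guide` classes, the 2x2 grid, the landmark names, and the use of plain strings in place of 360-degree images are all assumptions made for this example, and none of them reflect the actual file format or the baseline models released with the dataset.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (x, y) grid coordinate; a made-up stand-in for a street corner


@dataclass
class Tourist:
    """Sees the local view at its own position, but has no map."""
    position: Cell
    views: Dict[Cell, str]  # text stand-ins for the 360-degree images

    def describe(self) -> str:
        # The tourist can only report what is visible where it stands.
        return self.views[self.position]


@dataclass
class Guide:
    """Sees the map of landmarks, but none of the images."""
    landmark_map: Dict[Cell, str]

    def locate(self, described_landmark: str) -> List[Cell]:
        # Every map cell whose landmark matches the tourist's message.
        return sorted(
            cell for cell, name in self.landmark_map.items()
            if name == described_landmark
        )


# Hypothetical 2x2 neighbourhood with one repeated landmark.
landmarks = {(0, 0): "bank", (1, 0): "coffee shop", (0, 1): "bank", (1, 1): "theater"}

tourist = Tourist(position=(0, 1), views=landmarks)
guide = Guide(landmark_map=landmarks)

message = tourist.describe()
candidates = guide.locate(message)
print(f"Tourist says: I see a {message}")
print(f"Guide's candidate cells: {candidates}")
```

In this toy world the guide ends up with two candidate cells, because two corners have a bank. A single message is not enough to pin down the tourist, which hints at why the two agents need a longer conversation.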

The researchers have also released baseline results of the experiments they ran. Facebook has published a video that illustrates their approach to the problem.

Below are a few resources to get you started on this challenge:


Our take on this

This is one of the most difficult challenges you’ll see anywhere. It combines so many machine learning tasks that it can become daunting. One of the authors of the research paper himself admitted that breakthroughs in this study might be a few years away. But when they do happen, they have the potential to be a game changer in NLP as well as in navigational guidance.

But don’t let that deter you! Download the dataset, and try to understand all that it has. If you don’t understand something, use the comments section below to ask. Play around with parts of the data and publish your findings and analysis online. You never know where inspiration might strike.




  • Ateeque Shaikh says:

    The data can be accessed only through Python and not R. Can you post a guide on how to access the data through R?

    • Pranav Dar says:

      Hi Ateeque,

      The only way to download it right now is through the Python command mentioned on their GitHub repository. We will need to figure out the data source first before we can use R to scrape the data. Will keep you updated if I figure something out.