Baidu Has Released a Gigantic Self-Driving Dataset Named ApolloScape

Pranav Dar 19 Mar, 2018 • 2 min read


  • ApolloScape is an open source dataset billed as the largest self-driving dataset in the world
  • It includes 26 pre-defined semantic items
  • It is released under Baidu's Apollo self-driving platform



Autonomous vehicles are the talk of the town these days. Whichever AI and machine learning conference you hear about, self-driving cars are almost always a topic of conversation. But progress has been slow and it still feels like we’re a while away from seeing these vehicles in practical situations.

Baidu has therefore decided to bring the wider AI and ML community on board in the hope of speeding up research. The company has announced the release of the world's largest open-source dataset for self-driving technology.

Named ApolloScape, the dataset has been released under the umbrella of Baidu's self-driving platform, Apollo. Apollo is a high-performance, flexible architecture that supports fully autonomous driving capabilities. The team behind it recently released version 2.0 of the platform, which enables a vehicle "to drive on simple urban roads autonomously. It is able to cruise, avoid collisions with obstacles, stop at traffic lights and change lanes".

ApolloScape will give researchers and developers a solid foundation for building self-driving vehicles. The dataset covers 26 pre-defined semantic items, such as cars, buildings, pedestrians on the sidewalk, traffic lights and street lights. These items are annotated using pixel-by-pixel semantic segmentation, meaning each pixel in an image is labeled with the class it belongs to.
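To make "pixel-by-pixel" concrete: a semantic segmentation annotation can be thought of as a 2-D grid the same size as the image, where each cell holds a class ID. The minimal sketch below counts how much of a toy label map each class covers. The class IDs and names in `CLASS_NAMES` and the function `class_pixel_fractions` are hypothetical illustrations; they are not ApolloScape's official label IDs or file format.

```python
from collections import Counter

# Hypothetical class IDs for illustration only; NOT ApolloScape's official label IDs.
CLASS_NAMES = {
    0: "unlabeled",
    1: "car",
    2: "building",
    3: "pedestrian",
    4: "traffic_light",
    5: "street_light",
}


def class_pixel_fractions(label_map):
    """Return the fraction of pixels assigned to each class in a 2-D label map."""
    # Flatten the grid and count how many pixels carry each class ID.
    counts = Counter(pixel for row in label_map for pixel in row)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Map IDs to readable names; unknown IDs fall back to a generic name.
    return {
        CLASS_NAMES.get(cid, f"class_{cid}"): n / total
        for cid, n in sorted(counts.items())
    }


# A toy 4x4 label map: each entry is the class ID of one pixel.
toy_map = [
    [2, 2, 2, 4],
    [2, 2, 2, 5],
    [1, 1, 3, 0],
    [1, 1, 3, 0],
]
fractions = class_pixel_fractions(toy_map)
print(fractions)
```

In this toy map, buildings cover 6 of the 16 pixels (0.375) and cars cover 4 (0.25). Real datasets store such label maps as image files rather than nested lists, but the idea of one class per pixel is the same.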

You can check out Apollo's GitHub page and follow the instructions there to deploy it on your own machine.


Our take on this

The ApolloScape project provides almost 10 times more data than previously released open-source datasets such as Cityscapes and KITTI. It will save researchers a massive amount of the time and money that would otherwise have gone into collecting and parsing data.

According to a report by the RAND Corporation, autonomous vehicles would have to be driven hundreds of millions of miles to demonstrate their reliability in terms of fatalities and injuries. In fact, just to show a 20% advantage over human drivers, a fleet of 100 cars would have to be driven continuously for 500 years!
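The 500-year figure is easy to sanity-check with a back-of-the-envelope calculation. The sketch below assumes, for illustration only, that each car averages 25 mph around the clock, 365 days a year; that speed is an assumption, not a number given in this article. The helper `fleet_miles` is hypothetical.

```python
# Assumption for illustration: cars drive 24 hours a day, 365 days a year.
HOURS_PER_YEAR = 24 * 365


def fleet_miles(num_cars, years, avg_speed_mph):
    """Total miles covered by a fleet driving non-stop at a constant average speed."""
    return num_cars * avg_speed_mph * HOURS_PER_YEAR * years


# 100 cars, 500 years, assumed average speed of 25 mph.
total = fleet_miles(num_cars=100, years=500, avg_speed_mph=25)
print(f"{total:,} miles")  # prints "10,950,000,000 miles"
```

Under that assumed speed, the fleet covers roughly 11 billion miles, which shows why collecting enough real-world driving data is so expensive and why large shared datasets are valuable.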

I hope this dataset will speed up research in this field and benefit your learning as a data science enthusiast as well.






Pranav Dar 19 Mar 2018

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.

