
You can now download the world's largest self-driving dataset

13 June 2018

UC Berkeley's dataset contains 100,000 video sequences which can be used by engineers and others to further develop self-driving technologies.

You can download the dataset called ‘BDD100K’ at

Each video in the dataset is roughly 40 seconds long, recorded at 720p and 30 frames per second.
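To put those figures in perspective, the total footage can be estimated with simple arithmetic. The short Python sketch below is only a back-of-envelope calculation: it treats the "roughly 40 seconds" figure as exact, which is an assumption, and the function names are illustrative rather than part of any BDD100K tooling.

```python
# Back-of-envelope estimate of total footage, using the figures quoted in the article.
NUM_VIDEOS = 100_000       # number of video sequences reported
SECONDS_PER_VIDEO = 40     # "roughly 40 seconds"; treated as exact here (assumption)
FRAMES_PER_SECOND = 30     # frame rate reported


def total_hours(num_videos: int, seconds_per_video: float) -> float:
    """Total duration of all videos, in hours."""
    return num_videos * seconds_per_video / 3600


def total_frames(num_videos: int, seconds_per_video: float, fps: int) -> float:
    """Total number of frames across all videos."""
    return num_videos * seconds_per_video * fps


if __name__ == "__main__":
    hours = total_hours(NUM_VIDEOS, SECONDS_PER_VIDEO)
    frames = total_frames(NUM_VIDEOS, SECONDS_PER_VIDEO, FRAMES_PER_SECOND)
    print(f"Approximate total duration: {hours:,.0f} hours")
    print(f"Approximate total frames:   {frames:,.0f}")
```

Under these assumptions the dataset comes to about 1,100 hours of driving and about 120 million frames. The true totals will differ somewhat, since individual clip lengths vary.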

Along with each video, GPS information recorded from mobile phones indicates the approximate driving trajectory. All the videos were collected in various locations across the United States.

Dataset covers a range of driving conditions

The publicly available videos provide a rich trove to work from, as they cover a range of weather conditions, including sunny, rainy and even hazy. The balance between daytime and night-time conditions has also been praised.

In addition to building self-driving cars, the dataset offers an opportunity to work on detecting pedestrians on roads and pavements. There are more than 85,000 instances of pedestrians in the videos, which gives a solid base for this exercise.

The open-source dataset is organised and sponsored by the Berkeley DeepDrive Industry Consortium, a group dedicated to investigating state-of-the-art technologies in computer vision and machine learning for automotive applications. Berkeley wasn't kidding when it said this was the largest publicly available dataset ever.

800 times larger than Baidu's data

In March, Baidu released what was then a massive dataset, but Berkeley's effort is 800 times larger. It is also 4,800 times bigger than Mapillary’s dataset and 8,000 times bigger than KITTI. The datasets are expected to be a boon for self-driving technology developers working on perception systems for autonomous vehicles.

Demand for these types of datasets has been consistently high, and there is no doubt some interesting work will come from Berkeley's generosity. To coincide with the release of the open-source dataset, Berkeley has set up three challenges.

Check out the challenges related to Road Object Detection, Drivable Area Segmentation and Domain Adaptation of Semantic Segmentation on their website. The challenges will allow emerging autonomous vehicle developments to be compared against the work of other key data scientists in the field. 

Autonomous driving is one of the fastest-growing technology areas. From small university-based teams to the big guns like Google and Uber, everyone is determined to be the first to crack the technology that will bring driverless cars to our city streets.

Self-driving cars have had a bad rap recently after an autonomous Uber car hit and killed a pedestrian in Tempe, Arizona. Uber subsequently paused its self-driving development program, but the pause is not expected to last long.

The release of this huge dataset means that a more diverse range of data is available for researchers and scientists working to overcome self-driving car challenges. Berkeley researchers have suggested that they will add to the dataset in the future, expanding from monocular videos alone to include panorama and stereo videos as well as data from other sensors such as LiDAR and radar.

