High-accuracy Outdoor RGB-D Video Sequence for Semantic Scene Parsing
Published: 2017-07-04 10:37:57

Dataset Overview

The ZPark_2K Dataset contains images, depth maps, and pixel-level semantic labels covering 2 kilometers of urban street scenes.
The figure below shows one image set, including the original image, the overlaid segmented image, the depth map, and the color-coded semantic label map.

The raw data is acquired with a survey-grade mobile scanner, together with high-resolution color cameras. The 3D point cloud has an average point density of 2 cm and a depth accuracy better than 1 cm. The point cloud is precisely registered with the color images. For each image, a synthesized depth map is rendered from the 3D point cloud. In this way, we created an outdoor high-precision RGB-D image sequence with per-pixel semantic labels.

In the following, we give an overview of design choices and detailed data specifications.


  • multiple scans: 10 scans of the same street scenes, providing illumination variation
  • semi-automatic labeling: 3002 samples with semi-automatic pixel-level labels

Class Definitions

class               class_id  category       category_id  color_code
others              0         others         0            000000
sky                 17        sky            1            87ceeb
motor vehicle       33        moving object  2            400080
non-motor vehicle   34        moving object  2            4080c0
pedestrian          35        moving object  2            404000
rider               36        moving object  2            0080c0
other flat          48        flat           3            804080
motorway            49        flat           3            8000c0
bicycle lane        50        flat           3            c00040
sidewalk            51        flat           3            8080c0
other boundary      64        boundary       4            804040
curb                65        boundary       4            c080c0
other roadblock     80        roadblock      5            c08040
traffic cone        81        roadblock      5            000040
road pile           82        roadblock      5            0000c0
fence               83        roadblock      5            404080
other object        96        object         6            c04080
street lamp         97        object         6            c08080
traffic light       98        object         6            004040
pole                99        object         6            c0c080
traffic sign        100       object         6            4000c0
billboard           101       object         6            c000c0
bus stop board      102       object         6            c00080
other construction  112       construction   7            808000
building            113       construction   7            800000
newsstand           114       construction   7            408040
security booth      115       construction   7            808040
other nature        128       nature         8            c0c000
vegetation          129       nature         8            40c000

* The current labeling process may produce erroneous labels for fast-moving objects. This is expected to be fixed around September 2017, when a much larger dataset will also be released.
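
For scripting, the class table can be transcribed into a lookup structure. The Python sketch below is illustrative only: the names CLASSES, CLASS_BY_ID, CLASS_ID_BY_COLOR, and category_from_class_id are our own and not part of the dataset. It also checks a pattern visible in the listed values, namely that each category_id equals the upper four bits of its class_id; this is an observation from the table, not a documented guarantee.

```python
# Rows transcribed from the class table:
# (class, class_id, category, category_id, color_code)
CLASSES = [
    ("others", 0, "others", 0, "000000"),
    ("sky", 17, "sky", 1, "87ceeb"),
    ("motor vehicle", 33, "moving object", 2, "400080"),
    ("non-motor vehicle", 34, "moving object", 2, "4080c0"),
    ("pedestrian", 35, "moving object", 2, "404000"),
    ("rider", 36, "moving object", 2, "0080c0"),
    ("other flat", 48, "flat", 3, "804080"),
    ("motorway", 49, "flat", 3, "8000c0"),
    ("bicycle lane", 50, "flat", 3, "c00040"),
    ("sidewalk", 51, "flat", 3, "8080c0"),
    ("other boundary", 64, "boundary", 4, "804040"),
    ("curb", 65, "boundary", 4, "c080c0"),
    ("other roadblock", 80, "roadblock", 5, "c08040"),
    ("traffic cone", 81, "roadblock", 5, "000040"),
    ("road pile", 82, "roadblock", 5, "0000c0"),
    ("fence", 83, "roadblock", 5, "404080"),
    ("other object", 96, "object", 6, "c04080"),
    ("street lamp", 97, "object", 6, "c08080"),
    ("traffic light", 98, "object", 6, "004040"),
    ("pole", 99, "object", 6, "c0c080"),
    ("traffic sign", 100, "object", 6, "4000c0"),
    ("billboard", 101, "object", 6, "c000c0"),
    ("bus stop board", 102, "object", 6, "c00080"),
    ("other construction", 112, "construction", 7, "808000"),
    ("building", 113, "construction", 7, "800000"),
    ("newsstand", 114, "construction", 7, "408040"),
    ("security booth", 115, "construction", 7, "808040"),
    ("other nature", 128, "nature", 8, "c0c000"),
    ("vegetation", 129, "nature", 8, "40c000"),
]

# Lookups keyed by class_id and by color_code (hypothetical helper names).
CLASS_BY_ID = {cid: name for name, cid, _, _, _ in CLASSES}
CLASS_ID_BY_COLOR = {color: cid for _, cid, _, _, color in CLASSES}


def category_from_class_id(class_id):
    """Observed pattern in the table: category_id == class_id >> 4."""
    return class_id >> 4


# Verify the observed pattern against every transcribed row.
assert all(category_from_class_id(cid) == cat for _, cid, _, cat, _ in CLASSES)
```

If a later release adds classes that break the upper-four-bits pattern, the final assertion fails and an explicit class_id to category_id dictionary should be used instead.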


RGB images

3002 8-bit images. The unzipped path is {record}/{camera}/{timestamp}_{camera}.jpg. Our acquisition system has multiple cameras and automatically splits the data into records by time interval.
file: (4.1G, md5=3effa00dd59e9b72aac6b01dc6669c51)
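
The path template above can be built and parsed programmatically. The following Python sketch assumes only the {record}/{camera}/{timestamp}_{camera}.jpg layout; the function names, the ext parameter, and the record, camera, and timestamp values used in the usage note are hypothetical placeholders, since the actual naming of records and cameras is not specified here.

```python
from pathlib import PurePosixPath


def image_path(record, camera, timestamp, ext=".jpg"):
    """Build {record}/{camera}/{timestamp}_{camera}{ext}."""
    return PurePosixPath(record) / camera / f"{timestamp}_{camera}{ext}"


def parse_image_path(path):
    """Split a path of the above form into (record, camera, timestamp).

    The camera name is read from the parent directory, so it may itself
    contain underscores without making the file name ambiguous.
    """
    p = PurePosixPath(path)
    if len(p.parts) < 3:
        raise ValueError(f"expected record/camera/file, got {path!r}")
    record, camera = p.parts[-3], p.parts[-2]
    suffix = "_" + camera
    if not p.stem.endswith(suffix):
        raise ValueError(f"file name does not end with {suffix!r}: {path!r}")
    timestamp = p.stem[: -len(suffix)]
    return record, camera, timestamp
```

For example, image_path("rec", "cam_1", "123") yields rec/cam_1/123_cam_1.jpg, and parse_image_path on that path returns ("rec", "cam_1", "123").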

Depth maps

16-bit depth maps for the corresponding images. The path structure is the same as for the RGB images. Due to limitations of the acquisition system, the depth of moving objects is inaccurate.
file: (1.7G, md5=9315cc3c8de83916b2454d3fe1485e78) 

depth(meters)=(grayscale pixel intensity)/200
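
As a minimal Python sketch of the formula above: a 16-bit pixel value divided by 200 gives meters, so one intensity step corresponds to 5 mm and the largest representable depth is 65535 / 200 = 327.675 m. The names DEPTH_SCALE, depth_to_meters, and depth_row_to_meters are our own, and the sketch assumes the pixel values have already been read as unsigned 16-bit integers by an image library of your choice.

```python
DEPTH_SCALE = 200.0  # from the formula: depth(meters) = intensity / 200


def depth_to_meters(intensity):
    """Convert one 16-bit depth-map pixel value to meters."""
    if not 0 <= intensity <= 0xFFFF:
        raise ValueError("expected an unsigned 16-bit value")
    return intensity / DEPTH_SCALE


def depth_row_to_meters(row):
    """Convert a sequence of 16-bit pixel values to meters."""
    return [depth_to_meters(v) for v in row]
```

With NumPy, the same division can be applied elementwise to a whole uint16 array after converting it to a floating-point type.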



Semantic labels

Pixel-level semantic labels for the corresponding images. The labels are stored as 8-bit color images using the color table defined in Class Definitions. The path structure is the same as for the RGB images. Due to limitations of the acquisition system, the labels of moving objects are inaccurate.
file: (241M, md5=6a31bb41abe5b9a81a49903a9cecd46f)
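
Recovering class IDs from a label image amounts to a color lookup. The Python sketch below makes two assumptions that are not stated on this page: that each color_code is a hexadecimal RRGGBB triplet, and that colors not found in the table should fall back to class 0 ("others"). The helper names and SAMPLE_TABLE, which holds only three rows of the class table for brevity, are illustrative.

```python
def hex_to_rgb(color_code):
    """Convert a code such as '87ceeb' to (135, 206, 235), assuming RRGGBB order."""
    return tuple(int(color_code[i:i + 2], 16) for i in (0, 2, 4))


def decode_pixels(pixels, class_id_by_color, unknown=0):
    """Map (R, G, B) pixels to class IDs.

    class_id_by_color maps hex color codes to class IDs. Colors missing
    from the table map to `unknown` (assumed here to be 0, "others").
    """
    lut = {hex_to_rgb(code): cid for code, cid in class_id_by_color.items()}
    return [lut.get(tuple(p), unknown) for p in pixels]


# Three rows from the class table: sky, motorway, building.
SAMPLE_TABLE = {"87ceeb": 17, "8000c0": 49, "800000": 113}
```

In practice the full color table would be passed in place of SAMPLE_TABLE, and the pixel list would come from a label image loaded in RGB order.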


Please contact Dr. Ruigang Yang for details about accessing the data.