High-accuracy Outdoor RGB-D Video Sequence for Semantic Scene Parsing
Published: 2017-07-04 10:37:57 | Views: 1227

Dataset Overview

The ZPark_2K Dataset contains images, depth maps, and pixel-level semantic labels covering 2 kilometers of urban street scenes.
The figure below shows one image set, including the original image, the overlaid segmented image, the depth map, and the color-coded semantic label map.

The raw data is acquired with a survey-grade mobile scanner together with high-resolution color cameras. The 3D point cloud has an average point density of 2 cm and a depth accuracy better than 1 cm. The point cloud is precisely registered with the color images. For each image, a synthesized depth map is rendered from the 3D point cloud. In this way, we created a high-precision outdoor RGB-D image sequence with per-pixel semantic labels.

In the following, we give an overview of design choices and detailed data specifications.

Features

  • multiple scans: contains 10 scans of the same street scenes to provide illumination variation
  • semi-auto labeling: 3002 samples with semi-automatic pixel-level labeling

Class Definitions

class               class_id  category       category_id  color_code
others              0         others         0            000000
sky                 17        sky            1            87ceeb
motor vehicle       33        moving object  2            400080
non-motor vehicle   34        moving object  2            4080c0
pedestrian          35        moving object  2            404000
rider               36        moving object  2            0080c0
other flat          48        flat           3            804080
motorway            49        flat           3            8000c0
bicycle lane        50        flat           3            c00040
sidewalk            51        flat           3            8080c0
other boundary      64        boundary       4            804040
curb                65        boundary       4            c080c0
other roadblock     80        roadblock      5            c08040
traffic cone        81        roadblock      5            000040
road pile           82        roadblock      5            0000c0
fence               83        roadblock      5            404080
other object        96        object         6            c04080
street lamp         97        object         6            c08080
traffic light       98        object         6            004040
pole                99        object         6            c0c080
traffic sign        100       object         6            4000c0
billboard           101       object         6            c000c0
bus stop board      102       object         6            c00080
other construction  112       construction   7            808000
building            113       construction   7            800000
newsstand           114       construction   7            408040
security booth      115       construction   7            808040
other nature        128       nature         8            c0c000
vegetation          129       nature         8            40c000

* The current labeling process may produce erroneous labels for fast-moving objects. This is expected to be fixed around September 2017, when a much larger data set will also be released.
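
The class table can be kept as a small lookup for grouping classes into categories. The sketch below copies the class_id and category_id columns into a dictionary. It also checks a pattern that holds for every row of the table, namely that category_id equals class_id divided by 16 (integer division). This pattern is an observation from the listed values, not a rule stated by the dataset, and the function names are illustrative.

```python
# class_id -> category_id, copied from the class table above.
CLASS_TO_CATEGORY = {
    0: 0, 17: 1,
    33: 2, 34: 2, 35: 2, 36: 2,
    48: 3, 49: 3, 50: 3, 51: 3,
    64: 4, 65: 4,
    80: 5, 81: 5, 82: 5, 83: 5,
    96: 6, 97: 6, 98: 6, 99: 6, 100: 6, 101: 6, 102: 6,
    112: 7, 113: 7, 114: 7, 115: 7,
    128: 8, 129: 8,
}


def category_of(class_id: int) -> int:
    """Return the category_id for a class_id listed in the table."""
    return CLASS_TO_CATEGORY[class_id]


def category_by_division(class_id: int) -> int:
    """Shortcut based on an observed pattern (assumption, not documented)."""
    return class_id // 16


# The shortcut agrees with the table for every listed class.
assert all(category_by_division(c) == k for c, k in CLASS_TO_CATEGORY.items())
```

The dictionary lookup is the safer choice, since the division shortcut may not hold for classes added in later releases.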

Format

RGB images

3002 8-bit images. The unzipped path is {record}/{camera}/{timestamp}_{camera}.jpg. Our acquisition system has multiple cameras and automatically splits the data into records at time intervals.
file: Image.zip (4.1G, md5=3effa00dd59e9b72aac6b01dc6669c51)
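
Because the depth maps and labels share this path structure, it can help to reduce each path to a (record, camera, timestamp) key for pairing files. The following is a minimal sketch of such a parser. The record, camera, and timestamp values in the usage example are hypothetical placeholders, not names taken from the archive, and `FrameKey` and `parse_frame_path` are illustrative names.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class FrameKey(NamedTuple):
    record: str
    camera: str
    timestamp: str


def parse_frame_path(path: str) -> FrameKey:
    """Split '{record}/{camera}/{timestamp}_{camera}.<ext>' into its fields."""
    p = PurePosixPath(path)
    if len(p.parts) < 3:
        raise ValueError(f"expected record/camera/file, got: {path}")
    record, camera = p.parts[-3], p.parts[-2]
    # The file stem must end with '_<camera>'; the remainder is the timestamp.
    suffix = "_" + camera
    stem = p.stem
    if not stem.endswith(suffix) or len(stem) == len(suffix):
        raise ValueError(f"file name does not match its camera folder: {path}")
    return FrameKey(record, camera, stem[: -len(suffix)])


# Hypothetical example path (names are placeholders).
key = parse_frame_path("Record001/Camera_1/170427_222949577_Camera_1.jpg")
print(key.record, key.camera, key.timestamp)
```

Stripping the camera suffix by using the folder name, rather than splitting on the last underscore, keeps the parser correct even if camera names or timestamps themselves contain underscores.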

Depth maps

16-bit depth maps for the corresponding images. The path structure is the same as for the RGB images. Due to limitations of the acquisition system, the depth of moving objects is inaccurate.
file: Depth.zip (1.7G, md5=9315cc3c8de83916b2454d3fe1485e78) 

depth(meters)=(grayscale pixel intensity)/200
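
Applied to 16-bit values, this formula gives a depth resolution of 1/200 m (5 mm) and a largest representable depth of 65535/200 = 327.675 m. A minimal sketch of the conversion in plain Python follows. It assumes the raw 16-bit values have already been read from the depth map by an image library of your choice, and the function names are illustrative.

```python
DEPTH_SCALE = 200.0   # raw units per meter, from the formula above
MAX_RAW = 65535       # largest value a 16-bit pixel can hold


def raw_to_meters(raw: int) -> float:
    """Convert one 16-bit depth pixel to meters."""
    if not 0 <= raw <= MAX_RAW:
        raise ValueError(f"not a 16-bit value: {raw}")
    return raw / DEPTH_SCALE


def depth_rows_to_meters(rows):
    """Convert a 2D list of raw depth values (rows of pixels) to meters."""
    return [[raw_to_meters(v) for v in row] for row in rows]


# Usage with a tiny made-up 2x2 depth patch.
patch = [[200, 400], [1000, 65535]]
print(depth_rows_to_meters(patch))
```

How missing or invalid depth is encoded is not stated here, so values such as 0 should be inspected before being treated as real distances.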

 

Labels

Pixel-level semantic labels for the corresponding images. The labels are stored as 8-bit color images using the color table given in Class Definitions. The path structure is the same as for the RGB images. Due to limitations of the acquisition system, the labels of moving objects are inaccurate.
file: Label.zip (241M, md5=6a31bb41abe5b9a81a49903a9cecd46f)
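
To train on these labels, the colors usually need to be mapped back to class IDs. The sketch below builds that lookup from the color_code column of the class table. It assumes that each code is a hexadecimal triplet in red, green, blue order (consistent with sky being 87ceeb, a sky-blue tone) and that the pixels have already been decoded to (r, g, b) tuples by an image library. The function names are illustrative.

```python
# color_code -> class_id, copied from the class table.
COLOR_TO_CLASS_ID = {
    "000000": 0, "87ceeb": 17,
    "400080": 33, "4080c0": 34, "404000": 35, "0080c0": 36,
    "804080": 48, "8000c0": 49, "c00040": 50, "8080c0": 51,
    "804040": 64, "c080c0": 65,
    "c08040": 80, "000040": 81, "0000c0": 82, "404080": 83,
    "c04080": 96, "c08080": 97, "004040": 98, "c0c080": 99,
    "4000c0": 100, "c000c0": 101, "c00080": 102,
    "808000": 112, "800000": 113, "408040": 114, "808040": 115,
    "c0c000": 128, "40c000": 129,
}


def rgb_to_class_id(r: int, g: int, b: int) -> int:
    """Map one label pixel to its class_id (assumes RRGGBB channel order)."""
    code = f"{r:02x}{g:02x}{b:02x}"
    try:
        return COLOR_TO_CLASS_ID[code]
    except KeyError:
        raise ValueError(f"color {code} is not in the class table") from None


def decode_label_rows(rows):
    """Convert a 2D list of (r, g, b) pixels into a 2D list of class IDs."""
    return [[rgb_to_class_id(*px) for px in row] for row in rows]


# Usage with a made-up 1x3 strip: sky, building, vegetation.
strip = [[(0x87, 0xCE, 0xEB), (0x80, 0x00, 0x00), (0x40, 0xC0, 0x00)]]
print(decode_label_rows(strip))
```

Raising an error on unknown colors, instead of silently mapping them to "others", makes it easier to notice a wrong channel order or a lossy image conversion.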

Download

Please contact Dr. Ruigang Yang for details about accessing the data.
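
Once the archives have been obtained, they can be checked against the md5 values listed in the Format section. A minimal sketch using Python's standard `hashlib` module is given below. The helper names are illustrative, and the local file locations passed to them are up to the user.

```python
import hashlib

# Checksums as listed in the Format section.
EXPECTED_MD5 = {
    "Image.zip": "3effa00dd59e9b72aac6b01dc6669c51",
    "Depth.zip": "9315cc3c8de83916b2454d3fe1485e78",
    "Label.zip": "6a31bb41abe5b9a81a49903a9cecd46f",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, name: str) -> bool:
    """Return True if the file at `path` matches the listed md5 for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify_archive("Image.zip", "Image.zip")` returns True only if the local file is byte-identical to the published archive. Reading in chunks avoids loading the multi-gigabyte archives into memory.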