We introduce a large-scale dataset of real-life surveillance scenarios for crowd counting tasks, which consists of 13,945 manually annotated images with more than 500,000 human head dot annotations. Compared with other state-of-the-art datasets, it provides the highest average resolution (which ensures the image quality) and covers more challenging scenarios with complicated backgrounds and varied crowd counts, which significantly increases the difficulty of crowd density estimation.
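As a minimal sketch of how head dot annotations relate to a crowd count, the snippet below places one unit of mass at each annotated point and sums the resulting map. The `(row, col)` annotation format, the function names, and the toy coordinates are assumptions made for illustration; the dataset's actual file format is not described here.

```python
from typing import List, Tuple


def density_map(dots: List[Tuple[int, int]], height: int, width: int) -> List[List[float]]:
    """Build a map with one unit of mass per annotated head position (row, col)."""
    grid = [[0.0] * width for _ in range(height)]
    for row, col in dots:
        # Ignore annotations that fall outside the image bounds.
        if 0 <= row < height and 0 <= col < width:
            grid[row][col] += 1.0
    return grid


def crowd_count(grid: List[List[float]]) -> float:
    """The crowd count is the total mass of the map."""
    return sum(sum(row) for row in grid)


# Hypothetical annotations for a tiny 5x6 image.
dots = [(1, 2), (3, 4), (0, 0)]
grid = density_map(dots, height=5, width=6)
print(crowd_count(grid))
```

In practice such impulse maps are often smoothed with a normalized kernel before training a density estimator; a normalized kernel spreads each unit of mass without changing the total, apart from any mass lost at the image borders.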
Proactive human-machine conversation is a new conversation task that aims to build a human-like conversational agent endowed with the ability to proactively lead the conversation, such as by introducing a new topic or maintaining the current topic.
CCMT 2019 - BSTC
BSTC (Baidu Speech Translation Corpus) is a large-scale dataset for automatic simultaneous interpretation. BSTC version 1.0 contains 50 hours of real speeches and includes three parts: the audio files, the transcripts, and the translations. The corpus can be used to build automatic simultaneous interpretation systems. We also release a benchmark on this dataset.
ICDAR 2019 - LSVT
Robust text reading, including text detection, recognition, and end-to-end spotting, has been an active research area due to its profound impact and valuable applications. Hence, we collected a new large-scale scene text dataset, namely Large-scale Street View Text with Partial Labeling (LSVT), with 30,000 training images and 20,000 testing images in full annotations, and 400,000 training images in weak annotations, which are referred to as partial labels. In addition, an evaluation framework has been designed to allow all the submitted results to be evaluated and compared with one another in a uniform manner.
ICDAR 2019 - ArT
This competition introduces highly diversified scene text images in terms of text shapes. Specifically, almost a quarter of the text instances in the dataset intended for this competition are Arbitrary-Shaped Text (ArT), which is rarely seen in previous datasets. In addition, all the text regions were annotated with tight polygons to increase the difficulty level of the challenge. In other words, participating scene text detection models are expected to produce tight prediction regions that fit text of arbitrary orientation. An evaluation framework has also been designed to evaluate and compare all submitted results in a uniform manner.
Myopia has become a global public health burden. With an increase in myopic refraction, high myopia can develop into pathologic myopia, which causes irreversible visual impairment to patients. Therefore, early diagnosis and regular follow-up are important. With this challenge, we made available a large dataset of 1200 annotated retinal fundus images from non-pathological myopia subjects and pathological myopia patients (about 50%). In addition, an evaluation framework has been designed to allow all the submitted results to be evaluated and compared with one another in a uniform manner.
Glaucoma is currently the leading cause of irreversible blindness in the world. Glaucomatous optic neuropathy is the sine qua non of all forms of glaucoma. With this challenge, we made available a large dataset of 1200 annotated retinal fundus images from both non-glaucoma subjects (90%) and glaucoma patients (10%). In addition, an evaluation framework has been designed to allow all the submitted results to be evaluated and compared with one another in a uniform manner.
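To make the class balance concrete, the following sketch converts the stated percentages into approximate image counts. It assumes the 90%/10% figures are exact, which the description does not guarantee; the function name is hypothetical.

```python
from typing import Tuple


def split_counts(total: int, positive_fraction: float) -> Tuple[int, int]:
    """Return (negative, positive) image counts for a given positive fraction."""
    positives = round(total * positive_fraction)
    return total - positives, positives


# 1200 images with 10% glaucoma patients, as stated above.
non_glaucoma, glaucoma = split_counts(1200, 0.10)
print(non_glaucoma, glaucoma)  # 1080 120

# Accuracy of a trivial classifier that always predicts "non-glaucoma".
majority_baseline = non_glaucoma / (non_glaucoma + glaucoma)
print(majority_baseline)  # 0.9
```

Under that assumption a classifier that always predicts the majority class already reaches 90% accuracy, so plain accuracy says little about how well glaucoma cases are detected on a split this imbalanced.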
Age-related macular degeneration, abbreviated as AMD, is a degenerative disorder in the macular region. AMD is currently a leading cause of blindness in the world. With this challenge, we made available a large dataset of 1200 annotated retinal fundus images from both non-AMD subjects (~77%) and AMD patients (~23%). In addition, an evaluation framework has been designed to allow all the submitted results to be evaluated and compared with one another in a uniform manner.
Visual cognition is a key capability of human beings, but a big challenge in artificial intelligence. We provide thousands of labeled video sequences and clips to help researchers train their models to "understand" what these videos represent.
Scene parsing is a core capability for autonomous driving technologies. We have collected and annotated a large number of outdoor scenes captured by vehicle-mounted sensors. The whole dataset will evolve to include RGB videos with per-pixel annotation and high-accuracy depth, stereoscopic video, and panoramic images.
Machine Reading Comprehension
Machine Reading Comprehension (MRC) is one of the core abilities of artificial intelligence. We release DuReader 2.0, a large-scale real-world Chinese dataset for MRC, to promote research in this area. DuReader 2.0 contains more than 300K questions, 1.4M evidence documents, and 660K human-generated answers.
Open-Domain Information Extraction (OIE) is the task of extracting important information from open-domain sentences. OIE has proven valuable in many artificial intelligence tasks such as text summarization, text comprehension, and knowledge-based question answering. We release the SAOKE dataset, a human-annotated dataset containing more than 40 thousand Chinese sentences and the corresponding facts in SAOKE form.
The DuIE dataset offers a large number of real Chinese sentences with manually annotated SPO triples. It provides a challenging benchmark for evaluating knowledge extraction algorithms bounded by a pre-defined schema.
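As an illustration of what schema-bounded SPO (subject, predicate, object) extraction means, the sketch below represents a triple and checks whether its predicate belongs to a pre-defined schema. The class, the schema entries, and the example triples are hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SPOTriple:
    subject: str
    predicate: str
    obj: str


# Hypothetical schema: predicate -> (subject type, object type).
SCHEMA: Dict[str, Tuple[str, str]] = {
    "author": ("Book", "Person"),
    "capital": ("Country", "City"),
}


def in_schema(triple: SPOTriple, schema: Dict[str, Tuple[str, str]] = SCHEMA) -> bool:
    """A triple is admissible only if its predicate is defined in the schema."""
    return triple.predicate in schema


extracted = [
    SPOTriple("France", "capital", "Paris"),
    SPOTriple("France", "borders", "Spain"),  # predicate not in the schema
]
valid = [t for t in extracted if in_schema(t)]
print(valid)
```

Restricting predictions to a fixed predicate set is what separates this setting from open-domain extraction, where the relation vocabulary is not fixed in advance.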
Traffic Speed Prediction
We provide a large-scale real-world traffic speed prediction dataset, the Q-Traffic dataset, which consists of 114 million crowd user queries, geographical attributes, and the traffic speeds of 15,073 road segments.
Entity Recognition and Linking (ERL) is a fundamental task in the research and application of knowledge graphs. It identifies entities in a given text and links them to the corresponding entries in a knowledge base. It is the building block for many intelligent systems such as search engines, question answering systems, recommendation systems, and dialog systems. We are releasing the BERL dataset, a large-scale corpus of Chinese short texts for entity recognition and linking tasks. BERL contains 100K annotated short texts, with the corresponding mentions and links to entities in the Baidu Knowledge Base.
We introduce a large-scale dataset of dog species for fine-grained classification tasks, which consists of 300,000 manually annotated images of 362 dog categories. As an animal that is indispensable in our daily life, the dog has a natural body configuration for understanding visual attention. This dataset is hence useful to the development of the FGVC community.
Fine-grained 3D Pose
In this Fine-Grained 3D Pose Dataset, we augment three existing fine-grained object datasets, i.e., StanfordCars, FGVC-Aircraft and CompCars, with 3D annotations. For each image in these datasets, we annotate the following two things: its corresponding 3D model and 3D pose.