Visual cognition is a key capability of human beings, but it remains a major challenge for artificial intelligence. We provide thousands of labeled video sequences and clips to help researchers train their models to "understand" what these videos represent.
Scene parsing is a core capability for autonomous driving technologies. We have collected and annotated a large number of outdoor scenes captured by vehicle-mounted sensors. The dataset will evolve to include RGB videos with per-pixel annotation and high-accuracy depth, stereoscopic video, and panoramic images.
Machine Reading Comprehension
Machine Reading Comprehension (MRC) is one of the core abilities of artificial intelligence. We release DuReader, a large-scale, real-world Chinese dataset for MRC, to promote research in this area. DuReader contains more than 200K questions, 1M evidence documents, and 420K human-generated answers.
Open-Domain Information Extraction (OIE) is the task of extracting important information from open-domain sentences. OIE has proven valuable in many artificial intelligence tasks, such as text summarization, text comprehension, and knowledge-based question answering. We release the SAOKE dataset, a human-annotated dataset containing more than 40 thousand Chinese sentences and the corresponding facts in SAOKE form.
The Schema-based Knowledge Extraction (SKE) dataset offers a large number of real Chinese sentences with manually annotated SPO (subject-predicate-object) triples. It provides a challenging benchmark for evaluating knowledge extraction algorithms bounded by a pre-defined schema.
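To illustrate what "bounded by a pre-defined schema" means, here is a minimal sketch in Python. The `Triple` class, the `SCHEMA` contents, and the `filter_by_schema` helper are hypothetical names chosen for illustration; they do not reflect the actual SKE file format or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One subject-predicate-object (SPO) fact extracted from a sentence."""
    subject: str
    predicate: str
    obj: str


# Hypothetical schema (an assumption, not the SKE schema):
# predicate -> (expected subject type, expected object type)
SCHEMA = {
    "author": ("Book", "Person"),
    "capital": ("Country", "City"),
}


def filter_by_schema(triples, schema):
    """Keep only triples whose predicate is defined in the schema."""
    return [t for t in triples if t.predicate in schema]


candidates = [
    Triple("France", "capital", "Paris"),
    Triple("Paris", "nickname", "City of Light"),  # predicate not in schema
]
valid = filter_by_schema(candidates, SCHEMA)
```

In this sketch only the `capital` triple survives, because `nickname` is not a predicate the schema allows. A real system would also check the subject and object types against the schema.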
Traffic Speed Prediction
We provide a large-scale, real-world traffic speed prediction dataset, the Q-Traffic dataset, which consists of 114 million crowd user queries, geographical attributes, and traffic speed of 15,073 road segments.
Entity Recognition and Linking (ERL) is a fundamental task in the research and application of knowledge graphs. It identifies entities in a given text and links them to the corresponding entries in a knowledge base. It is a building block for many intelligent systems, such as search engines, question answering systems, recommendation systems, and dialog systems. We are releasing the BERL dataset, a large-scale corpus of Chinese short texts for entity recognition and linking tasks. BERL contains 100K annotated short texts, with the corresponding mentions and links to entities in the Baidu Knowledge Base.
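The two steps of the task, finding mentions and linking them to knowledge base entries, can be sketched with a toy dictionary lookup in Python. The `KB` contents, the entity IDs, and the `link_entities` function are assumptions made for illustration; they are not the BERL format or the Baidu Knowledge Base. Exact alias matching stands in for a real recognition and disambiguation model.

```python
# Hypothetical knowledge base (an assumption): entity id -> name and aliases
KB = {
    "E1": {"name": "Beijing", "aliases": ["Beijing", "Peking"]},
    "E2": {"name": "Baidu", "aliases": ["Baidu"]},
}


def link_entities(text, kb):
    """Return (mention, start offset, entity id) for every alias found in text."""
    results = []
    for entity_id, entry in kb.items():
        for alias in entry["aliases"]:
            start = text.find(alias)
            while start != -1:
                results.append((alias, start, entity_id))
                # Continue searching after this occurrence
                start = text.find(alias, start + 1)
    # Order the mentions by their position in the text
    return sorted(results, key=lambda r: r[1])


links = link_entities("Baidu is based in Beijing", KB)
```

Each result pairs a mention and its character offset with a knowledge base entry. This lookup cannot resolve ambiguous names, which is the hard part of linking that an annotated corpus is meant to support.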