MatDC: A Multi-turn Multi-domain Annotated Task-oriented Dialogue Dataset in Chinese

Yu Hsiang Tseng, Shu Kai Hsieh, Richard Lian, Chiung Yu Chiang, Yu Lin Chang, Li Ping Chang, Ji Lung Hsieh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Modern conversational agents require high-quality datasets, which are often the bottleneck when building models. This paper introduces MatDC, an entirely human-produced Chinese dialogue dataset with full semantic annotations. The dataset features linguistic variations for each user intent and fully annotated semantic slots. MatDC was curated entirely by human editors in two stages. First, in the template design stage, domain editors construct schemas and compose ten dialogues between agents and users based on the back-end database. Second, in the dialogue rewrite stage, rewriters generate sentential variations for each template, under the constraint that the normalized slot values are kept unchanged. The underlying methodology of MatDC is open to extension and adaptable to different domains. To demonstrate the applicability of the dataset, we build a dialogue agent with a conventional pipeline architecture. We expect MatDC to provide additional training data and a testing ground for dialogue agent studies.
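
The rewrite-stage constraint described in the abstract (a rewrite may change the wording but not the normalized slot values) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the dictionary layout, the slot names `date` and `party_size`, the English example sentences, and the helper `filter_rewrites` are hypothetical and are not taken from the MatDC data format or tooling.

```python
# Hypothetical sketch: keep only rewrites whose normalized slot values
# match those of the template they were derived from.
# The record layout and slot names are illustrative assumptions,
# not the actual MatDC schema.


def slots_preserved(template_slots: dict, rewrite_slots: dict) -> bool:
    """Return True when both utterances carry identical normalized slots."""
    return template_slots == rewrite_slots


def filter_rewrites(template: dict, rewrites: list) -> list:
    """Return the rewrites that satisfy the slot-preservation constraint."""
    return [r for r in rewrites if slots_preserved(template["slots"], r["slots"])]


# A made-up template utterance with its normalized slot values.
template = {
    "text": "I want to book a table for two on December 3.",
    "slots": {"date": "2020-12-03", "party_size": "2"},
}

# Two made-up rewrites: the first only varies the wording,
# the second changes a slot value and should be rejected.
rewrites = [
    {
        "text": "Could you reserve a table for 2 people on 12/3?",
        "slots": {"date": "2020-12-03", "party_size": "2"},
    },
    {
        "text": "Table for three on December 3, please.",
        "slots": {"date": "2020-12-03", "party_size": "3"},
    },
]

accepted = filter_rewrites(template, rewrites)
```

Under these assumptions, `accepted` contains only the first rewrite, since the second one alters `party_size`. Comparing normalized values rather than surface strings is what allows "two" and "2 people" to count as the same slot filling.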

Original language: English
Title of host publication: Proceedings - 25th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 165-170
Number of pages: 6
ISBN (Electronic): 9781665403801
Publication status: Published - 2020 Dec
Externally published: Yes
Event: 25th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2020 - Taipei, Taiwan
Duration: 2020 Dec 3 to 2020 Dec 5

Publication series

Name: Proceedings - 25th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2020

Conference

Conference: 25th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2020
Country/Territory: Taiwan
City: Taipei
Period: 2020/12/03 to 2020/12/05

Keywords

  • conversational agent
  • language resources
  • semantic annotations
  • task-oriented dialogue dataset

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
