MatDC: A Multi-turn Multi-domain Annotated Task-oriented Dialogue Dataset in Chinese

Yu Hsiang Tseng, Shu Kai Hsieh, Richard Lian, Chiung Yu Chiang, Yu Lin Chang, Li Ping Chang, Ji Lung Hsieh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Modern conversational agents require high-quality datasets, which are often the bottleneck when building models. This paper introduces MatDC, an entirely human-produced dialogue dataset in Chinese with full semantic annotations. The dataset features linguistic variations for given user intents and fully annotated semantic slots. The MatDC dataset was entirely human-edited, and its curation comprises two stages. First, in the template design stage, domain editors construct schemas and compose ten dialogues between agents and users based on the back-end database. Second, in the dialogue rewrite stage, rewriters generate sentential variations for each template, under the constraint that the normalized slot values are kept unchanged. The underlying methodology of MatDC is more open to extension and more adaptable to different domains. To demonstrate the applicability of the dataset, we build a dialogue agent with a conventional pipeline architecture. We expect the MatDC dataset to provide additional training data and a testing ground for dialogue agent studies.
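The rewrite-stage constraint described in the abstract can be illustrated with a small check. This is a minimal sketch and not the authors' tooling: the `Turn` record, its field names, the `slot_values_unchanged` helper, and the restaurant-booking example are all hypothetical, and the utterances are given in English only for readability.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Turn:
    """One annotated utterance. All field names are hypothetical."""
    speaker: str                 # "user" or "agent"
    text: str                    # surface form of the utterance
    intent: str                  # intent label for the turn
    slots: Dict[str, str] = field(default_factory=dict)  # slot name -> normalized value


def slot_values_unchanged(template: Turn, rewrite: Turn) -> bool:
    """Return True when the rewrite carries exactly the template's normalized slot values."""
    return template.slots == rewrite.slots


# A template turn, as a domain editor might compose it (hypothetical example).
template = Turn(
    speaker="user",
    text="I want a table for two at 7 pm.",
    intent="book_table",
    slots={"party_size": "2", "time": "19:00"},
)

# A valid rewrite: the wording differs, the normalized slot values do not.
good_rewrite = Turn(
    speaker="user",
    text="Could you reserve a table for a couple at seven tonight?",
    intent="book_table",
    slots={"party_size": "2", "time": "19:00"},
)

# An invalid rewrite: the normalized time value has drifted.
bad_rewrite = Turn(
    speaker="user",
    text="Could you reserve a table for a couple at eight tonight?",
    intent="book_table",
    slots={"party_size": "2", "time": "20:00"},
)

print(slot_values_unchanged(template, good_rewrite))
print(slot_values_unchanged(template, bad_rewrite))
```

Comparing normalized values rather than surface strings is what lets a rewriter say "seven tonight" in place of "7 pm" while the annotation stays fixed.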

Original language: English
Title of host publication: Proceedings - 25th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 165-170
Number of pages: 6
ISBN (Electronic): 9781665403801
Publication status: Published - Dec 2020
Event: 25th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2020 - Taipei, Taiwan
Duration: 3 Dec 2020 to 5 Dec 2020

Publication series

Name: Proceedings - 25th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2020

Conference

Conference: 25th International Conference on Technologies and Applications of Artificial Intelligence, TAAI 2020
Country/Territory: Taiwan
City: Taipei
Period: 2020/12/03 to 2020/12/05

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
