Developing learner corpus annotation for Chinese grammatical errors

Lung Hao Lee, Li Ping Chang, Yuen Hsien Tseng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

This study describes the construction of the TOCFL (Test of Chinese as a Foreign Language) learner corpus, including the collection and grammatical error annotation of 2,837 essays written by learners of Chinese with 46 different mother tongues. We propose hierarchical tagging sets for manually annotating grammatical errors; the annotation identified 33,835 inappropriate usages. The corpus has been provided as data for the shared tasks on Chinese grammatical error diagnosis, and this use demonstrates the usability of our learner corpus annotation.
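The abstract does not list the tag inventory itself. As a rough illustration only, the sketch below shows one way a two-level (hierarchical) error tag could be represented and tallied. The top-level codes follow the four error types used in the Chinese Grammatical Error Diagnosis (CGED) shared tasks (redundant, missing, selection, word ordering). The second-level suffixes, the `<type><subcategory>` string format, the `ErrorTag` class, and the `parse_tag` and `count_by_type` helpers are hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Top-level error types as used in the CGED shared tasks.
TOP_LEVEL = {
    "R": "redundant",
    "M": "missing",
    "S": "selection",
    "W": "word ordering",
}


@dataclass(frozen=True)
class ErrorTag:
    """Hypothetical two-level tag: a coarse error type plus an optional
    finer-grained subcategory (for example, a part-of-speech label)."""
    error_type: str
    subcategory: Optional[str] = None


def parse_tag(tag: str) -> ErrorTag:
    """Split a tag such as 'Rv' into ErrorTag('R', 'v').

    The '<type><subcategory>' format is an assumption of this sketch,
    not the notation used in the paper.
    """
    if not tag or tag[0] not in TOP_LEVEL:
        raise ValueError(f"unknown error type in tag {tag!r}")
    return ErrorTag(tag[0], tag[1:] or None)


def count_by_type(tags: Iterable[str]) -> Counter:
    """Tally annotations by the name of their top-level error type."""
    return Counter(TOP_LEVEL[parse_tag(t).error_type] for t in tags)


if __name__ == "__main__":
    sample = ["Rv", "M", "Sn", "W", "Sv"]  # invented example annotations
    print(count_by_type(sample))
```

Keeping the coarse type separate from the subcategory lets the same annotations be reported at either level of the hierarchy, which is how a fine-grained tag set can still feed a task that scores only the four coarse types.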

Original language: English
Title of host publication: Proceedings of the 2016 International Conference on Asian Language Processing, IALP 2016
Editors: Minghui Dong, Chung-Hsien Wu, Yanfeng Lu, Haizhou Li, Yuen-Hsien Tseng, Liang-Chih Yu, Lung-Hao Lee
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 254-257
Number of pages: 4
ISBN (Electronic): 9781509009213
DOIs
Publication status: Published - 2017 Mar 10
Event: 20th International Conference on Asian Language Processing, IALP 2016 - Tainan, Taiwan
Duration: 2016 Nov 21 to 2016 Nov 23

Publication series

Name: Proceedings of the 2016 International Conference on Asian Language Processing, IALP 2016

Other

Other: 20th International Conference on Asian Language Processing, IALP 2016
Country/Territory: Taiwan
City: Tainan
Period: 2016/11/21 to 2016/11/23

Keywords

  • computer-assisted language learning
  • error schema
  • error tagging
  • grammatical error diagnosis
  • interlanguage
  • second language acquisition

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Linguistics and Language
  • Artificial Intelligence
