Resources and Evaluations of Automated Chinese Error Diagnosis for Language Learners

Lung Hao Lee, Yuen Hsien Tseng*, Li Ping Chang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Chinese as a foreign language (CFL) learners may, in their language production, produce inappropriate linguistic usages. These include character-level confusions (commonly known as spelling errors) and word-, sentence-, and discourse-level grammatical errors. Chinese spelling errors frequently arise from confusions among multiple-character words that are phonologically and visually similar but semantically distinct. At a coarse-grained level, Chinese grammatical errors appear as surface differences: missing, redundant, incorrectly selected, or incorrectly ordered linguistic components. Fine-grained error types further characterize the morphological and syntactic categories involved, such as verbs, nouns, prepositions, conjunctions, and adverbs. Annotated learner corpora are important language resources for understanding these error patterns and for developing error diagnosis systems. In this chapter, we describe two representative Chinese learner corpora: the HSK Dynamic Composition Corpus constructed by Beijing Language and Culture University and the TOCFL Learner Corpus built by National Taiwan Normal University. In addition, we introduce several evaluations for computer-assisted Chinese learning that are based on both learner corpora. One is a series of SIGHAN bakeoffs for Chinese spelling checkers; the other is a series of NLPTEA workshop shared tasks for Chinese grammatical error identification. The purpose of this chapter is to summarize these resources and evaluations to provide a better understanding of current research developments and challenges in automated Chinese error diagnosis for CFL learners.
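To make the four coarse-grained error types and the idea of evaluating a diagnosis system against annotated data more concrete, the following is a minimal Python sketch. It is illustrative only. The `ErrorSpan` record, its 1-based character-range convention, and the exact-match precision/recall/F1 rule are hypothetical assumptions. They are not the annotation format of the HSK or TOCFL corpora, and they are not the official scoring scripts of the SIGHAN or NLPTEA evaluations.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorType(Enum):
    """Four coarse-grained surface error types (single-letter codes are illustrative)."""
    MISSING = "M"      # a required linguistic component is absent
    REDUNDANT = "R"    # an unnecessary component is present
    SELECTION = "S"    # an incorrect component was chosen
    WORD_ORDER = "W"   # components appear in the wrong order


@dataclass(frozen=True)
class ErrorSpan:
    """Hypothetical annotation: sentence id, 1-based character range, error type."""
    sentence_id: str
    start: int
    end: int
    error_type: ErrorType


def precision_recall_f1(gold: set[ErrorSpan],
                        predicted: set[ErrorSpan]) -> tuple[float, float, float]:
    """Assumed exact-match scoring: a prediction counts only if the sentence id,
    character range, and error type all match a gold annotation."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {ErrorSpan("s1", 3, 3, ErrorType.MISSING),
            ErrorSpan("s1", 7, 8, ErrorType.SELECTION)}
    predicted = {ErrorSpan("s1", 3, 3, ErrorType.MISSING),
                 ErrorSpan("s1", 5, 6, ErrorType.REDUNDANT)}
    print(precision_recall_f1(gold, predicted))  # (0.5, 0.5, 0.5)
```

Exact matching is the strictest choice. A looser sketch could ignore the character range and score only whether each sentence is flagged as erroneous, or only whether the right error types are identified.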

Original language: English
Title of host publication: Chinese Language Learning Sciences
Publisher: Springer Nature
Pages: 235-252
Number of pages: 18
Publication status: Published - 2019

Publication series

Name: Chinese Language Learning Sciences
ISSN (Print): 2520-1719
ISSN (Electronic): 2520-1727

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Education
  • Computer Science Applications
