Maximum F1-Score Training for End-to-End Mispronunciation Detection and Diagnosis of L2 English Speech

Bi Cheng Yan, Hsin Wei Wang, Shao Wei Fan Jiang, Fu An Chao, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

End-to-end (E2E) neural models are attracting increasing attention as a promising approach to mispronunciation detection and diagnosis (MDD). Typically, these models are trained by optimizing a cross-entropy criterion, which corresponds to maximizing the log-likelihood of the training data. However, this training objective does not match how MDD is evaluated: the performance of an MDD model is commonly measured by F1-score rather than phone or word error rate (PER/WER). In view of this, we explore in this paper a discriminative objective function for training E2E MDD models that aims to maximize the expected F1-score directly. A series of experiments conducted on the L2-ARCTIC dataset shows that the proposed method yields considerable performance improvements over some state-of-the-art E2E MDD approaches and the well-known goodness of pronunciation (GOP) method.
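To illustrate the mismatch the abstract describes, the following is a minimal sketch, not the authors' formulation. It assumes each phone has a binary reference label (1 = mispronounced) and a model probability of being mispronounced, and it uses a common "soft" F1 surrogate in which true positives, false positives, and false negatives are replaced by their expected counts. The function names `hard_f1` and `soft_f1` are hypothetical, and the objective used in the paper may differ.

```python
def hard_f1(pred, ref):
    """F1-score for binary decisions (1 = mispronounced, 0 = correct)."""
    tp = sum(1 for p, r in zip(pred, ref) if p == 1 and r == 1)
    fp = sum(1 for p, r in zip(pred, ref) if p == 1 and r == 0)
    fn = sum(1 for p, r in zip(pred, ref) if p == 0 and r == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def soft_f1(probs, ref):
    """Smooth F1 surrogate built from expected counts.

    probs[i] is an assumed model probability that phone i is mispronounced.
    This is a ratio of expectations, which only approximates the expected
    F1-score; it is shown for illustration, not as the paper's objective.
    """
    tp = sum(p * r for p, r in zip(probs, ref))
    fp = sum(p * (1 - r) for p, r in zip(probs, ref))
    fn = sum((1 - p) * r for p, r in zip(probs, ref))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    ref = [1, 0, 1, 0]  # illustrative labels, not data from the paper
    print(hard_f1([1, 1, 0, 0], ref))
    print(soft_f1([0.9, 0.2, 0.6, 0.1], ref))
```

Because `soft_f1` varies smoothly with the probabilities, a quantity of this kind can serve as a training signal, whereas `hard_f1` changes only when a decision flips.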

Original language: English
Title of host publication: ICME 2022 - IEEE International Conference on Multimedia and Expo 2022, Proceedings
Publisher: IEEE Computer Society
ISBN (Electronic): 9781665485630
Publication status: Published - 2022
Event: 2022 IEEE International Conference on Multimedia and Expo, ICME 2022 - Taipei, Taiwan
Duration: 2022 Jul 18 to 2022 Jul 22

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2022-July
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2022 IEEE International Conference on Multimedia and Expo, ICME 2022
Country/Territory: Taiwan
City: Taipei
Period: 2022/07/18 to 2022/07/22

Keywords

  • computer-assisted pronunciation training (CAPT)
  • end-to-end model
  • maximum F1-score training
  • mispronunciation detection and diagnosis (MDD)

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
