Exploring the Integration of E2E ASR and Pronunciation Modeling for English Mispronunciation Detection

Hsin Wei Wang, Bi Cheng Yan, Yung Chang Hsu, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

There has been increasing demand for effective computer-assisted pronunciation training (CAPT) systems, which can provide feedback on mispronunciations and help second-language (L2) learners improve their speaking proficiency through repeated practice. Because non-native speech for training the automatic speech recognition (ASR) module of a CAPT system is scarce, mispronunciation detection performance is often degraded by imperfect ASR. To address this problem, in this paper we put forward a two-stage mispronunciation detection method. In the first stage, the speech uttered by an L2 learner is processed by an end-to-end (E2E) ASR module to produce N-best phone sequence hypotheses. In the second stage, these hypotheses are fed into a pronunciation model that seeks to identify the hypothesis most faithfully reflecting the phone sequence the learner actually pronounced, so as to improve mispronunciation detection performance. Empirical experiments conducted on an English benchmark dataset seem to confirm the utility of our method.
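The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Hypothesis` type, the linear interpolation of the ASR log-probability with a pronunciation-model log-probability (controlled by a hypothetical `weight` parameter), and the toy lookup table standing in for the pronunciation model are all hypothetical. Flagging mispronunciations by edit-distance alignment of the selected phone sequence against the canonical (expected) phone sequence is a common convention in mispronunciation detection and diagnosis, assumed here rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Hypothesis:
    phones: List[str]    # one phone sequence from the E2E ASR N-best list
    asr_logprob: float   # ASR log-probability of that sequence


def rescore_nbest(nbest: List[Hypothesis],
                  pron_logprob: Callable[[List[str]], float],
                  weight: float = 0.5) -> Hypothesis:
    """Stage 2 (hypothetical scheme): pick the hypothesis with the best
    interpolated ASR + pronunciation-model log-probability."""
    def score(h: Hypothesis) -> float:
        return (1 - weight) * h.asr_logprob + weight * pron_logprob(h.phones)
    return max(nbest, key=score)


def align(canonical: List[str],
          recognized: List[str]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Levenshtein alignment; None marks a deletion or an insertion."""
    n, m = len(canonical), len(recognized)
    # dp[i][j] = edit distance between canonical[:i] and recognized[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if canonical[i - 1] == recognized[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match / substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    # Backtrace from the bottom-right cell to recover aligned phone pairs.
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if canonical[i - 1] == recognized[j - 1] else 1
            if dp[i][j] == dp[i - 1][j - 1] + cost:
                pairs.append((canonical[i - 1], recognized[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def detect_mispronunciations(canonical: List[str], recognized: List[str]):
    """Return every aligned pair where the recognized phone differs."""
    return [(ref, hyp) for ref, hyp in align(canonical, recognized) if ref != hyp]


# Toy usage: the learner is expected to say "this" /DH IH S/.
canonical = ["DH", "IH", "S"]
nbest = [Hypothesis(["DH", "IH", "S"], -1.0),  # ASR top-1 matches the canonical form
         Hypothesis(["D", "IH", "S"], -1.2)]
# Hypothetical stand-in for a trained pronunciation model.
toy_table = {("D", "IH", "S"): -0.5, ("DH", "IH", "S"): -2.0}
toy_pron_logprob = lambda phones: toy_table.get(tuple(phones), -10.0)

best = rescore_nbest(nbest, toy_pron_logprob, weight=0.5)
print(best.phones)                                      # ['D', 'IH', 'S']
print(detect_mispronunciations(canonical, best.phones))  # [('DH', 'D')]
```

In this toy case the ASR alone slightly prefers the canonical sequence, while the stand-in pronunciation model favors the learner's likely /D/ substitution, so rescoring flips the choice (interpolated scores of -1.5 versus -0.85) and the DH-to-D substitution is flagged.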

Original language: English
Title of host publication: ROCLING 2021 - Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing
Editors: Lung-Hao Lee, Chia-Hui Chang, Kuan-Yu Chen
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 124-131
Number of pages: 8
ISBN (Electronic): 9789869576949
Publication status: Published - 2021
Event: 33rd Conference on Computational Linguistics and Speech Processing, ROCLING 2021 - Taoyuan, Taiwan
Duration: 2021 Oct 15 to 2021 Oct 16

Publication series

Name: ROCLING 2021 - Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing

Conference

Conference: 33rd Conference on Computational Linguistics and Speech Processing, ROCLING 2021
Country/Territory: Taiwan
City: Taoyuan
Period: 2021/10/15 to 2021/10/16

Keywords

  • End-to-End Speech Recognition
  • Mispronunciation Detection and Diagnosis
  • N-best Rescoring

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Speech and Hearing
