PeppaNet: Effective Mispronunciation Detection and Diagnosis Leveraging Phonetic, Phonological, and Acoustic Cues

Bi Cheng Yan, Hsin Wei Wang, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Citations (Scopus)

Abstract

Mispronunciation detection and diagnosis (MDD) aims to detect erroneous pronunciation segments in an L2 learner's articulation and subsequently provide informative diagnostic feedback. Most existing neural methods follow a dictation-based modeling paradigm that identifies pronunciation errors and returns diagnostic feedback at the same time by aligning the phone sequence recognized from an L2 learner's utterance with the canonical phone sequence of a given text prompt. However, the main downside of these methods is that the dictation process and the alignment process are mostly treated independently of each other. In view of this, we present a novel end-to-end neural method, dubbed PeppaNet, building on a unified structure that jointly models the dictation process and the alignment process. The model learns to directly predict the pronunciation correctness of each canonical phone of the text prompt and, in turn, provides the corresponding diagnostic feedback. In contrast to conventional dictation-based methods that rely mainly on a free-phone recognition process, PeppaNet uses a selective gating mechanism to simultaneously incorporate phonetic, phonological, and acoustic cues, generating corrections that are more appropriate and more phonetically related to the canonical pronunciations. Extensive experiments on the L2-ARCTIC benchmark dataset indicate the merits of the proposed method in comparison with some recent top-of-the-line methods.
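
As an illustration of the conventional dictation-based paradigm that the abstract contrasts PeppaNet against (not of PeppaNet itself), the following minimal sketch aligns a recognized phone sequence with a canonical one using plain Levenshtein alignment and labels each position. The function name `align_phones`, the ARPAbet-style phone symbols, and the hand-written "recognized" sequence are assumptions made for this example; in a real system the recognized sequence would come from a phone recognizer.

```python
from typing import List, Tuple


def align_phones(canonical: List[str], recognized: List[str]) -> List[Tuple[str, str, str]]:
    """Levenshtein-align two phone sequences (hypothetical helper).

    Returns (canonical_phone, recognized_phone, label) triples, where label is
    "correct", "substitution", "deletion", or "insertion". A missing side is "-".
    """
    n, m = len(canonical), len(recognized)
    # cost[i][j] = edit distance between canonical[:i] and recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right corner to recover one optimal alignment.
    out = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1])):
            same = canonical[i - 1] == recognized[j - 1]
            out.append((canonical[i - 1], recognized[j - 1],
                        "correct" if same else "substitution"))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            # Canonical phone has no counterpart in the learner's speech.
            out.append((canonical[i - 1], "-", "deletion"))
            i -= 1
        else:
            # Learner produced an extra phone not in the canonical sequence.
            out.append(("-", recognized[j - 1], "insertion"))
            j -= 1
    out.reverse()
    return out


if __name__ == "__main__":
    # Illustrative input only: canonical "the cat" versus a made-up learner rendering.
    for triple in align_phones(["DH", "AH", "K", "AE", "T"], ["D", "AH", "K", "AE"]):
        print(triple)
```

In this sketch the recognition step and the alignment step are fully decoupled, which is the separation the abstract identifies as the main downside of dictation-based methods.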

Original language: English
Title of host publication: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1045-1051
Number of pages: 7
ISBN (Electronic): 9798350396904
Publication status: Published - 2023
Event: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Doha, Qatar
Duration: 2023 Jan 9 to 2023 Jan 12

Publication series

Name: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

Conference

Conference: 2022 IEEE Spoken Language Technology Workshop, SLT 2022
Country/Territory: Qatar
City: Doha
Period: 2023/01/09 to 2023/01/12

Keywords

  • Computer-assisted pronunciation training (CAPT)
  • L2-ARCTIC
  • dictation model
  • mispronunciation detection and diagnosis (MDD)
  • text prompt

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Media Technology
  • Instrumentation
  • Linguistics and Language
