TY - GEN
T1 - Peppanet
T2 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022
AU - Yan, Bi Cheng
AU - Wang, Hsin Wei
AU - Chen, Berlin
N1 - Funding Information:
This work was supported in part by E.SUN Financial Holding Co., Ltd., Taiwan, under grant numbers 20210801-ntu-01 and 202208-NTU-02. Any findings and implications in the paper do not necessarily reflect those of the sponsors.
Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Mispronunciation detection and diagnosis (MDD) aims to detect erroneous pronunciation segments in an L2 learner's articulation and subsequently provide informative diagnostic feedback. Most existing neural methods follow a dictation-based modeling paradigm that identifies pronunciation errors and returns diagnostic feedback at the same time by aligning the recognized phone sequence uttered by an L2 learner to the corresponding canonical phone sequence of a given text prompt. However, the main downside of these methods is that the dictation process and the alignment process are mostly carried out independently of each other. In view of this, we present a novel end-to-end neural method, dubbed PeppaNet, building on a unified structure that can jointly model the dictation process and the alignment process. The model learns to directly predict the pronunciation correctness of each canonical phone of the text prompt and in turn provides the corresponding diagnostic feedback. In contrast to conventional dictation-based methods that rely mainly on a free-phone recognition process, PeppaNet uses a selective gating mechanism to simultaneously incorporate phonetic, phonological, and acoustic cues to generate corrections that are more appropriate and more phonetically related to the canonical pronunciations. Extensive sets of experiments conducted on the L2-ARCTIC benchmark dataset show the merits of our proposed method in comparison to some recent top-of-the-line methods.
KW - Computer-assisted pronunciation training (CAPT)
KW - dictation model
KW - L2-ARCTIC
KW - mispronunciation detection and diagnosis (MDD)
KW - text prompt
UR - http://www.scopus.com/inward/record.url?scp=85147800134&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147800134&partnerID=8YFLogxK
U2 - 10.1109/SLT54892.2023.10022472
DO - 10.1109/SLT54892.2023.10022472
M3 - Conference contribution
AN - SCOPUS:85147800134
T3 - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
SP - 1045
EP - 1051
BT - 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 January 2023 through 12 January 2023
ER -