TY - JOUR
T1 - Effective Graph-Based Modeling of Articulation Traits for Mispronunciation Detection and Diagnosis
AU - Yan, Bi-Cheng
AU - Wang, Hsin-Wei
AU - Wang, Yi-Cheng
AU - Chen, Berlin
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Mispronunciation detection and diagnosis (MDD) aims to pinpoint erroneous pronunciation segments at the phone level and to provide instant, informative diagnostic feedback to L2 (second-language) learners. Among the various modeling paradigms for MDD, dictation-based neural methods have recently become the de facto standard; they identify pronunciation errors and return diagnostic feedback at the same time by aligning the phone sequence recognized from an L2 learner's utterance with the canonical phone sequence of the given text prompt. Despite their decent efficacy, dictation-based methods have at least two downsides. First, the dictation and alignment processes are carried out independently of each other, often resulting in poor diagnostic feedback. Second, prior knowledge about the articulation traits of the canonical phones in the text prompt is not fully utilized in MDD. To address these shortcomings, we propose a novel end-to-end MDD method that streamlines the dictation and alignment processes in a non-autoregressive manner. In addition, knowledge about phone-level articulation traits is extracted with a graph convolutional network (GCN) to obtain more discriminative phonetic embeddings, so as to improve MDD performance. Extensive experiments conducted on the L2-ARCTIC benchmark dataset suggest the feasibility and effectiveness of our approach in relation to competitive baselines.
AB - Mispronunciation detection and diagnosis (MDD) aims to pinpoint erroneous pronunciation segments at the phone level and to provide instant, informative diagnostic feedback to L2 (second-language) learners. Among the various modeling paradigms for MDD, dictation-based neural methods have recently become the de facto standard; they identify pronunciation errors and return diagnostic feedback at the same time by aligning the phone sequence recognized from an L2 learner's utterance with the canonical phone sequence of the given text prompt. Despite their decent efficacy, dictation-based methods have at least two downsides. First, the dictation and alignment processes are carried out independently of each other, often resulting in poor diagnostic feedback. Second, prior knowledge about the articulation traits of the canonical phones in the text prompt is not fully utilized in MDD. To address these shortcomings, we propose a novel end-to-end MDD method that streamlines the dictation and alignment processes in a non-autoregressive manner. In addition, knowledge about phone-level articulation traits is extracted with a graph convolutional network (GCN) to obtain more discriminative phonetic embeddings, so as to improve MDD performance. Extensive experiments conducted on the L2-ARCTIC benchmark dataset suggest the feasibility and effectiveness of our approach in relation to competitive baselines.
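N1 - Illustration (editor's note): the abstract describes extracting phone-level articulation traits with a graph convolutional network (GCN) to obtain trait-aware phone embeddings. The sketch below is a minimal, hypothetical PyTorch example of one way such a GCN over a phone/articulation-trait graph could be set up; the node inventory, dimensions, the class name TraitGCN, and the adjacency construction are illustrative assumptions, not the authors' actual model or code.

```python
# Hypothetical sketch (not the paper's implementation): a single GCN layer over a
# graph whose nodes are phones and articulation traits, producing trait-aware
# phone embeddings. Sizes and the trait inventory are assumed for illustration.
import torch
import torch.nn as nn

NUM_PHONES = 40   # assumed size of the canonical phone set
NUM_TRAITS = 12   # assumed number of articulation traits (e.g., plosive, nasal, ...)
EMB_DIM = 64

# Adjacency over all nodes (phones followed by traits); set A[i, NUM_PHONES + j] = 1
# when phone i carries trait j, e.g., from an articulation/pronunciation lexicon.
num_nodes = NUM_PHONES + NUM_TRAITS
A = torch.zeros(num_nodes, num_nodes)
# ... fill in phone-trait links here ...
A = A + A.T + torch.eye(num_nodes)          # symmetrize and add self-loops

# Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in Kipf & Welling's GCN.
deg = A.sum(dim=1)
D_inv_sqrt = torch.diag(deg.clamp(min=1.0).pow(-0.5))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt

class TraitGCN(nn.Module):
    """One graph-convolution layer propagating articulation-trait information into phone embeddings."""
    def __init__(self, num_nodes: int, emb_dim: int):
        super().__init__()
        self.node_emb = nn.Embedding(num_nodes, emb_dim)  # learnable node features
        self.linear = nn.Linear(emb_dim, emb_dim)

    def forward(self, a_hat: torch.Tensor) -> torch.Tensor:
        h = self.node_emb.weight                # (num_nodes, emb_dim)
        h = torch.relu(self.linear(a_hat @ h))  # graph convolution: A_hat * H * W
        return h[:NUM_PHONES]                   # keep only the phone nodes

# The resulting (NUM_PHONES, EMB_DIM) matrix could serve as the discriminative
# phonetic embeddings fed to the downstream MDD model described in the abstract.
phone_embeddings = TraitGCN(num_nodes, EMB_DIM)(A_hat)
```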
KW - L2-ARCTIC
KW - articulatory manner
KW - computer-assisted pronunciation training (CAPT)
KW - graph convolutional network (GCN)
KW - mispronunciation detection and diagnosis (MDD)
UR - http://www.scopus.com/inward/record.url?scp=85174884209&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174884209&partnerID=8YFLogxK
U2 - 10.1109/ICASSP49357.2023.10097226
DO - 10.1109/ICASSP49357.2023.10097226
M3 - Conference article
AN - SCOPUS:85174884209
SN - 1520-6149
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Y2 - 4 June 2023 through 10 June 2023
ER -