TY - GEN
T1 - JAM
T2 - 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
AU - He, Yue Yang
AU - Yan, Bi Cheng
AU - Lo, Tien Hong
AU - Lin, Meng Shin
AU - Hsu, Yung Chang
AU - Chen, Berlin
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Computer-assisted pronunciation training (CAPT) systems are designed for second-language (L2) learners to practice their pronunciation skills by offering objective, personalized feedback in a stress-free, self-directed learning scenario. Both mispronunciation detection and diagnosis (MDD) and automatic pronunciation assessment (APA) are indispensable components of CAPT systems. The former pinpoints phonetic pronunciation errors and provides the corresponding diagnostic feedback, while the latter evaluates oral skills across various linguistic levels and aspects. Most existing efforts treat APA and MDD as independent tasks, so the correlations between assessment scores and phonetic pronunciation errors are largely overlooked. In light of this, we introduce JAM (a Joint neural model for APA and MDD), a novel end-to-end neural model for CAPT that streamlines the components of APA and MDD into a unified structure with a parallel pronunciation modeling architecture. To capture fine-grained pronunciation cues from L2 learners' speech, electromagnetic articulography (EMA) features are introduced to the proposed model; these features portray the movement of articulatory structures such as the jaw, lips, and tongue. A series of experiments conducted on the speechocean762 benchmark dataset demonstrates the feasibility and effectiveness of our approach compared to several competitive baselines. Additionally, an ablation study is presented to assess the contributions of different input features and training strategies in the proposed model.
AB - Computer-assisted pronunciation training (CAPT) systems are designed for second-language (L2) learners to practice their pronunciation skills by offering objective, personalized feedback in a stress-free, self-directed learning scenario. Both mispronunciation detection and diagnosis (MDD) and automatic pronunciation assessment (APA) are indispensable components of CAPT systems. The former pinpoints phonetic pronunciation errors and provides the corresponding diagnostic feedback, while the latter evaluates oral skills across various linguistic levels and aspects. Most existing efforts treat APA and MDD as independent tasks, so the correlations between assessment scores and phonetic pronunciation errors are largely overlooked. In light of this, we introduce JAM (a Joint neural model for APA and MDD), a novel end-to-end neural model for CAPT that streamlines the components of APA and MDD into a unified structure with a parallel pronunciation modeling architecture. To capture fine-grained pronunciation cues from L2 learners' speech, electromagnetic articulography (EMA) features are introduced to the proposed model; these features portray the movement of articulatory structures such as the jaw, lips, and tongue. A series of experiments conducted on the speechocean762 benchmark dataset demonstrates the feasibility and effectiveness of our approach compared to several competitive baselines. Additionally, an ablation study is presented to assess the contributions of different input features and training strategies in the proposed model.
UR - http://www.scopus.com/inward/record.url?scp=85218201225&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85218201225&partnerID=8YFLogxK
U2 - 10.1109/APSIPAASC63619.2025.10849265
DO - 10.1109/APSIPAASC63619.2025.10849265
M3 - Conference contribution
AN - SCOPUS:85218201225
T3 - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
BT - APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 3 December 2024 through 6 December 2024
ER -