TY - GEN
T1 - Maximum F1-Score Training for End-to-End Mispronunciation Detection and Diagnosis of L2 English Speech
AU - Yan, Bi Cheng
AU - Wang, Hsin Wei
AU - Jiang, Shao Wei Fan
AU - Chao, Fu An
AU - Chen, Berlin
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - End-to-end (E2E) neural models are attracting increasing attention as a promising approach to mispronunciation detection and diagnosis (MDD). These models are typically trained by optimizing a cross-entropy criterion, which corresponds to maximizing the log-likelihood of the training data. However, this training objective is mismatched with MDD evaluation, since the performance of an MDD model is commonly measured by F1-score rather than phone or word error rate (PER/WER). In view of this, this paper explores a discriminative objective function for training E2E MDD models that aims to maximize the expected F1-score directly. A series of experiments conducted on the L2-ARCTIC dataset shows that the proposed method can yield considerable performance improvements over some state-of-the-art E2E MDD approaches and the celebrated GOP method.
AB - End-to-end (E2E) neural models are attracting increasing attention as a promising approach to mispronunciation detection and diagnosis (MDD). These models are typically trained by optimizing a cross-entropy criterion, which corresponds to maximizing the log-likelihood of the training data. However, this training objective is mismatched with MDD evaluation, since the performance of an MDD model is commonly measured by F1-score rather than phone or word error rate (PER/WER). In view of this, this paper explores a discriminative objective function for training E2E MDD models that aims to maximize the expected F1-score directly. A series of experiments conducted on the L2-ARCTIC dataset shows that the proposed method can yield considerable performance improvements over some state-of-the-art E2E MDD approaches and the celebrated GOP method.
KW - computer-assisted pronunciation training (CAPT)
KW - end-to-end model
KW - maximum F1-score training
KW - mispronunciation detection and diagnosis (MDD)
UR - http://www.scopus.com/inward/record.url?scp=85137703020&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137703020&partnerID=8YFLogxK
U2 - 10.1109/ICME52920.2022.9858931
DO - 10.1109/ICME52920.2022.9858931
M3 - Conference contribution
AN - SCOPUS:85137703020
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - ICME 2022 - IEEE International Conference on Multimedia and Expo 2022, Proceedings
PB - IEEE Computer Society
T2 - 2022 IEEE International Conference on Multimedia and Expo, ICME 2022
Y2 - 18 July 2022 through 22 July 2022
ER -