TY - GEN
T1 - Exploring the Application of Text Prompts in End-to-End Pronunciation Training Systems
AU - Cheng, Yu Sen
AU - Lo, Tien Hong
AU - Chen, Berlin
N1 - Publisher Copyright:
© ROCLING 2020. All rights reserved.
PY - 2020
Y1 - 2020
AB - Recently, there has been a growing demand for computer-assisted pronunciation training (CAPT) systems, which can be exploited to automatically assess the pronunciation quality of L2 learners. However, current CAPT systems built on end-to-end (E2E) neural network architectures still fall short of expectations in detecting mispronunciations. This is partly because most of their model components are designed and optimized for automatic speech recognition (ASR) rather than tailored specifically for CAPT. Unlike ASR, which aims to recognize a given speaker's utterance (even when poorly pronounced) as correctly as possible, CAPT seeks to detect pronunciation errors as subtly as possible. In view of this, we develop an E2E neural CAPT method that employs two disparate encoders to generate embeddings of an L2 speaker's test utterance and of the canonical pronunciations in the corresponding text prompt, respectively. The outputs of the two encoders are fed into a decoder through a hierarchical attention mechanism (HAM), enabling the decoder to focus more on detecting mispronunciations. A series of experiments conducted on an L2 Mandarin Chinese speech corpus demonstrates the effectiveness of our method across different evaluation metrics, compared with several state-of-the-art E2E neural CAPT methods.
KW - Computer assisted pronunciation training
KW - end-to-end speech recognition
KW - hierarchical attention mechanism
KW - mispronunciation detection
KW - mispronunciation diagnosis
UR - http://www.scopus.com/inward/record.url?scp=85181115524&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85181115524&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85181115524
T3 - ROCLING 2020 - 32nd Conference on Computational Linguistics and Speech Processing
SP - 290
EP - 303
BT - ROCLING 2020 - 32nd Conference on Computational Linguistics and Speech Processing
A2 - Wang, Jenq-Haur
A2 - Lai, Ying-Hui
A2 - Lee, Lung-Hao
A2 - Chen, Kuan-Yu
A2 - Lee, Hung-Yi
A2 - Lee, Chi-Chun
A2 - Wang, Syu-Siang
A2 - Huang, Hen-Hsen
A2 - Liu, Chuan-Ming
PB - The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
T2 - 32nd Conference on Computational Linguistics and Speech Processing, ROCLING 2020
Y2 - 24 September 2020 through 26 September 2020
ER -