TY - GEN
T1 - An Artifact-Aware Weighted Loss Function for Speech Enhancement
AU - Yu, En Lun
AU - Ho, Kuan Hsun
AU - Chen, Berlin
N1 - Publisher Copyright:
© 2023 ROCLING 2023 - Proceedings of the 35th Conference on Computational Linguistics and Speech Processing. All rights reserved.
PY - 2023
Y1 - 2023
N2 - A speech enhancement (SE) system not only improves the perceptual quality of speech but also makes automatic speech recognition (ASR) robust in noisy environments when integrated with an ASR system. However, single-channel SE may generate artifacts that are detrimental to ASR, leading to recognition errors. Recent research indicates that introducing the novel SE loss function NAaLoss and fine-tuning the model can effectively reduce the generation of artifacts. Nonetheless, the underlying assumptions of this approach still need revision. Therefore, in this study we analyze the method extensively, conducting numerous experiments and case studies to identify its inconsistencies. To address them, we propose an improved loss function, AaWLoss. Through modifications and optimizations, AaWLoss resolves, under the same settings, the potential loss of artifact suppression under noisy conditions that is inherent in NAaLoss. Furthermore, AaWLoss achieves peak performance in suppressing artifacts under clean conditions, even adding information beneficial for ASR to the enhanced clean speech.
AB - A speech enhancement (SE) system not only improves the perceptual quality of speech but also makes automatic speech recognition (ASR) robust in noisy environments when integrated with an ASR system. However, single-channel SE may generate artifacts that are detrimental to ASR, leading to recognition errors. Recent research indicates that introducing the novel SE loss function NAaLoss and fine-tuning the model can effectively reduce the generation of artifacts. Nonetheless, the underlying assumptions of this approach still need revision. Therefore, in this study we analyze the method extensively, conducting numerous experiments and case studies to identify its inconsistencies. To address them, we propose an improved loss function, AaWLoss. Through modifications and optimizations, AaWLoss resolves, under the same settings, the potential loss of artifact suppression under noisy conditions that is inherent in NAaLoss. Furthermore, AaWLoss achieves peak performance in suppressing artifacts under clean conditions, even adding information beneficial for ASR to the enhanced clean speech.
KW - noise-robust speech recognition
KW - processing artifacts
KW - single-channel speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85184840278&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184840278&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85184840278
T3 - ROCLING 2023 - Proceedings of the 35th Conference on Computational Linguistics and Speech Processing
SP - 71
EP - 78
BT - ROCLING 2023 - Proceedings of the 35th Conference on Computational Linguistics and Speech Processing
A2 - Wu, Jheng-Long
A2 - Su, Ming-Hsiang
A2 - Huang, Hen-Hsen
A2 - Tsao, Yu
A2 - Tseng, Hou-Chiang
A2 - Chang, Chia-Hui
A2 - Lee, Lung-Hao
A2 - Liao, Yuan-Fu
A2 - Ma, Wei-Yun
PB - The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
T2 - 35th Conference on Computational Linguistics and Speech Processing, ROCLING 2023
Y2 - 20 October 2023 through 21 October 2023
ER -