TY - GEN
T1 - NAaLOSS
T2 - 33rd IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2023
AU - Ho, Kuan Hsun
AU - Yu, En Lun
AU - Hung, Jeih Weih
AU - Chen, Berlin
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Reducing noise interference is crucial for automatic speech recognition (ASR) in real-world scenarios. However, most single-channel speech enhancement (SE) methods generate 'processing artifacts' that negatively affect ASR performance. Hence, in this study, we propose a Noise- and Artifacts-aware loss function, NAaLoss, to mitigate the influence of artifacts from a novel perspective. NAaLoss combines three loss terms (estimation, de-artifact, and noise ignorance), enabling the learned SE model to model speech, artifacts, and noise individually. We examine two SE models (simple/advanced) trained with NAaLoss under various input scenarios (clean/noisy) using two configurations of the ASR system (with/without noise robustness). Experiments reveal that NAaLoss significantly improves ASR performance in most setups while preserving the perceptual quality and intelligibility of the enhanced speech. Furthermore, we visualize artifacts through waveforms and spectrograms and explain their impact on ASR.
KW - noise-robust speech enhancement
KW - processing artifacts
KW - single-channel speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85177194419&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85177194419&partnerID=8YFLogxK
U2 - 10.1109/MLSP55844.2023.10285948
DO - 10.1109/MLSP55844.2023.10285948
M3 - Conference contribution
AN - SCOPUS:85177194419
T3 - IEEE International Workshop on Machine Learning for Signal Processing, MLSP
BT - Proceedings of the 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing, MLSP 2023
A2 - Comminiello, Danilo
A2 - Scarpiniti, Michele
PB - IEEE Computer Society
Y2 - 17 September 2023 through 20 September 2023
ER -