TY - GEN
T1 - TENET
T2 - 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021
AU - Chao, Fu An
AU - Jiang, Shao Wei Fan
AU - Yan, Bi Cheng
AU - Hung, Jeih Weih
AU - Chen, Berlin
N1 - Funding Information:
This research is supported in part by the Ministry of Science and Technology, Taiwan under Grant Number MOST 110-2634-F-008-004- through Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan, and Grant Numbers MOST 108-2221-E-003-005-MY3 and MOST 109-2221-E-003-020-MY3. Any findings and implications in the paper do not necessarily reflect those of the sponsors.
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Due to the unprecedented breakthroughs brought about by deep learning, speech enhancement (SE) techniques have been developed rapidly and play an important role prior to acoustic modeling so as to mitigate noise effects on speech. To increase the perceptual quality of speech, the current state-of-the-art in the realm of SE adopts adversarial training by connecting an objective metric to the discriminator. However, there is no guarantee that optimizing the perceptual quality of speech will necessarily lead to improved automatic speech recognition (ASR) performance. In this study, we present TENET, a novel Time-reversal Enhancement NETwork, which leverages the transformation of an input noisy signal itself, i.e., the time-reversed version, in conjunction with a Siamese network and a complex dual-path Transformer to promote SE performance for noise-robust ASR. Extensive experiments conducted on the Voicebank-DEMAND dataset show that TENET can achieve stellar results compared to a few top-of-the-line methods in terms of both SE and ASR evaluation metrics. To demonstrate the model generalization ability, we further evaluate TENET on the test set of scenarios contaminated with unseen noise, and the results also confirm the superiority of this promising method. (Footnotes: The name is inspired by the movie TENET, Christopher Nolan, 2020. Some of the enhanced audio samples can be found at https://fuann.github.io/TENET)
KW - Automatic Speech Recognition
KW - Deep Learning
KW - Siamese Network
KW - Speech Enhancement
KW - Time Reversal
UR - http://www.scopus.com/inward/record.url?scp=85118751179&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118751179&partnerID=8YFLogxK
U2 - 10.1109/ASRU51503.2021.9687924
DO - 10.1109/ASRU51503.2021.9687924
M3 - Conference contribution
AN - SCOPUS:85118751179
T3 - 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
SP - 55
EP - 61
BT - 2021 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 December 2021 through 17 December 2021
ER -