TY - GEN
T1 - CROSS-DOMAIN SINGLE-CHANNEL SPEECH ENHANCEMENT MODEL WITH BI-PROJECTION FUSION MODULE FOR NOISE-ROBUST ASR
AU - Chao, Fu An
AU - Hung, Jeih Weih
AU - Chen, Berlin
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - In recent decades, many studies have suggested that phase information is crucial for speech enhancement (SE), and time-domain single-channel speech enhancement techniques have shown promise in noise suppression and robust automatic speech recognition (ASR). This paper presents a continuation of the above lines of research and explores two effective SE methods that consider phase information in time domain and frequency domain of speech signals, respectively. Going one step further, we put forward a novel cross-domain speech enhancement model and a bi-projection fusion (BPF) mechanism for noise-robust ASR. To evaluate the effectiveness of our proposed method, we conduct an extensive set of experiments on the publicly-available Aishell-1 Mandarin benchmark speech corpus. The evaluation results confirm the superiority of our proposed method in relation to a few current top-of-the-line time-domain and frequency-domain SE methods in both enhancement and ASR evaluation metrics for the test set of scenarios contaminated with seen and unseen noise, respectively.
AB - In recent decades, many studies have suggested that phase information is crucial for speech enhancement (SE), and time-domain single-channel speech enhancement techniques have shown promise in noise suppression and robust automatic speech recognition (ASR). This paper presents a continuation of the above lines of research and explores two effective SE methods that consider phase information in time domain and frequency domain of speech signals, respectively. Going one step further, we put forward a novel cross-domain speech enhancement model and a bi-projection fusion (BPF) mechanism for noise-robust ASR. To evaluate the effectiveness of our proposed method, we conduct an extensive set of experiments on the publicly-available Aishell-1 Mandarin benchmark speech corpus. The evaluation results confirm the superiority of our proposed method in relation to a few current top-of-the-line time-domain and frequency-domain SE methods in both enhancement and ASR evaluation metrics for the test set of scenarios contaminated with seen and unseen noise, respectively.
KW - Automatic Speech Recognition
KW - Single-Channel Speech Enhancement
KW - Speech Enhancement
KW - Time and Frequency Domains
UR - http://www.scopus.com/inward/record.url?scp=85119122666&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119122666&partnerID=8YFLogxK
U2 - 10.1109/ICME51207.2021.9428157
DO - 10.1109/ICME51207.2021.9428157
M3 - Conference contribution
AN - SCOPUS:85119122666
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
PB - IEEE Computer Society
T2 - 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
Y2 - 5 July 2021 through 9 July 2021
ER -