TY - GEN
T1 - Multi-View Attention-Based Speech Enhancement Model for Robust Automatic Speech Recognition
AU - Chao, Fu An
AU - Hung, Jeih Weih
AU - Chen, Berlin
N1 - Publisher Copyright:
© ROCLING 2020. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Recently, many studies have found that phase information is crucial in Speech Enhancement (SE), and time-domain single-channel speech enhancement techniques have proven effective for noise suppression and robust Automatic Speech Recognition (ASR). Inspired by this, this research investigates two recently proposed SE methods that consider phase information in the time domain and the frequency domain of speech signals, respectively. Going one step further, we propose a novel multi-view attention-based speech enhancement model, which harnesses the synergistic power of the aforementioned time-domain and frequency-domain SE methods and applies equally well to robust ASR. To evaluate the effectiveness of the proposed method, we create synthetic test data from various noise datasets and conduct extensive experiments on the Aishell-1 Mandarin speech corpus. The evaluation results show that the proposed method outperforms current state-of-the-art time-domain and frequency-domain SE methods. Specifically, compared with the time-domain method, our method achieves relative character error rate (CER) reductions of 3.4%, 2.5% and 1.6% at three signal-to-noise ratios (SNRs), -5 dB, 5 dB and 15 dB, respectively, on the test set with known noise scenarios, while the corresponding CER reductions on the test set with unknown noise scenarios are 3.8%, 4.8% and 2.2%, respectively.
KW - Acoustic Models
KW - Automatic Speech Recognition
KW - Deep Learning
KW - Re-training
KW - Single-Channel Speech Enhancement
KW - Speech Enhancement
UR - http://www.scopus.com/inward/record.url?scp=85181106449&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85181106449&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85181106449
T3 - ROCLING 2020 - 32nd Conference on Computational Linguistics and Speech Processing
SP - 120
EP - 135
BT - ROCLING 2020 - 32nd Conference on Computational Linguistics and Speech Processing
A2 - Wang, Jenq-Haur
A2 - Lai, Ying-Hui
A2 - Lee, Lung-Hao
A2 - Chen, Kuan-Yu
A2 - Lee, Hung-Yi
A2 - Lee, Chi-Chun
A2 - Wang, Syu-Siang
A2 - Huang, Hen-Hsen
A2 - Liu, Chuan-Ming
PB - The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
T2 - 32nd Conference on Computational Linguistics and Speech Processing, ROCLING 2020
Y2 - 24 September 2020 through 26 September 2020
ER -