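Multi-view Attention-based Speech Enhancement Model for Noise-robust Automatic Speech Recognition (基於多視角注意力機制語音增強模型於強健性自動語音辨識)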

Fu An Chao, Jeih Weih Hung, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recent studies have found that phase information is crucial in speech enhancement (SE), and time-domain single-channel SE techniques have proven effective for noise suppression and robust automatic speech recognition (ASR). Inspired by this, this research investigates two recently proposed SE methods that take phase information into account in the time domain and the frequency domain of speech signals, respectively. Going one step further, we propose a novel multi-view attention-based SE model that harnesses the synergy of these time-domain and frequency-domain SE methods and can be applied equally well to robust ASR. To evaluate the proposed method, we use various noise datasets to create synthetic test data and conduct extensive experiments on the Aishell-1 Mandarin speech corpus. The evaluation results show that the proposed method is superior to several current state-of-the-art time-domain and frequency-domain SE methods. Specifically, compared with the time-domain method, our method achieves relative character error rate (CER) reductions of 3.4%, 2.5% and 1.6% at three signal-to-noise ratios (SNRs) of -5 dB, 5 dB and 15 dB, respectively, on the test set of known noise scenarios, while the corresponding CER reductions on the test set of unknown noise scenarios are 3.8%, 4.8% and 2.2%.
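Two quantities mentioned in the abstract can be illustrated with a short sketch: mixing noise into clean speech at a target SNR, and computing a relative CER reduction. This is a minimal illustration under stated assumptions, not the authors' data pipeline; the function names `mix_at_snr` and `relative_cer_reduction` are hypothetical, and SNR is assumed to be the ratio of average signal power to average noise power over the whole utterance.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` so that the resulting SNR equals `snr_db` (in dB).

    Hypothetical helper: assumes 1-D float arrays and utterance-level power.
    """
    # Repeat or truncate the noise so it has the same length as the speech.
    noise = np.resize(noise, clean.shape)
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(clean_power / scaled_noise_power)
    target_noise_power = clean_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return clean + scale * noise


def relative_cer_reduction(baseline_cer, new_cer):
    """Relative CER reduction in percent, measured against the baseline CER."""
    return 100.0 * (baseline_cer - new_cer) / baseline_cer


# Usage with synthetic signals (random arrays stand in for real audio).
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noise = rng.standard_normal(4000)
noisy = {snr: mix_at_snr(clean, noise, snr) for snr in (-5, 5, 15)}
```

A relative reduction is computed against the baseline system's CER, so it is larger than the absolute difference in percentage points whenever the baseline CER is below 100%.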

Translated title of the contribution: Multi-view Attention-based Speech Enhancement Model for Noise-robust Automatic Speech Recognition
Original language: Chinese (Traditional)
Title of host publication: ROCLING 2020 - 32nd Conference on Computational Linguistics and Speech Processing
Editors: Jenq-Haur Wang, Ying-Hui Lai, Lung-Hao Lee, Kuan-Yu Chen, Hung-Yi Lee, Chi-Chun Lee, Syu-Siang Wang, Hen-Hsen Huang, Chuan-Ming Liu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 120-135
Number of pages: 16
ISBN (Electronic): 9789869576932
Publication status: Published - 2020
Event: 32nd Conference on Computational Linguistics and Speech Processing, ROCLING 2020 - Taipei, Taiwan
Duration: 24 Sep 2020 → 26 Sep 2020

Publication series

Name: ROCLING 2020 - 32nd Conference on Computational Linguistics and Speech Processing

Conference

Conference: 32nd Conference on Computational Linguistics and Speech Processing, ROCLING 2020
Country/Territory: Taiwan
City: Taipei
Period: 2020/09/24 → 2020/09/26

Keywords

  • Acoustic Models
  • Automatic Speech Recognition
  • Deep Learning
  • Re-training
  • Single-Channel Speech Enhancement
  • Speech Enhancement

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
