基於多視角注意力機制語音增強模型於強健性自動語音辨識

Translated title of the contribution: Multi-view Attention-based Speech Enhancement Model for Noise-robust Automatic Speech Recognition

Fu-An Chao, Jeih-Weih Hung, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recently, many studies have found that phase information is crucial for Speech Enhancement (SE), and time-domain single-channel SE techniques have proven effective for noise suppression and noise-robust Automatic Speech Recognition (ASR). Motivated by this, this study investigates two recently proposed SE methods that exploit the phase information of speech signals in the time domain and the frequency domain, respectively. Going one step further, we propose a novel multi-view attention-based speech enhancement model, which harnesses the synergistic power of the aforementioned time-domain and frequency-domain SE methods and applies equally well to robust ASR. To evaluate the effectiveness of the proposed method, we use various noise datasets to create synthetic test data and conduct extensive experiments on the Aishell-1 Mandarin speech corpus. The evaluation results show that the proposed method outperforms several state-of-the-art time-domain and frequency-domain SE methods. Specifically, compared with the time-domain method, our method achieves relative character error rate (CER) reductions of 3.4%, 2.5%, and 1.6% at three signal-to-noise ratios (SNRs) of -5 dB, 5 dB, and 15 dB, respectively, on the test set with known noise scenarios, while the corresponding CER reductions on the test set with unknown noise scenarios are 3.8%, 4.8%, and 2.2%, respectively.
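To make the abstract's central idea concrete, below is a minimal sketch of how an attention module might fuse a time-domain and a frequency-domain enhancement "view", which is the general mechanism the abstract describes. This is not the authors' code: the class name, feature shapes, and the per-view scalar attention scoring are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch (not the authors' implementation) of multi-view
# attention fusion over two SE front ends, as described in the abstract.
import torch
import torch.nn as nn

class MultiViewAttentionFusion(nn.Module):
    """Attention-weighted fusion of time-domain and frequency-domain views."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Produces one attention score per view, per frame (assumed design).
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, time_feat: torch.Tensor, freq_feat: torch.Tensor) -> torch.Tensor:
        # time_feat, freq_feat: (batch, frames, feat_dim), assumed pre-aligned.
        views = torch.stack([time_feat, freq_feat], dim=2)   # (B, T, 2, D)
        scores = self.score(views)                           # (B, T, 2, 1)
        weights = torch.softmax(scores, dim=2)               # normalize over the two views
        return (weights * views).sum(dim=2)                  # (B, T, D)

# Usage with dummy features standing in for the two SE methods' outputs.
fusion = MultiViewAttentionFusion(feat_dim=256)
t_view = torch.randn(4, 100, 256)   # e.g. from a time-domain SE model
f_view = torch.randn(4, 100, 256)   # e.g. from an STFT-based SE model
enhanced = fusion(t_view, f_view)   # (4, 100, 256), fed onward to the ASR back end
```

Any scheme along these lines lets the model weight whichever view is more reliable per frame, which is one plausible reading of how the two methods' strengths are combined.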

Translated title of the contribution: Multi-view Attention-based Speech Enhancement Model for Noise-robust Automatic Speech Recognition
Original language: Chinese (Traditional)
Title of host publication: ROCLING 2020 - 32nd Conference on Computational Linguistics and Speech Processing
Editors: Jenq-Haur Wang, Ying-Hui Lai, Lung-Hao Lee, Kuan-Yu Chen, Hung-Yi Lee, Chi-Chun Lee, Syu-Siang Wang, Hen-Hsen Huang, Chuan-Ming Liu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 120-135
Number of pages: 16
ISBN (Electronic): 9789869576932
Publication status: Published - 2020
Event: 32nd Conference on Computational Linguistics and Speech Processing, ROCLING 2020 - Taipei, Taiwan
Duration: 2020 Sept 24 - 2020 Sept 26

Publication series

Name: ROCLING 2020 - 32nd Conference on Computational Linguistics and Speech Processing

Conference

Conference: 32nd Conference on Computational Linguistics and Speech Processing, ROCLING 2020
Country/Territory: Taiwan
City: Taipei
Period: 2020/09/24 - 2020/09/26

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
