A Time-reversal Enhancement Network with Cross-domain Information for Noise-robust Speech Recognition

Fu An Chao, Jeih weih Hung, Tommy Sheu, Berlin Chen

Research output: Contribution to journal › Article › peer-review

Abstract

Owing to the enormous progress in deep learning, speech enhancement (SE) techniques have shown promising efficacy and play a pivotal role as a front end to automatic speech recognition (ASR) systems, where they mitigate the effects of noise. Among the wide variety of deep neural network (DNN) approaches, some studies have suggested that the phase information of speech signals is key to successful SE. In this study, we continue our previous work and put forward a novel Cross-domain Time-reversal Enhancement NETwork (CD-TENET). CD-TENET leverages the time-reversed version of a speech signal together with two features that capture the signal's phase information in the time domain and the frequency domain, respectively, to improve SE performance for noise-robust ASR. Extensive experiments on the Voicebank-DEMAND benchmark dataset show that CD-TENET not only recovers the original speech effectively but also improves SE and ASR performance simultaneously. More surprisingly, CD-TENET offers a marked relative word error rate (WER) reduction of 43% on the test set of scenarios contaminated with unseen noises, compared to a strong baseline trained under the multi-condition setting.
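
As a rough illustration of two ideas the abstract mentions, the sketch below time-reverses a toy waveform and computes a relative WER reduction. It is not the CD-TENET model: the function names `time_reverse` and `relative_reduction`, the synthetic signal, and the WER figures are all assumptions made for the example, and none of them come from the paper. The sketch also shows a general signal-processing fact that may help explain why phase matters here: reversing a real signal in time leaves its magnitude spectrum unchanged and alters only its phase.

```python
import numpy as np


def time_reverse(waveform: np.ndarray) -> np.ndarray:
    """Return a copy of the waveform with its samples in reverse time order."""
    return waveform[::-1].copy()


def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction of a system with respect to a baseline."""
    return (baseline_wer - system_wer) / baseline_wer


# Toy "noisy speech": a 440 Hz tone plus white noise (an assumption, not real data).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
rng = np.random.default_rng(0)
noisy = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sample_rate)

reversed_noisy = time_reverse(noisy)

# For a real signal, time reversal keeps the magnitude spectrum and changes the phase.
spectrum = np.fft.rfft(noisy)
reversed_spectrum = np.fft.rfft(reversed_noisy)
same_magnitude = np.allclose(np.abs(spectrum), np.abs(reversed_spectrum))
same_phase = np.allclose(np.angle(spectrum), np.angle(reversed_spectrum))

# Hypothetical WER values, chosen only to illustrate the formula.
example_reduction = relative_reduction(baseline_wer=10.0, system_wer=5.7)

print(f"magnitude preserved: {same_magnitude}, phase preserved: {same_phase}")
print(f"relative WER reduction: {example_reduction:.0%}")
```

Reversing the reversed signal restores the original exactly, so an enhancement model can in principle be given both views of the same utterance without losing information.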

Original language: English
Journal: IEEE Multimedia
Publication status: Accepted/In press - 2021

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Media Technology
  • Hardware and Architecture
  • Computer Science Applications
