Time-Reversal Enhancement Network With Cross-Domain Information for Noise-Robust Speech Recognition

Fu An Chao, Jeih Weih Hung, Tommy Sheu, Berlin Chen

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Thanks to the enormous progress in deep learning, speech enhancement (SE) techniques have shown promising efficacy and play a pivotal role as a front end to automatic speech recognition (ASR) systems, where they mitigate the effects of noise. In this article, we put forward a novel cross-domain time-reversal enhancement network (CD-TENET). CD-TENET leverages the time-reversed version of a speech signal together with two effective features that capture the phase information of the signal in the time domain and the frequency domain, respectively, to improve SE performance for noise-robust ASR. Extensive experiments demonstrate that CD-TENET not only recovers the original speech effectively but also improves SE and ASR performance simultaneously. More surprisingly, CD-TENET offers a marked relative word error rate reduction on test utterances contaminated with unseen noises, compared with a strong baseline trained under the multicondition training setting.
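As a rough illustration of two notions mentioned in the abstract, the sketch below time-reverses a toy waveform and computes a relative word error rate (WER) reduction. It is not the CD-TENET implementation: the function names, the toy signal, and the WER figures are hypothetical and chosen only for illustration. The sketch also checks a standard signal-processing fact, namely that reversing a real signal in time leaves its magnitude spectrum unchanged, so any difference between a signal and its reversal lies in the phase.

```python
import cmath
import math


def time_reverse(signal):
    """Return the samples in reverse temporal order (hypothetical helper)."""
    return signal[::-1]


def dft_magnitudes(signal):
    """Magnitude of a naive discrete Fourier transform, one value per bin."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def relative_wer_reduction(baseline_wer, proposed_wer):
    """Relative WER reduction: (baseline - proposed) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - proposed_wer) / baseline_wer


# Toy 16-sample waveform (illustrative only, not real speech).
toy = [math.sin(2 * math.pi * 3 * t / 16) + 0.5 * math.cos(2 * math.pi * 5 * t / 16)
       for t in range(16)]
reversed_toy = time_reverse(toy)

# Time reversal keeps the magnitude spectrum; only the phase changes.
same_magnitude = all(
    abs(a - b) < 1e-9
    for a, b in zip(dft_magnitudes(toy), dft_magnitudes(reversed_toy))
)

# Hypothetical WER values in percent, not results from the article.
reduction = relative_wer_reduction(20.0, 15.0)
```

Under these made-up numbers, a drop from 20.0% to 15.0% WER corresponds to a relative reduction of 0.25, that is, 25%.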

Original language: English
Pages (from-to): 114-124
Number of pages: 11
Journal: IEEE Multimedia
Volume: 29
Issue number: 1
Publication status: Published - 2022

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Media Technology
  • Hardware and Architecture
  • Computer Science Applications
