CROSS-DOMAIN SINGLE-CHANNEL SPEECH ENHANCEMENT MODEL WITH BI-PROJECTION FUSION MODULE FOR NOISE-ROBUST ASR

Fu An Chao, Jeih Weih Hung, Berlin Chen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

12 Citations (Scopus)

Abstract

In recent decades, many studies have suggested that phase information is crucial for speech enhancement (SE), and time-domain single-channel speech enhancement techniques have shown promise in noise suppression and robust automatic speech recognition (ASR). This paper presents a continuation of the above lines of research and explores two effective SE methods that consider phase information in time domain and frequency domain of speech signals, respectively. Going one step further, we put forward a novel cross-domain speech enhancement model and a bi-projection fusion (BPF) mechanism for noise-robust ASR. To evaluate the effectiveness of our proposed method, we conduct an extensive set of experiments on the publicly-available Aishell-1 Mandarin benchmark speech corpus. The evaluation results confirm the superiority of our proposed method in relation to a few current top-of-the-line time-domain and frequency-domain SE methods in both enhancement and ASR evaluation metrics for the test set of scenarios contaminated with seen and unseen noise, respectively.

Original languageEnglish
Title of host publication2021 IEEE International Conference on Multimedia and Expo, ICME 2021
PublisherIEEE Computer Society
ISBN (Electronic)9781665438643
DOIs
Publication statusPublished - 2021
Event2021 IEEE International Conference on Multimedia and Expo, ICME 2021 - Shenzhen, China
Duration: 2021 Jul 52021 Jul 9

Publication series

NameProceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print)1945-7871
ISSN (Electronic)1945-788X

Conference

Conference2021 IEEE International Conference on Multimedia and Expo, ICME 2021
Country/TerritoryChina
CityShenzhen
Period2021/07/052021/07/09

Keywords

  • Automatic Speech Recognition
  • Single-Channel Speech Enhancement
  • Speech Enhancement
  • Time and Frequency Domains

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'CROSS-DOMAIN SINGLE-CHANNEL SPEECH ENHANCEMENT MODEL WITH BI-PROJECTION FUSION MODULE FOR NOISE-ROBUST ASR'. Together they form a unique fingerprint.

Cite this