Investigating Low-Distortion Speech Enhancement with Discrete Cosine Transform Features for Robust Speech Recognition

Yu Sheng Tsao*, Jeih Weih Hung, Kuan Hsun Ho*, Berlin Chen*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This study investigates how a speech enhancement (SE) network at the front-end stage can construct low-distortion utterances that benefit downstream automatic speech recognition (ASR) systems. With the dual-path Transformer network (DPTNet) as the SE archetype, we make effective use of short-time discrete cosine transform (STDCT) features to infer the respective mask-estimation network. Furthermore, we seek to jointly optimize the spectral-distance loss and the perceptual loss when training the components of the proposed SE model, so as to enhance the input utterances without introducing significant distortion. Extensive evaluation experiments are conducted on the VoiceBank-DEMAND and VoiceBank-QUT tasks, which contain stationary and non-stationary noises, respectively. The results show that the proposed SE method yields competitive perceptual metric scores on SE but significantly lower word error rates (WER) on ASR compared with several top-of-the-line methods. Notably, the proposed SE method works remarkably well on the VoiceBank-QUT ASR task, thereby confirming its excellent generalization capability to unseen scenarios.
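
To make the STDCT idea in the abstract concrete, the following is a minimal, dependency-free sketch, not the authors' implementation. It frames a signal, applies a window, and takes an orthonormal DCT-II of each frame; because the coefficients are real-valued, a mask can be applied by plain element-wise multiplication with no separate phase term. The frame length, hop size, sine window, and the helper names (`dct2_ortho`, `idct2_ortho`, `stdct`, `apply_mask`) are assumptions chosen for illustration, and the all-ones mask merely stands in for the output of a mask-estimation network such as DPTNet.

```python
import math


def dct2_ortho(frame):
    """Orthonormal DCT-II of one real-valued frame."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        acc = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * acc)
    return coeffs


def idct2_ortho(coeffs):
    """Inverse of dct2_ortho (the orthonormal DCT-III)."""
    n = len(coeffs)
    frame = []
    for i in range(n):
        acc = math.sqrt(1.0 / n) * coeffs[0]
        acc += sum(math.sqrt(2.0 / n) * c * math.cos(math.pi * (i + 0.5) * k / n)
                   for k, c in enumerate(coeffs) if k > 0)
        frame.append(acc)
    return frame


def stdct(signal, frame_len=8, hop=4):
    """Short-time DCT: window each overlapping frame, then DCT it.

    frame_len, hop, and the sine window are illustrative assumptions.
    """
    window = [math.sin(math.pi * (i + 0.5) / frame_len) for i in range(frame_len)]
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        features.append(dct2_ortho(frame))
    return features


def apply_mask(features, mask):
    """Element-wise real-valued masking of STDCT features."""
    return [[m * c for m, c in zip(m_row, c_row)]
            for m_row, c_row in zip(mask, features)]


# Toy usage: a short sinusoid and a placeholder all-ones mask
# (a trained network would supply the mask in a real system).
noisy = [math.sin(0.3 * n) for n in range(16)]
feats = stdct(noisy)
mask = [[1.0] * len(row) for row in feats]
enhanced_feats = apply_mask(feats, mask)
enhanced_frames = [idct2_ortho(row) for row in enhanced_feats]
```

In practice a library routine such as `scipy.fft.dct(x, type=2, norm='ortho')` computes the same per-frame transform far more efficiently; the explicit loops here only keep the definition visible. Overlap-add resynthesis of the enhanced frames is omitted.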

Original language: English
Title of host publication: Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 131-136
Number of pages: 6
ISBN (Electronic): 9786165904773
Publication status: Published - 2022
Event: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022 - Chiang Mai, Thailand
Duration: 2022 Nov 7 → 2022 Nov 10

Publication series

Name: Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022

Conference

Conference: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Country/Territory: Thailand
City: Chiang Mai
Period: 2022/11/07 → 2022/11/10

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
