Investigating Low-Distortion Speech Enhancement with Discrete Cosine Transform Features for Robust Speech Recognition

Yu Sheng Tsao*, Jeih Weih Hung, Kuan Hsun Ho*, Berlin Chen*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This study investigates constructing low-distortion utterances to benefit downstream automatic speech recognition (ASR) systems at the front-end stage based on a speech enhancement (SE) network. With the dual-path Transformer network (DPTNet) as the SE archetype, we make effective use of short-time discrete cosine transform (STDCT) features to infer the respective mask-estimation network. Furthermore, we seek to jointly optimize the spectral-distance loss and the perceptual loss for the training of the model components of our proposed SE model so as to enhance the input utterances without introducing significant distortion. Extensive evaluation experiments are conducted on the VoiceBank-DEMAND and VoiceBank-QUT tasks, containing stationary and non-stationary noises, respectively. The corresponding results show that the proposed SE method yields competitive perceptual metric scores on SE but significantly lower word error rates (WER) on ASR in relation to several top-of-the-line methods. Notably, the proposed SE method works remarkably well on the VoiceBank-QUT ASR task, thereby confirming its excellent generalization capability to unseen scenarios.
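
As a rough illustration of the STDCT features mentioned in the abstract, the sketch below frames a signal, applies a window, and takes an orthonormal DCT-II of each frame. Because the DCT of a real signal is real-valued, each frame yields a single real coefficient vector rather than separate magnitude and phase. The function names `dct_ii` and `stdct`, the Hann window, and the frame and hop lengths are illustrative assumptions; this record does not state the paper's actual analysis settings.

```python
import math


def dct_ii(frame):
    """Orthonormal DCT-II of a real-valued frame (direct O(N^2) form)."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(frame)
        )
        # Orthonormal scaling: the k = 0 term uses a different factor.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def stdct(signal, frame_len=8, hop=4):
    """Short-time DCT: overlapping frames, windowed, then DCT-II per frame.

    frame_len, hop, and the Hann window are assumptions made for this
    sketch, not settings taken from the paper.
    """
    window = [
        0.5 - 0.5 * math.cos(2.0 * math.pi * i / frame_len)
        for i in range(frame_len)
    ]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dct_ii(chunk))
    return frames
```

Calling `stdct(samples)` on a list of floats returns one list of real DCT coefficients per frame; a mask-estimation network would then operate on this real-valued time-frequency representation.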

Original language: English
Title of host publication: Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 131-136
Number of pages: 6
ISBN (electronic): 9786165904773
Publication status: Published - 2022
Event: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022 - Chiang Mai, Thailand
Duration: 7 Nov 2022 to 10 Nov 2022

Publication series

Name: Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022

Conference

Conference: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Country/Territory: Thailand
City: Chiang Mai
Period: 2022/11/07 to 2022/11/10

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing