Exploiting Discrete Cosine Transform Features in Speech Enhancement Technique FullSubNet+

Yu Sheng Tsao, Berlin Chen, Jeih Weih Hung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The highly effective deep learning-based technique FullSubNet+ employs a full-band and sub-band fusion model for speech enhancement. FullSubNet+ uses the short-time magnitude spectrogram together with the real and imaginary parts of the complex-valued spectrogram as inputs to train a deep neural network that mainly comprises multi-scale time-sensitive channel attention (MulCA) modules and stacked temporal convolution network (TCN) blocks. To capture the phase information of the input time-domain signals more simply, we propose using the short-time discrete cosine transform (STDCT) spectrogram in place of the real and imaginary spectrograms as an input source for training the FullSubNet+ framework. Preliminary experiments on the VoiceBank-DEMAND task indicate that, on the test set, exploiting STDCT spectrograms in FullSubNet+ achieves higher objective speech quality and intelligibility, measured by PESQ and STOI scores respectively, than the original FullSubNet+ arrangement. In addition, the STDCT-based FullSubNet+ obtains a real-time factor (RTF) of 0.229, lower than the 0.260 of the original FullSubNet+.
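As a rough illustration of the input feature described in the abstract, the sketch below computes a short-time DCT spectrogram with NumPy. It is not the authors' implementation: the function names `dct_ii_matrix` and `stdct`, the orthonormal DCT-II variant, the Hann window, and the frame length and hop of 512 and 256 samples are all assumptions chosen for the example, since the abstract does not specify them. It shows why the STDCT can stand in for the real and imaginary spectrograms: each frame maps to a single real-valued coefficient vector whose signs carry the phase-like information, so no separate complex parts are needed.

```python
import numpy as np


def dct_ii_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]  # frequency (coefficient) index
    t = np.arange(n)[None, :]  # sample index within a frame
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[0, :] *= 1.0 / np.sqrt(2.0)  # scale the DC row for orthonormality
    return basis * np.sqrt(2.0 / n)


def stdct(signal, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Short-time DCT spectrogram of a 1-D signal.

    frame_len, hop, and the Hann window are illustrative assumptions,
    not values taken from the paper.
    Returns a real array of shape (frame_len, n_frames).
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Row i of idx holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window  # shape (n_frames, frame_len)
    return dct_ii_matrix(frame_len) @ frames.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.standard_normal(16000)  # one second at an assumed 16 kHz rate
    spec = stdct(noisy)
    print(spec.shape, spec.dtype)
```

With these assumed settings, one second of 16 kHz audio yields a 512 x 61 real-valued array, which a model could consume as a single input stream where the complex spectrogram would need two.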

Original language: English
Title of host publication: Proceedings - 2022 IET International Conference on Engineering Technologies and Applications, IET-ICETA 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665491389
Publication status: Published - 2022
Event: 2022 IET International Conference on Engineering Technologies and Applications, IET-ICETA 2022 - Changhua, Taiwan
Duration: 2022 Oct 14 to 2022 Oct 16

Publication series

Name: Proceedings - 2022 IET International Conference on Engineering Technologies and Applications, IET-ICETA 2022

Conference

Conference: 2022 IET International Conference on Engineering Technologies and Applications, IET-ICETA 2022
Country/Territory: Taiwan
City: Changhua
Period: 2022/10/14 to 2022/10/16

Keywords

  • FullSubNet
  • FullSubNet+
  • deep learning
  • discrete cosine transform
  • speech enhancement

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Engineering (miscellaneous)
  • Electrical and Electronic Engineering
  • Instrumentation
  • Transportation
