Adaptive-FSN: Integrating Full-Band Extraction and Adaptive Sub-Band Encoding for Monaural Speech Enhancement

Yu Sheng Tsao, Kuan Hsun Ho, Jeih Weih Hung, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

An important and more recent thread of speech enhancement research uses sub-band processing to exploit fine-grained local spectral patterns, which complement full-band features well. To extend the efficacy of sub-band spectral information, we propose Adaptive-FSN, a fully convolutional real-time speech enhancement framework that dynamically acquires a sub-band embedding over a wide range of sub-band frequencies. We exploit an adaptive sub-band encoder that models sub-band processing across a wide range of sub-band units. We then build this sub-band embedding with a Conformer-based structure and multi-view attention. For the full-band features, we use the FullSubNet+ architecture with its full-band extractor to obtain global spectral information. Finally, a Conformer-based fusion model combines these information sources to predict the complex ideal ratio mask (cIRM). Experimental results on the VoiceBank-DEMAND benchmark show that the proposed framework outperforms FullSubNet+, improving the quality of processed utterances while reducing implementation complexity for faster real-time computation.
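Two generic building blocks named in the abstract can be illustrated in a few lines: the cIRM, defined so that the clean STFT equals the mask times the noisy STFT (complex multiplication), and FullSubNet-style sub-band unfolding, where each frequency bin is grouped with its N neighbours on either side. The sketch below assumes NumPy and reflect padding at the spectrum edges. It computes the oracle mask target, not the network prediction. The helper `multi_width_subbands` and its widths are hypothetical: it only stacks fixed unfolding widths to suggest a "wide range of sub-band units", whereas the paper's adaptive encoder is learned and is not reproduced here.

```python
import numpy as np


def compute_cirm(noisy: np.ndarray, clean: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Oracle complex ideal ratio mask M with clean ~= M * noisy.

    M = S / Y = S * conj(Y) / |Y|^2, written out as real and imaginary parts.
    `eps` guards against division by zero for silent bins.
    """
    yr, yi = noisy.real, noisy.imag
    sr, si = clean.real, clean.imag
    denom = yr ** 2 + yi ** 2 + eps
    mask_real = (yr * sr + yi * si) / denom
    mask_imag = (yr * si - yi * sr) / denom
    return mask_real + 1j * mask_imag


def apply_cirm(mask: np.ndarray, noisy: np.ndarray) -> np.ndarray:
    """Enhanced spectrum: element-wise complex product of mask and noisy STFT."""
    return mask * noisy


def unfold_subbands(mag: np.ndarray, num_neighbors: int) -> np.ndarray:
    """Group each frequency bin f with bins f-N..f+N (reflect-padded at the edges).

    mag: (F, T) magnitude spectrogram -> returns an (F, T, 2N+1) view.
    Index N of the last axis is the centre bin itself.
    """
    padded = np.pad(mag, ((num_neighbors, num_neighbors), (0, 0)), mode="reflect")
    return np.lib.stride_tricks.sliding_window_view(
        padded, 2 * num_neighbors + 1, axis=0
    )


def multi_width_subbands(mag: np.ndarray, neighbor_set=(1, 3, 7)) -> dict:
    """Hypothetical: one unfolded view per neighbourhood width.

    A learned encoder would weight or merge these views; this only builds them.
    """
    return {n: unfold_subbands(mag, n) for n in neighbor_set}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, T = 257, 10  # e.g. 512-point STFT -> 257 bins, 10 frames
    noisy = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
    clean = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
    mask = compute_cirm(noisy, clean)
    enhanced = apply_cirm(mask, noisy)
    views = multi_width_subbands(np.abs(noisy))
    print({n: v.shape for n, v in views.items()})
```

Working with the real and imaginary mask components separately mirrors how cIRM-predicting networks typically emit two output channels; applying the oracle mask recovers the clean spectrum up to the `eps` term.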

Original language: English
Title of host publication: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 458-464
Number of pages: 7
ISBN (Electronic): 9798350396904
Publication status: Published - 2023
Event: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Doha, Qatar
Duration: 2023 Jan 9 to 2023 Jan 12

Publication series

Name: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

Conference

Conference: 2022 IEEE Spoken Language Technology Workshop, SLT 2022
Country/Territory: Qatar
City: Doha
Period: 2023/01/09 to 2023/01/12

Keywords

  • FullSubNet
  • Speech enhancement
  • complex spectrum
  • real-time computation
  • sub-band processing

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Media Technology
  • Instrumentation
  • Linguistics and Language
