Exploring Feature Enhancement in the Modulation Spectrum Domain via Ideal Ratio Mask for Robust Speech Recognition

Bi Cheng Yan, Meng Che Wu, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

Development of robustness techniques is of paramount importance to the success of automatic speech recognition (ASR) systems. In this paper, we present a novel use of the ideal ratio mask (IRM) method to improve ASR robustness. IRM was originally proposed for time-frequency (T-F) masking-based speech enhancement and has shown considerable promise in preserving the intelligibility of a noisy mixture signal. Further, IRM is alternatively used to normalize the intermediate representations of speech feature vector sequences, in a holistic manner, for both training and test utterances. Finally, we instead treat IRM as a data augmentation method, conducted on speech feature vectors of training utterances or their intermediate representations, to generate additional augmented data for increasing the diversity of training data. A series of experiments carried out on the standard Aurora-4 database and task confirm the effectiveness of our methods.
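For readers unfamiliar with the ideal ratio mask, the following is a minimal sketch of its commonly used time-frequency form, in which each T-F unit is weighted by the ratio of clean speech power to total power. It is not the authors' modulation-spectrum-domain method, which the abstract does not specify. The function names, the exponent `beta = 0.5`, and the small constant `eps` are assumptions made for illustration only.

```python
def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Compute an IRM per T-F unit as (S / (S + N)) ** beta.

    speech_power and noise_power are 2-D lists (frames x frequency bins)
    of non-negative power values. beta and eps are illustrative choices.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        # eps guards against division by zero when both powers are zero.
        mask.append([(s / (s + n + eps)) ** beta for s, n in zip(s_row, n_row)])
    return mask


def apply_mask(noisy_magnitude, mask):
    """Scale each T-F unit of the noisy magnitude by its mask value."""
    return [
        [x * m for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(noisy_magnitude, mask)
    ]


# Toy example: one frame with three frequency bins.
speech = [[1.0, 0.0, 4.0]]
noise = [[1.0, 1.0, 0.0]]
irm = ideal_ratio_mask(speech, noise)
enhanced = apply_mask([[2.0, 1.0, 2.0]], irm)
```

Mask values lie in [0, 1]: units dominated by speech are passed nearly unchanged, while units dominated by noise are attenuated. Computing the mask this way requires parallel clean and noise signals, which is why it is called "ideal"; in practice a model is trained to estimate it from the noisy input.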

Original language: English
Title of host publication: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 759-763
Number of pages: 5
ISBN (Electronic): 9789881476883
Publication status: Published - 2020 Dec 7
Event: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Virtual, Auckland, New Zealand
Duration: 2020 Dec 7 to 2020 Dec 10

Publication series

Name: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings

Conference

Conference: 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020
Country/Territory: New Zealand
City: Virtual, Auckland
Period: 2020/12/07 to 2020/12/10

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Signal Processing
  • Decision Sciences (miscellaneous)
  • Instrumentation
