Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition

Bi Cheng Yan, Chin Hong Shih, Shih Hung Liu, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Noise robustness has long attracted interest from researchers and practitioners in the automatic speech recognition (ASR) community because of its paramount importance to the success of ASR systems. This paper presents a novel approach to improving the noise robustness of speech features that builds on the dictionary learning paradigm. To this end, we employ the K-SVD method and its variants to create sparse representations with respect to a common set of basis spectral vectors that captures the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The enhanced modulation spectra of speech features, constructed by mapping the original modulation spectra into the space spanned by these representative basis vectors, can better carry noise-resistant acoustic characteristics. In addition, considering the nonnegative property of the modulation spectrum amplitudes, we utilize the nonnegative K-SVD method, in combination with the nonnegative sparse coding method, to generate more noise-robust speech features. All experiments were conducted and verified using the standard Aurora-2 database and task. The empirical results show that the proposed dictionary-learning-based approach can provide significant average word error reductions when integrated with either a GMM-HMM or a DNN-HMM based ASR system.
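The mapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the dictionary is already given (the paper learns it with K-SVD and its variants, which is not shown here), it substitutes plain greedy matching pursuit for the paper's sparse coding methods, and it clamps the reconstructed magnitudes at zero as a crude stand-in for the nonnegative formulation. The names `matching_pursuit` and `enhance_trajectory` and the toy dictionary in the usage note are hypothetical.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform of one feature trajectory (a list of floats)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT; only the real part is kept because the input trajectory is real."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def matching_pursuit(y, atoms, n_nonzero):
    """Greedy sparse coding of y over unit-norm atoms (a stand-in, not K-SVD)."""
    residual = list(y)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Correlate every atom with the current residual.
        corr = [sum(a_i * r_i for a_i, r_i in zip(atom, residual))
                for atom in atoms]
        # Select the atom that explains the most residual energy.
        j = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        coeffs[j] += corr[j]
        residual = [r - corr[j] * a_i for r, a_i in zip(residual, atoms[j])]
    return coeffs


def enhance_trajectory(x, atoms, n_nonzero=2):
    """Replace the modulation-spectrum magnitude of x by its sparse approximation."""
    spec = dft(x)
    magnitude = [abs(c) for c in spec]
    phase = [cmath.phase(c) for c in spec]
    coeffs = matching_pursuit(magnitude, atoms, n_nonzero)
    # Assumption: negative reconstructed magnitudes are clamped to zero.
    new_magnitude = [max(0.0, sum(c * atom[k] for c, atom in zip(coeffs, atoms)))
                     for k in range(len(magnitude))]
    # Recombine the approximated magnitude with the original phase.
    new_spec = [m * cmath.exp(1j * p) for m, p in zip(new_magnitude, phase)]
    return idft(new_spec)
```

As a sanity check, a toy dictionary made of the standard basis vectors with `n_nonzero` equal to the trajectory length reproduces the input trajectory up to floating-point error, since every magnitude bin is then represented exactly. With a dictionary learned from clean speech and a small `n_nonzero`, only the structure captured by the selected atoms would be retained.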

Original language: English
Title of host publication: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Publisher: IEEE Computer Society
Pages: 577-582
Number of pages: 6
ISBN (Electronic): 9781509060672
Publication status: Published - 2017 Aug 28
Event: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017 - Hong Kong, Hong Kong
Duration: 2017 Jul 10 to 2017 Jul 14

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Other

Other: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Country/Territory: Hong Kong
City: Hong Kong
Period: 2017/07/10 to 2017/07/14

Keywords

  • Deep neural network
  • Dictionary learning
  • Modulation spectrum
  • Robustness
  • Sparse coding

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
