Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition

Bi Cheng Yan, Chin Hong Shih, Shih Hung Liu, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution


Noise robustness has long attracted interest from researchers and practitioners in the automatic speech recognition (ASR) community because of its paramount importance to the success of ASR systems. This paper presents a novel approach to improving the noise robustness of speech features, building on the dictionary learning paradigm. To this end, we employ the K-SVD method and its variants to create sparse representations with respect to a common set of basis spectral vectors that capture the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The enhanced modulation spectra of speech features, constructed by mapping the original modulation spectra into the space spanned by these representative basis vectors, can better carry noise-resistant acoustic characteristics. In addition, considering the nonnegative property of the modulation spectrum amplitudes, we utilize the nonnegative K-SVD method, in combination with the nonnegative sparse coding method, to generate more noise-robust speech features. All experiments were conducted and verified using the standard Aurora-2 database and task. The empirical results show that the proposed dictionary learning based approach can provide significant average word error reductions when integrated with either a GMM-HMM or a DNN-HMM based ASR system.
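The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the modulation spectrum is taken as a fixed-length DFT along the time axis of each feature dimension, that only the magnitude is enhanced while the original phase is reused, and it swaps in multiplicative-update nonnegative sparse coding with an NMF-style dictionary update as a simple stand-in for nonnegative K-SVD. All function names and parameters (`n_fft`, `n_atoms`, `lam`) are hypothetical.

```python
import numpy as np

def modulation_magnitude(features, n_fft=256):
    """Magnitude modulation spectrum of a (T, D) feature sequence, shape (n_fft//2+1, D).
    Assumes T <= n_fft (shorter sequences are zero-padded by rfft)."""
    return np.abs(np.fft.rfft(features, n=n_fft, axis=0))

def nn_sparse_code(V, W, lam=0.1, n_iter=200, eps=1e-9):
    """Approximately minimise 0.5*||V - W H||^2 + lam*sum(H) subject to H >= 0,
    using multiplicative updates (keeps H nonnegative when V and W are)."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV, WtW = W.T @ V, W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + lam + eps)
    return H

def learn_nn_dictionary(V, n_atoms=8, lam=0.1, n_iter=100, eps=1e-9, seed=0):
    """Learn a nonnegative dictionary W (one atom per column) from clean
    magnitude spectra V. Stand-in for nonnegative K-SVD, not K-SVD itself."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_atoms)) + eps
    H = rng.random((n_atoms, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)      # sparse code update
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)          # dictionary update
        W /= np.linalg.norm(W, axis=0, keepdims=True) + eps  # rescale atoms
    return W

def enhance(features, W, n_fft=256, lam=0.1):
    """Map the magnitude modulation spectrum of each feature dimension onto the
    span of the dictionary atoms, reuse the original phase, and invert."""
    T = features.shape[0]
    spec = np.fft.rfft(features, n=n_fft, axis=0)
    mag, phase = np.abs(spec), np.angle(spec)
    H = nn_sparse_code(mag, W, lam=lam)
    new_spec = (W @ H) * np.exp(1j * phase)
    return np.fft.irfft(new_spec, n=n_fft, axis=0)[:T]
```

In use, the columns of `modulation_magnitude` from all clean training utterances would be stacked side by side to form `V`, the dictionary learned once with `learn_nn_dictionary`, and `enhance` applied to each test utterance before recognition.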

Title of host publication: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Publisher: IEEE Computer Society
Publication status: Published - 28 Aug 2017
Event: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017 - Hong Kong, Hong Kong
Duration: 10 Jul 2017 to 14 Jul 2017


Publication series name: Proceedings - IEEE International Conference on Multimedia and Expo


Other: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
City: Hong Kong

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications


