Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition

Bi Cheng Yan, Chin Hong Shih, Shih Hung Liu, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Noise robustness has long attracted interest from researchers and practitioners in the automatic speech recognition (ASR) community because of its paramount importance to the success of ASR systems. This paper presents a novel approach to improving the noise robustness of speech features, building on the dictionary learning paradigm. To this end, we employ the K-SVD method and its variants to create sparse representations with respect to a common set of basis spectral vectors that capture the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The enhanced modulation spectra of speech features, constructed by mapping the original modulation spectra into the space spanned by these representative basis vectors, can better carry noise-resistant acoustic characteristics. In addition, considering the nonnegative property of the modulation spectrum amplitudes, we utilize the nonnegative K-SVD method, in combination with the nonnegative sparse coding method, to generate more noise-robust speech features. All experiments were conducted and verified using the standard Aurora-2 database and task. The empirical results show that the proposed dictionary-learning-based approach can provide significant average word error reductions when integrated with either a GMM-HMM-based or a DNN-HMM-based ASR system.
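The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes fixed-length feature trajectories, uses plain K-SVD with orthogonal matching pursuit (clipping negative magnitudes instead of using the nonnegative variants), reuses the original phase when resynthesizing, and runs on synthetic data rather than Aurora-2. The function names `omp`, `ksvd`, and `enhance_trajectory` and all parameter values are hypothetical.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Orthogonal matching pursuit: approximate y with at most n_nonzero atoms of D."""
    residual, support = y.copy(), []
    x, coef = np.zeros(D.shape[1]), np.zeros(0)
    for _ in range(n_nonzero):
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if k in support:
            break
        support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def ksvd(Y, n_atoms, n_nonzero, n_iter=10, seed=0):
    """Learn a dictionary D (dim x n_atoms, unit-norm columns) from Y (dim x n_samples)."""
    rng = np.random.default_rng(seed)
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Sparse coding step: one code vector per training sample.
        X = np.column_stack([omp(D, y, n_nonzero) for y in Y.T])
        # Dictionary update step: refit each atom by a rank-1 SVD of its residual.
        for k in range(n_atoms):
            users = np.nonzero(X[k])[0]
            if users.size == 0:
                continue
            X[k, users] = 0.0
            E = Y[:, users] - D @ X[:, users]
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]
            X[k, users] = s[0] * Vt[0]
    return D

def enhance_trajectory(traj, D, n_nonzero):
    """Map the modulation-spectrum magnitude of one feature trajectory onto D."""
    spec = np.fft.rfft(traj)                  # DFT along the time (frame) axis
    mag, phase = np.abs(spec), np.angle(spec)
    mag_hat = D @ omp(D, mag, n_nonzero)      # sparse reconstruction from clean-trained atoms
    mag_hat = np.maximum(mag_hat, 0.0)        # assumption: clip instead of nonnegative K-SVD
    return np.fft.irfft(mag_hat * np.exp(1j * phase), n=len(traj))

# Synthetic stand-in for clean feature trajectories (NOT Aurora-2 data).
rng = np.random.default_rng(1)
n_frames = 64
t = np.arange(n_frames)

def clean_traj():
    f = rng.uniform(0.5, 4.0, size=2)[:, None]      # slow modulation frequencies
    a = rng.uniform(0.5, 1.5, size=2)[:, None]
    ph = rng.uniform(0.0, 2 * np.pi, size=2)[:, None]
    return (a * np.sin(2 * np.pi * f * t / n_frames + ph)).sum(axis=0)

train = np.stack([clean_traj() for _ in range(200)])   # (200, 64)
Y = np.abs(np.fft.rfft(train, axis=1)).T               # (33, 200) magnitude spectra
D = ksvd(Y, n_atoms=20, n_nonzero=3)

noisy = clean_traj() + 0.5 * rng.standard_normal(n_frames)
enhanced = enhance_trajectory(noisy, D, n_nonzero=3)
```

In this sketch the dictionary is trained only on clean magnitude spectra, so a noisy spectrum is forced into the span of a few clean-derived atoms. How much noise this removes depends on the dictionary size and sparsity level, which are arbitrary here.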

Original language: English
Title of host publication: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Publisher: IEEE Computer Society
Pages: 577-582
Number of pages: 6
ISBN (Electronic): 9781509060672
DOI: https://doi.org/10.1109/ICME.2017.8019509
Publication status: Published - 2017 Aug 28
Event: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017 - Hong Kong, Hong Kong
Duration: 2017 Jul 10 to 2017 Jul 14

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Other

Other: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Country: Hong Kong
City: Hong Kong
Period: 17/7/10 to 17/7/14

Fingerprint

Glossaries
Speech recognition
Modulation
Singular value decomposition
Acoustic noise
Acoustics
Experiments

Keywords

  • Deep neural network
  • Dictionary learning
  • Modulation spectrum
  • Robustness
  • Sparse coding

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

Cite this

Yan, B. C., Shih, C. H., Liu, S. H., & Chen, B. (2017). Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition. In 2017 IEEE International Conference on Multimedia and Expo, ICME 2017 (pp. 577-582). [8019509] (Proceedings - IEEE International Conference on Multimedia and Expo). IEEE Computer Society. https://doi.org/10.1109/ICME.2017.8019509

Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition. / Yan, Bi Cheng; Shih, Chin Hong; Liu, Shih Hung; Chen, Berlin.

2017 IEEE International Conference on Multimedia and Expo, ICME 2017. IEEE Computer Society, 2017. p. 577-582 8019509 (Proceedings - IEEE International Conference on Multimedia and Expo).

Yan, BC, Shih, CH, Liu, SH & Chen, B 2017, Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition. in 2017 IEEE International Conference on Multimedia and Expo, ICME 2017., 8019509, Proceedings - IEEE International Conference on Multimedia and Expo, IEEE Computer Society, pp. 577-582, 2017 IEEE International Conference on Multimedia and Expo, ICME 2017, Hong Kong, Hong Kong, 17/7/10. https://doi.org/10.1109/ICME.2017.8019509
Yan BC, Shih CH, Liu SH, Chen B. Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition. In 2017 IEEE International Conference on Multimedia and Expo, ICME 2017. IEEE Computer Society. 2017. p. 577-582. 8019509. (Proceedings - IEEE International Conference on Multimedia and Expo). https://doi.org/10.1109/ICME.2017.8019509
Yan, Bi Cheng ; Shih, Chin Hong ; Liu, Shih Hung ; Chen, Berlin. / Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition. 2017 IEEE International Conference on Multimedia and Expo, ICME 2017. IEEE Computer Society, 2017. pp. 577-582 (Proceedings - IEEE International Conference on Multimedia and Expo).
@inproceedings{c98a7e4596ce4207b5065245a132db6c,
title = "Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition",
abstract = "Noise robustness has long garnered much interest from researchers and practitioners of the automatic speech recognition (ASR) community due to its paramount importance to the success of ASR systems. This paper presents a novel approach to improving the noise robustness of speech features, building on top of the dictionary learning paradigm. To this end, we employ the K-SVD method and its variants to create sparse representations with respect to a common set of basis spectral vectors that captures the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The enhanced modulation spectra of speech features, constructed by mapping the original modulation spectra into the space spanned by these representative basis vectors, can better carry noise-resistant acoustic characteristics. In addition, considering the nonnegative property of the modulation spectrum amplitudes, we utilize the nonnegative K-SVD method, in combination with the nonnegative sparse coding method, to generate more noise-robust speech features. All experiments were conducted and verified using the standard Aurora-2 database and task. The empirical results show that the proposed dictionary learning based approach can provide significant average word error reductions when being integrated with either a GMM-HMM or a DNN-HMM based ASR system.",
keywords = "Deep neural network, Dictionary learning, Modulation spectrum, Robustness, Sparse coding",
author = "Yan, {Bi Cheng} and Shih, {Chin Hong} and Liu, {Shih Hung} and Berlin Chen",
year = "2017",
month = "8",
day = "28",
doi = "10.1109/ICME.2017.8019509",
language = "English",
series = "Proceedings - IEEE International Conference on Multimedia and Expo",
publisher = "IEEE Computer Society",
pages = "577--582",
booktitle = "2017 IEEE International Conference on Multimedia and Expo, ICME 2017",

}

TY - GEN
T1 - Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition
AU - Yan, Bi Cheng
AU - Shih, Chin Hong
AU - Liu, Shih Hung
AU - Chen, Berlin
PY - 2017/8/28
Y1 - 2017/8/28
AB - Noise robustness has long garnered much interest from researchers and practitioners of the automatic speech recognition (ASR) community due to its paramount importance to the success of ASR systems. This paper presents a novel approach to improving the noise robustness of speech features, building on top of the dictionary learning paradigm. To this end, we employ the K-SVD method and its variants to create sparse representations with respect to a common set of basis spectral vectors that captures the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The enhanced modulation spectra of speech features, constructed by mapping the original modulation spectra into the space spanned by these representative basis vectors, can better carry noise-resistant acoustic characteristics. In addition, considering the nonnegative property of the modulation spectrum amplitudes, we utilize the nonnegative K-SVD method, in combination with the nonnegative sparse coding method, to generate more noise-robust speech features. All experiments were conducted and verified using the standard Aurora-2 database and task. The empirical results show that the proposed dictionary learning based approach can provide significant average word error reductions when being integrated with either a GMM-HMM or a DNN-HMM based ASR system.
KW - Deep neural network
KW - Dictionary learning
KW - Modulation spectrum
KW - Robustness
KW - Sparse coding
UR - http://www.scopus.com/inward/record.url?scp=85030229552&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85030229552&partnerID=8YFLogxK
U2 - 10.1109/ICME.2017.8019509
DO - 10.1109/ICME.2017.8019509
M3 - Conference contribution
AN - SCOPUS:85030229552
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
SP - 577
EP - 582
BT - 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
PB - IEEE Computer Society
ER -