Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition

Bi Cheng Yan, Chin Hong Shih, Shih Hung Liu, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Noise robustness has long garnered much interest from researchers and practitioners of the automatic speech recognition (ASR) community due to its paramount importance to the success of ASR systems. This paper presents a novel approach to improving the noise robustness of speech features, building on the dictionary learning paradigm. To this end, we employ the K-SVD method and its variants to create sparse representations with respect to a common set of basis spectral vectors that captures the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The enhanced modulation spectra of speech features, constructed by mapping the original modulation spectra into the space spanned by these representative basis vectors, can better carry noise-resistant acoustic characteristics. In addition, considering the nonnegative property of the modulation spectrum amplitudes, we utilize the nonnegative K-SVD method, in combination with the nonnegative sparse coding method, to generate more noise-robust speech features. All experiments were conducted and verified using the standard Aurora-2 database and task. The empirical results show that the proposed dictionary-learning-based approach can provide significant average word error reductions when integrated with either a GMM-HMM-based or a DNN-HMM-based ASR system.
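The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain K-SVD with orthogonal matching pursuit (OMP) rather than the nonnegative variants, clips negative reconstructed magnitudes to zero as a stand-in for the nonnegativity constraint, keeps the noisy phase unchanged, and uses synthetic smoothed-noise trajectories in place of Aurora-2 features. The names `N_FFT`, `modulation_spectrum`, `omp`, `ksvd`, and `enhance`, and all parameter values, are hypothetical.

```python
import numpy as np

N_FFT = 64  # assumed trajectory length in frames (hypothetical value)

def modulation_spectrum(traj):
    """Magnitude and phase of the DFT taken along the time axis of one feature trajectory."""
    spec = np.fft.rfft(traj, n=N_FFT)
    return np.abs(spec), np.angle(spec)

def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit: greedy sparse code of x over the atoms (columns) of D."""
    residual, support = x.copy(), []
    coef, sol = np.zeros(D.shape[1]), np.zeros(0)
    for _ in range(n_nonzero):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0  # never pick the same atom twice
        k = int(np.argmax(corr))
        if corr[k] < 1e-12:
            break
        support.append(k)
        # Least-squares refit on the atoms selected so far
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef

def ksvd(X, n_atoms, n_nonzero, n_iter=5, seed=0):
    """Plain K-SVD on the columns of X (dim x n_samples); returns a unit-norm dictionary."""
    rng = np.random.default_rng(seed)
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Sparse coding step: one OMP code per training vector
        A = np.stack([omp(D, x, n_nonzero) for x in X.T], axis=1)
        # Dictionary update step: rank-1 SVD refit of each atom on the samples using it
        for k in range(n_atoms):
            users = np.nonzero(A[k])[0]
            if users.size == 0:
                continue
            A[k, users] = 0.0
            E = X[:, users] - D @ A[:, users]  # residual without atom k
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]
            A[k, users] = s[0] * Vt[0]
    return D

def enhance(traj, D, n_nonzero):
    """Project the magnitude modulation spectrum onto the dictionary; reuse the noisy phase."""
    mag, phase = modulation_spectrum(traj)
    mag_hat = np.maximum(D @ omp(D, mag, n_nonzero), 0.0)  # clip: magnitudes are nonnegative
    return np.fft.irfft(mag_hat * np.exp(1j * phase), n=N_FFT)

# Synthetic stand-in for clean training trajectories: moving-average-smoothed noise
rng = np.random.default_rng(0)
def synth():
    return np.convolve(rng.standard_normal(N_FFT + 8), np.ones(9) / 9, mode="valid")

clean_mags = np.stack([modulation_spectrum(synth())[0] for _ in range(200)], axis=1)
D = ksvd(clean_mags, n_atoms=20, n_nonzero=3)
noisy = synth() + 0.3 * rng.standard_normal(N_FFT)
enhanced = enhance(noisy, D, n_nonzero=3)
```

In this sketch the dictionary is learned only from clean magnitude spectra, so a noisy spectrum is forced onto a few atoms that reflect clean-speech structure. A nonnegative variant would constrain both the atoms and the codes instead of clipping after reconstruction.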

Original language: English
Title of host publication: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Publisher: IEEE Computer Society
Pages: 577-582
Number of pages: 6
ISBN (Electronic): 9781509060672
DOIs: https://doi.org/10.1109/ICME.2017.8019509
Publication status: Published - 2017 Aug 28
Event: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017 - Hong Kong, Hong Kong
Duration: 2017 Jul 10 to 2017 Jul 14

Other

Other: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Country: Hong Kong
City: Hong Kong
Period: 17/7/10 to 17/7/14

Fingerprint

Glossaries
Speech recognition
Modulation
Singular value decomposition
Acoustic noise
Acoustics
Experiments

Keywords

  • Deep neural network
  • Dictionary learning
  • Modulation spectrum
  • Robustness
  • Sparse coding

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

Cite this

Yan, B. C., Shih, C. H., Liu, S. H., & Chen, B. (2017). Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition. In 2017 IEEE International Conference on Multimedia and Expo, ICME 2017 (pp. 577-582). [8019509] IEEE Computer Society. https://doi.org/10.1109/ICME.2017.8019509

@inproceedings{c98a7e4596ce4207b5065245a132db6c,
title = "Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition",
abstract = "Noise robustness has long garnered much interest from researchers and practitioners of the automatic speech recognition (ASR) community due to its paramount importance to the success of ASR systems. This paper presents a novel approach to improving the noise robustness of speech features, building on top of the dictionary learning paradigm. To this end, we employ the K-SVD method and its variants to create sparse representations with respect to a common set of basis spectral vectors that captures the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The enhanced modulation spectra of speech features, constructed by mapping the original modulation spectra into the space spanned by these representative basis vectors, can better carry noise-resistant acoustic characteristics. In addition, considering the nonnegative property of the modulation spectrum amplitudes, we utilize the nonnegative K-SVD method, in combination with the nonnegative sparse coding method, to generate more noise-robust speech features. All experiments were conducted and verified using the standard Aurora-2 database and task. The empirical results show that the proposed dictionary learning based approach can provide significant average word error reductions when being integrated with either a GMM-HMM or a DNN-HMM based ASR system.",
keywords = "Deep neural network, Dictionary learning, Modulation spectrum, Robustness, Sparse coding",
author = "Yan, {Bi Cheng} and Shih, {Chin Hong} and Liu, {Shih Hung} and Chen, Berlin",
year = "2017",
month = "8",
day = "28",
doi = "10.1109/ICME.2017.8019509",
language = "English",
pages = "577--582",
booktitle = "2017 IEEE International Conference on Multimedia and Expo, ICME 2017",
publisher = "IEEE Computer Society",

}

TY - GEN
T1 - Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition
AU - Yan, Bi Cheng
AU - Shih, Chin Hong
AU - Liu, Shih Hung
AU - Chen, Berlin
PY - 2017/8/28
Y1 - 2017/8/28

AB - Noise robustness has long garnered much interest from researchers and practitioners of the automatic speech recognition (ASR) community due to its paramount importance to the success of ASR systems. This paper presents a novel approach to improving the noise robustness of speech features, building on top of the dictionary learning paradigm. To this end, we employ the K-SVD method and its variants to create sparse representations with respect to a common set of basis spectral vectors that captures the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The enhanced modulation spectra of speech features, constructed by mapping the original modulation spectra into the space spanned by these representative basis vectors, can better carry noise-resistant acoustic characteristics. In addition, considering the nonnegative property of the modulation spectrum amplitudes, we utilize the nonnegative K-SVD method, in combination with the nonnegative sparse coding method, to generate more noise-robust speech features. All experiments were conducted and verified using the standard Aurora-2 database and task. The empirical results show that the proposed dictionary learning based approach can provide significant average word error reductions when being integrated with either a GMM-HMM or a DNN-HMM based ASR system.

KW - Deep neural network
KW - Dictionary learning
KW - Modulation spectrum
KW - Robustness
KW - Sparse coding
UR - http://www.scopus.com/inward/record.url?scp=85030229552&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85030229552&partnerID=8YFLogxK
U2 - 10.1109/ICME.2017.8019509
DO - 10.1109/ICME.2017.8019509
M3 - Conference contribution
AN - SCOPUS:85030229552
SP - 577
EP - 582
BT - 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
PB - IEEE Computer Society
ER -