TY - GEN
T1 - Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition
AU - Yan, Bi Cheng
AU - Shih, Chin Hong
AU - Liu, Shih Hung
AU - Chen, Berlin
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/8/28
Y1 - 2017/8/28
AB - Noise robustness has long garnered much interest from researchers and practitioners of the automatic speech recognition (ASR) community due to its paramount importance to the success of ASR systems. This paper presents a novel approach to improving the noise robustness of speech features, building on top of the dictionary learning paradigm. To this end, we employ the K-SVD method and its variants to create sparse representations with respect to a common set of basis spectral vectors that captures the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The enhanced modulation spectra of speech features, constructed by mapping the original modulation spectra into the space spanned by these representative basis vectors, can better carry noise-resistant acoustic characteristics. In addition, considering the nonnegative property of the modulation spectrum amplitudes, we utilize the nonnegative K-SVD method, in combination with the nonnegative sparse coding method, to generate more noise-robust speech features. All experiments were conducted and verified using the standard Aurora-2 database and task. The empirical results show that the proposed dictionary learning based approach can provide significant average word error reductions when being integrated with either a GMM-HMM or a DNN-HMM based ASR system.
KW - Deep neural network
KW - Dictionary learning
KW - Modulation spectrum
KW - Robustness
KW - Sparse coding
UR - http://www.scopus.com/inward/record.url?scp=85030229552&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85030229552&partnerID=8YFLogxK
U2 - 10.1109/ICME.2017.8019509
DO - 10.1109/ICME.2017.8019509
M3 - Conference contribution
AN - SCOPUS:85030229552
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
SP - 577
EP - 582
BT - 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
PB - IEEE Computer Society
T2 - 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Y2 - 10 July 2017 through 14 July 2017
ER -