TY - JOUR
T1 - Exploring low-dimensional structures of modulation spectra for robust speech recognition
AU - Yan, Bi Cheng
AU - Shih, Chin Hong
AU - Liu, Shih Hung
AU - Chen, Berlin
N1 - Publisher Copyright: Copyright © 2017 ISCA.
PY - 2017
Y1 - 2017
AB - The development of noise-robustness techniques is vital to the success of automatic speech recognition (ASR) systems in the face of varying sources of environmental interference. Recent studies have shown that exploring low-dimensional structures of speech features can yield good robustness. In this vein, research on low-rank representation (LRR), which considers the intrinsic structures of speech features lying on some low-dimensional subspaces, has gained considerable interest from the ASR community. When speech features are contaminated with various types of environmental noise, their corresponding modulation spectra can be regarded as superpositions of unstructured sparse noise over the inherent linguistic information. As such, in this paper we endeavor to explore the low-dimensional structures of modulation spectra, in the hope of obtaining more noise-robust speech features. The main contribution is a novel use of the LRR-based method to discover the subspace structures of modulation spectra, thereby alleviating the negative effects of noise interference. Furthermore, we extensively compare our approach with several well-practiced feature-based normalization methods. All experiments were conducted and verified on the Aurora-4 database and task. The empirical results show that the proposed LRR-based method can provide significant word error reductions for a typical DNN-HMM hybrid ASR system.
KW - Deep neural network
KW - Low-rank representation
KW - Modulation spectrum
KW - Robustness
KW - Sparse representation
UR - http://www.scopus.com/inward/record.url?scp=85039151148&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85039151148&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2017-611
DO - 10.21437/Interspeech.2017-611
M3 - Conference article
AN - SCOPUS:85039151148
SN - 2308-457X
VL - 2017-August
SP - 3637
EP - 3641
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017
Y2 - 20 August 2017 through 24 August 2017
ER -