TY - JOUR
T1 - Histogram equalization of contextual statistics of speech features for robust speech recognition
AU - Hsieh, Hsin Ju
AU - Chen, Berlin
AU - Hung, Jeih weih
N1 - Publisher Copyright: © 2014, Springer Science+Business Media New York.
PY - 2015/9/1
Y1 - 2015/9/1
AB - In the recent past, we have witnessed a flurry of research activity aimed at the development of novel and ingenious robustness methods for automatic speech recognition (ASR). Among them, histogram equalization (HEQ) of speech features constitutes one of the most prominent and successful lines of research due to its inherently neat formulation and remarkable performance. In this paper, we adopt an effective modeling framework for joint equalization of spatial-temporal contextual statistics of speech features. On top of that, we explore various combinations of simple differencing and averaging operations to render the contextual relationships of feature vector components, not only between different dimensions but also between consecutive speech frames, in the HEQ process. Furthermore, several variants of HEQ are investigated and integrated into the proposed modeling framework to efficiently compensate for the effects of noise interference on the feature vector components. In addition, the utilities of the methods derived from this framework and of several existing robustness methods are analyzed and compared extensively. All experiments were carried out on the Aurora-2 database and task, and were further verified on the Aurora-4 database and task. Experimental results suggest that our proposed methods can offer substantial improvements over the baseline system and achieve performance competitive with or better than some of the existing noise robustness methods in speech recognition.
KW - Automatic speech recognition
KW - Feature contextual statistics
KW - Histogram equalization
KW - Noise robustness
UR - http://www.scopus.com/inward/record.url?scp=84957728175&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84957728175&partnerID=8YFLogxK
U2 - 10.1007/s11042-014-1929-y
DO - 10.1007/s11042-014-1929-y
M3 - Article
AN - SCOPUS:84957728175
SN - 1380-7501
VL - 74
SP - 6769
EP - 6795
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 17
ER -