TY - JOUR
T1 - Exploiting spatial-temporal feature distribution characteristics for robust speech recognition
AU - Chen, Wei Hau
AU - Lin, Shih Hsiang
AU - Chen, Berlin
PY - 2008
Y1 - 2008
N2 - Noise robustness is one of the primary challenges facing most automatic speech recognition (ASR) systems. Several speech feature histogram equalization (HEQ) methods have been developed to compensate for nonlinear noise distortions. However, most current HEQ methods are performed only in a dimension-wise manner, without taking into consideration the contextual relationships between consecutive speech frames. In this paper, we present a novel HEQ approach that exploits spatial-temporal feature distribution characteristics for speech feature normalization. All experiments were carried out on the Aurora-2 database and task. The performance of the presented approach is tested and verified by comparison with the other HEQ methods. The experimental results show that, for clean-condition training, our method yields a significant word error rate reduction over the baseline system and also considerably outperforms the other HEQ methods compared in this paper.
AB - Noise robustness is one of the primary challenges facing most automatic speech recognition (ASR) systems. Several speech feature histogram equalization (HEQ) methods have been developed to compensate for nonlinear noise distortions. However, most current HEQ methods are performed only in a dimension-wise manner, without taking into consideration the contextual relationships between consecutive speech frames. In this paper, we present a novel HEQ approach that exploits spatial-temporal feature distribution characteristics for speech feature normalization. All experiments were carried out on the Aurora-2 database and task. The performance of the presented approach is tested and verified by comparison with the other HEQ methods. The experimental results show that, for clean-condition training, our method yields a significant word error rate reduction over the baseline system and also considerably outperforms the other HEQ methods compared in this paper.
KW - Histogram equalization
KW - Noise robustness
KW - Spatial-temporal distribution characteristics
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=84867192908&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84867192908&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84867192908
SN - 2308-457X
SP - 2004
EP - 2007
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association
Y2 - 22 September 2008 through 26 September 2008
ER -