TY - JOUR
T1 - Distribution-based feature normalization for robust speech recognition leveraging context and dynamics cues
AU - Kao, Yu Chen
AU - Chen, Berlin
PY - 2013
Y1 - 2013
N2 - Recently, histogram equalization (HEQ) of speech features has received considerable attention in the area of robust speech recognition because of its relative simplicity and good empirical performance. In this paper, we present a novel extension to the conventional HEQ approach in two significant aspects. First, polynomial regression of various orders is employed to perform feature normalization efficiently, building on the notion of HEQ. Second, not only the contextual distributional statistics but also the dynamics of feature values are taken as input to the presented regression functions for better normalization performance. By doing so, we can, to some extent, relax the dimension-independence and bag-of-frames assumptions made by the conventional HEQ approach. All experiments were carried out on the Aurora-2 database and task and further verified on the Aurora-4 database and task. The corresponding results demonstrate that our proposed methods can achieve considerable word error rate reductions over the baseline systems and offer additional performance gains for the AFE-processed features.
AB - Recently, histogram equalization (HEQ) of speech features has received considerable attention in the area of robust speech recognition because of its relative simplicity and good empirical performance. In this paper, we present a novel extension to the conventional HEQ approach in two significant aspects. First, polynomial regression of various orders is employed to perform feature normalization efficiently, building on the notion of HEQ. Second, not only the contextual distributional statistics but also the dynamics of feature values are taken as input to the presented regression functions for better normalization performance. By doing so, we can, to some extent, relax the dimension-independence and bag-of-frames assumptions made by the conventional HEQ approach. All experiments were carried out on the Aurora-2 database and task and further verified on the Aurora-4 database and task. The corresponding results demonstrate that our proposed methods can achieve considerable word error rate reductions over the baseline systems and offer additional performance gains for the AFE-processed features.
KW - Contextual information
KW - Histogram equalization
KW - Noise robustness
KW - Spatial-temporal
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=84906269875&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906269875&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84906269875
SN - 2308-457X
SP - 2958
EP - 2962
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013
Y2 - 25 August 2013 through 29 August 2013
ER -