TY - GEN
T1 - Exploring joint equalization of spatial-temporal contextual statistics of speech features for robust speech recognition
AU - Hsieh, Hsin Ju
AU - Hung, Jeih Weih
AU - Chen, Berlin
PY - 2012
Y1 - 2012
N2 - Histogram equalization (HEQ) of speech features has recently become an active research focus in robust speech recognition because of its neat formulation and remarkable performance. This paper continues this line of research in two respects. First, we propose a novel framework for joint equalization of the spatial-temporal contextual statistics of speech features. To this end, we use simple differencing and averaging operations to capture the contextual relationships of feature vector components, both between different dimensions and between consecutive speech frames, for speech feature normalization. Second, we exploit a polynomial-fitting scheme to efficiently approximate the inverse of the cumulative distribution function of the training speech, which works in conjunction with the proposed normalization framework. This scheme requires less storage and computation time than conventional HEQ methods. All experiments were carried out on the Aurora-2 database and task. The methods derived from the proposed framework were tested and compared with other popular robustness methods, and the results suggest the utility of our methods.
KW - Feature contextual statistics
KW - Histogram equalization
KW - Noise robustness
UR - http://www.scopus.com/inward/record.url?scp=84878622002&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84878622002&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84878622002
SN - 9781622767595
T3 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
SP - 2621
EP - 2624
BT - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
T2 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Y2 - 9 September 2012 through 13 September 2012
ER -