Exploring joint equalization of spatial-temporal contextual statistics of speech features for robust speech recognition

Hsin Ju Hsieh, Jeih Weih Hung, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Histogram equalization (HEQ) of speech features has recently become an active research focus in robust speech recognition, owing to its neat formulation and remarkable performance. This paper extends that line of research in two respects. First, we propose a novel framework for the joint equalization of spatial-temporal contextual statistics of speech features. To this end, simple differencing and averaging operations are used to capture the contextual relationships among feature-vector components, both between different dimensions and between consecutive speech frames, for feature normalization. Second, we use a polynomial-fitting scheme to efficiently approximate the inverse of the cumulative distribution function (CDF) of the training speech features, working in conjunction with the proposed normalization framework; this requires less storage and computation time than conventional HEQ methods. All experiments were carried out on the Aurora-2 database and task. The methods derived from the proposed framework were thoroughly tested and compared with other popular robustness methods, and the results suggest their utility.
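The two ideas in the abstract can be illustrated with a small sketch. It assumes features arrive as a list of frames (each a list of cepstral coefficients), takes the "spatial" context to be differences and averages of adjacent dimensions within a frame and the "temporal" context to be differences and averages of the same dimension in consecutive frames, and equalizes each resulting stream by mapping test values through their own rank-based CDF and then through a polynomial approximation of the training inverse CDF. These context definitions, the polynomial order, the rank-based CDF estimate, and all function names (`contextual_streams`, `fit_inverse_cdf`, `equalize_stream`, and so on) are hypothetical choices for illustration, not the authors' exact formulation. The abstract does not say how the equalized context streams are recombined into final features, so that step is left out.

```python
# Sketch only: polynomial-fit HEQ over hypothetical context streams, not the authors' code.
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def empirical_cdf(values: Sequence[float]) -> List[float]:
    """Rank-based CDF estimate (rank + 0.5) / N for each value, in its original position."""
    n = len(values)
    cdf = [0.0] * n
    for rank, idx in enumerate(sorted(range(n), key=lambda i: values[i])):
        cdf[idx] = (rank + 0.5) / n
    return cdf


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_inverse_cdf(train: Sequence[float], order: int = 3) -> List[float]:
    """Least-squares fit of x ~ a_0 + a_1 u + ... + a_K u^K, where u is the empirical CDF of x."""
    u = empirical_cdf(train)
    k = order + 1
    # Normal equations (V^T V) a = V^T x for the Vandermonde matrix V[i][p] = u_i ** p.
    vtv = [[sum(ui ** (p + q) for ui in u) for q in range(k)] for p in range(k)]
    vtx = [sum(ui ** p * xi for ui, xi in zip(u, train)) for p in range(k)]
    return solve(vtv, vtx)


def equalize(test: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Map each test value to its own CDF value, then through the fitted training inverse CDF."""
    return [sum(a * u ** p for p, a in enumerate(coeffs)) for u in empirical_cdf(test)]


def equalize_stream(test: Matrix, train: Matrix, order: int = 3) -> Matrix:
    """Equalize every dimension (column) of a test stream toward the same training column."""
    cols = []
    for d in range(len(test[0])):
        coeffs = fit_inverse_cdf([row[d] for row in train], order)
        cols.append(equalize([row[d] for row in test], coeffs))
    return [list(frame) for frame in zip(*cols)]


def contextual_streams(frames: Matrix) -> Dict[str, Matrix]:
    """Hypothetical spatial (adjacent-dimension) and temporal (adjacent-frame) context streams."""
    n_frames, n_dims = len(frames), len(frames[0])
    return {
        "spatial_diff": [[f[d + 1] - f[d] for d in range(n_dims - 1)] for f in frames],
        "spatial_avg": [[(f[d + 1] + f[d]) / 2 for d in range(n_dims - 1)] for f in frames],
        "temporal_diff": [[frames[t + 1][d] - frames[t][d] for d in range(n_dims)]
                          for t in range(n_frames - 1)],
        "temporal_avg": [[(frames[t + 1][d] + frames[t][d]) / 2 for d in range(n_dims)]
                         for t in range(n_frames - 1)],
    }
```

The storage argument from the abstract shows up in `fit_inverse_cdf`: at test time each dimension needs only `order + 1` polynomial coefficients, rather than a table of training quantiles as in table-lookup HEQ. In a real front end the coefficients would be fitted once on training data and stored; the sketch refits them inside `equalize_stream` only to stay short.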

Original language: English
Title of host publication: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Pages: 2621-2624
Number of pages: 4
Publication status: Published, 2012 Dec 1
Event: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Portland, OR, United States
Duration: 2012 Sep 9 to 2012 Sep 13

Publication series

Name: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Volume: 3

Other

Other: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Country: United States
City: Portland, OR
Period: 2012 Sep 9 to 2012 Sep 13

Keywords

  • Feature contextual statistics
  • Histogram equalization
  • Noise robustness

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Communication

Cite this

Hsieh, H. J., Hung, J. W., & Chen, B. (2012). Exploring joint equalization of spatial-temporal contextual statistics of speech features for robust speech recognition. In 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012 (pp. 2621-2624). (13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012; Vol. 3).

Exploring joint equalization of spatial-temporal contextual statistics of speech features for robust speech recognition. / Hsieh, Hsin Ju; Hung, Jeih Weih; Chen, Berlin.

13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012. 2012. p. 2621-2624 (13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012; Vol. 3).

Hsieh, HJ, Hung, JW & Chen, B 2012, Exploring joint equalization of spatial-temporal contextual statistics of speech features for robust speech recognition. in 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012. 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, vol. 3, pp. 2621-2624, 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Portland, OR, United States, 2012 Sep 9.

Hsieh HJ, Hung JW, Chen B. Exploring joint equalization of spatial-temporal contextual statistics of speech features for robust speech recognition. In 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012. 2012. p. 2621-2624. (13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012).

Hsieh, Hsin Ju; Hung, Jeih Weih; Chen, Berlin. / Exploring joint equalization of spatial-temporal contextual statistics of speech features for robust speech recognition. 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012. 2012. pp. 2621-2624 (13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012).

@inproceedings{1d320077426941aab092c3690e497a0e,
title = "Exploring joint equalization of spatial-temporal contextual statistics of speech features for robust speech recognition",
abstract = "Histogram equalization (HEQ) of speech features has recently become an active focus of much research in the field of robust speech recognition due to its inherent neat formulation and remarkable performance. Our work in this paper continues this general line of research in two significant aspects. First, a novel framework for joint equalization of spatial-temporal contextual statistics of speech features is proposed. For this idea to work, we leverage simple differencing and averaging operations to render the contextual relationships of feature vector components, not only between different dimensions but also between consecutive speech frames, for speech feature normalization. Second, we exploit a polynomial-fitting scheme to efficiently approximate the inverse of the cumulative density function of training speech, so as to work in conjunction with the presented normalization framework. As such, it provides the advantages of lower storage and time consumption when compared with the conventional HEQ methods. All experiments were carried out on the Aurora-2 database and task. The performance of the methods deduced from our proposed framework was thoroughly tested and verified by comparisons with other popular robustness methods, which suggests the utility of our methods.",
keywords = "Feature contextual statistics, Histogram equalization, Noise robustness",
author = "Hsieh, {Hsin Ju} and Hung, {Jeih Weih} and Berlin Chen",
year = "2012",
month = "12",
day = "1",
language = "English",
isbn = "9781622767595",
series = "13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012",
pages = "2621--2624",
booktitle = "13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012",

}

TY - GEN

T1 - Exploring joint equalization of spatial-temporal contextual statistics of speech features for robust speech recognition

AU - Hsieh, Hsin Ju

AU - Hung, Jeih Weih

AU - Chen, Berlin

PY - 2012/12/1

Y1 - 2012/12/1

N2 - Histogram equalization (HEQ) of speech features has recently become an active focus of much research in the field of robust speech recognition due to its inherent neat formulation and remarkable performance. Our work in this paper continues this general line of research in two significant aspects. First, a novel framework for joint equalization of spatial-temporal contextual statistics of speech features is proposed. For this idea to work, we leverage simple differencing and averaging operations to render the contextual relationships of feature vector components, not only between different dimensions but also between consecutive speech frames, for speech feature normalization. Second, we exploit a polynomial-fitting scheme to efficiently approximate the inverse of the cumulative density function of training speech, so as to work in conjunction with the presented normalization framework. As such, it provides the advantages of lower storage and time consumption when compared with the conventional HEQ methods. All experiments were carried out on the Aurora-2 database and task. The performance of the methods deduced from our proposed framework was thoroughly tested and verified by comparisons with other popular robustness methods, which suggests the utility of our methods.

AB - Histogram equalization (HEQ) of speech features has recently become an active focus of much research in the field of robust speech recognition due to its inherent neat formulation and remarkable performance. Our work in this paper continues this general line of research in two significant aspects. First, a novel framework for joint equalization of spatial-temporal contextual statistics of speech features is proposed. For this idea to work, we leverage simple differencing and averaging operations to render the contextual relationships of feature vector components, not only between different dimensions but also between consecutive speech frames, for speech feature normalization. Second, we exploit a polynomial-fitting scheme to efficiently approximate the inverse of the cumulative density function of training speech, so as to work in conjunction with the presented normalization framework. As such, it provides the advantages of lower storage and time consumption when compared with the conventional HEQ methods. All experiments were carried out on the Aurora-2 database and task. The performance of the methods deduced from our proposed framework was thoroughly tested and verified by comparisons with other popular robustness methods, which suggests the utility of our methods.

KW - Feature contextual statistics

KW - Histogram equalization

KW - Noise robustness

UR - http://www.scopus.com/inward/record.url?scp=84878622002&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84878622002&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84878622002

SN - 9781622767595

T3 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012

SP - 2621

EP - 2624

BT - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012

ER -