Exploiting polynomial-fit histogram equalization and temporal average for robust speech recognition

Shih Hsiang Lin, Yao Ming Yeh, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

14 Citations (Scopus)

Abstract

The performance of current automatic speech recognition (ASR) systems deteriorates sharply when the input speech is corrupted by noise. Quite a few techniques have been proposed in recent years to improve ASR robustness. Histogram equalization (HEQ) is one of the most efficient techniques for compensating for nonlinear distortion. In this paper, we explore a data-fitting scheme that efficiently approximates the inverse of the cumulative distribution function of the training speech for HEQ, in contrast to the conventional table-lookup and quantile-based approaches. In addition, a temporal average operation is applied to the feature vector components to alleviate the influence of sharp peaks and valleys caused by non-stationary noise. Finally, we investigate the possibility of combining our approaches with other feature discrimination and decorrelation methods. All experiments were carried out on the Aurora-2 database and task, and the initial results are encouraging.
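The two operations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the polynomial degree, the rank-based CDF estimate, and the unweighted moving-average window are all assumptions made for illustration. A polynomial is fitted by least squares so that it maps a CDF value back to a training feature value; at test time each frame's CDF value is estimated from its rank within the utterance and passed through that polynomial, and the result can then be smoothed over neighbouring frames.

```python
from typing import List, Sequence


def empirical_cdf(values: Sequence[float]) -> List[float]:
    """Rank-based CDF estimate in (0, 1) for each value, in input order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    cdf = [0.0] * n
    for rank, idx in enumerate(order):
        cdf[idx] = (rank + 0.5) / n
    return cdf


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_inverse_cdf(train: Sequence[float], degree: int = 3) -> List[float]:
    """Least-squares polynomial G with G(cdf(x)) ~ x (coefficients low to high)."""
    u = empirical_cdf(train)
    k = degree + 1
    # Normal equations (A^T A) w = A^T x for the Vandermonde matrix A of u.
    ata = [[sum(ui ** (i + j) for ui in u) for j in range(k)] for i in range(k)]
    atb = [sum((ui ** i) * xi for ui, xi in zip(u, train)) for i in range(k)]
    return solve_linear(ata, atb)


def equalize(utterance: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Map each frame's within-utterance CDF value through the fitted polynomial."""
    u = empirical_cdf(utterance)
    return [sum(c * ui ** p for p, c in enumerate(coeffs)) for ui in u]


def temporal_average(frames: Sequence[float], span: int = 1) -> List[float]:
    """Unweighted moving average over up to `span` frames on each side."""
    n = len(frames)
    out = []
    for t in range(n):
        lo, hi = max(0, t - span), min(n, t + span + 1)
        out.append(sum(frames[lo:hi]) / (hi - lo))
    return out
```

In this sketch each feature dimension would be processed independently: `fit_inverse_cdf` is run once per dimension on training data, and `equalize` followed by `temporal_average` is applied per utterance. Only the polynomial coefficients need to be stored, which is the storage advantage over a lookup table that the abstract alludes to.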

Original language: English
Title of host publication: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Publisher: International Speech Communication Association
Pages: 2522-2525
Number of pages: 4
ISBN (Print): 9781604234497
Publication status: Published - 2006 Jan 1
Event: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP - Pittsburgh, PA, United States
Duration: 2006 Sep 17 to 2006 Sep 21

Publication series

Name: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Volume: 5

Other

Other: INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP
Country: United States
City: Pittsburgh, PA
Period: 06/9/17 to 06/9/21

Fingerprint

Speech recognition
Polynomials
Nonlinear distortion
Table lookup
Probability density function
Experiments

Keywords

  • Data fitting
  • Histogram equalization
  • Robustness
  • Temporal average

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Lin, S. H., Yeh, Y. M., & Chen, B. (2006). Exploiting polynomial-fit histogram equalization and temporal average for robust speech recognition. In INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP (pp. 2522-2525). (INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP; Vol. 5). International Speech Communication Association.

@inproceedings{a30f0e36442d4757a3f13618e8a31a9a,
title = "Exploiting polynomial-fit histogram equalization and temporal average for robust speech recognition",
abstract = "The performance of current automatic speech recognition (ASR) systems deteriorates sharply when the input speech is corrupted by noise. Quite a few techniques have been proposed in recent years to improve ASR robustness. Histogram equalization (HEQ) is one of the most efficient techniques for compensating for nonlinear distortion. In this paper, we explore a data-fitting scheme that efficiently approximates the inverse of the cumulative distribution function of the training speech for HEQ, in contrast to the conventional table-lookup and quantile-based approaches. In addition, a temporal average operation is applied to the feature vector components to alleviate the influence of sharp peaks and valleys caused by non-stationary noise. Finally, we investigate the possibility of combining our approaches with other feature discrimination and decorrelation methods. All experiments were carried out on the Aurora-2 database and task, and the initial results are encouraging.",
keywords = "Data fitting, Histogram equalization, Robustness, Temporal average",
author = "Lin, {Shih Hsiang} and Yeh, {Yao Ming} and Berlin Chen",
year = "2006",
month = "1",
day = "1",
language = "English",
isbn = "9781604234497",
series = "INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP",
publisher = "International Speech Communication Association",
pages = "2522--2525",
booktitle = "INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP",

}

TY - GEN

T1 - Exploiting polynomial-fit histogram equalization and temporal average for robust speech recognition

AU - Lin, Shih Hsiang

AU - Yeh, Yao Ming

AU - Chen, Berlin

PY - 2006/1/1

Y1 - 2006/1/1

N2 - The performance of current automatic speech recognition (ASR) systems deteriorates sharply when the input speech is corrupted by noise. Quite a few techniques have been proposed in recent years to improve ASR robustness. Histogram equalization (HEQ) is one of the most efficient techniques for compensating for nonlinear distortion. In this paper, we explore a data-fitting scheme that efficiently approximates the inverse of the cumulative distribution function of the training speech for HEQ, in contrast to the conventional table-lookup and quantile-based approaches. In addition, a temporal average operation is applied to the feature vector components to alleviate the influence of sharp peaks and valleys caused by non-stationary noise. Finally, we investigate the possibility of combining our approaches with other feature discrimination and decorrelation methods. All experiments were carried out on the Aurora-2 database and task, and the initial results are encouraging.

KW - Data fitting

KW - Histogram equalization

KW - Robustness

KW - Temporal average

UR - http://www.scopus.com/inward/record.url?scp=44949100427&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=44949100427&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:44949100427

SN - 9781604234497

T3 - INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP

SP - 2522

EP - 2525

BT - INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP

PB - International Speech Communication Association

ER -