Exploiting spatial-temporal feature distribution characteristics for robust speech recognition

Wei Hau Chen, Shih Hsiang Lin, Berlin Chen

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

Noise robustness is one of the primary challenges facing most automatic speech recognition (ASR) systems. A number of speech feature histogram equalization (HEQ) methods have been developed to compensate for nonlinear noise distortions. However, most current HEQ methods operate in a dimension-wise manner and do not take into account the contextual relationships between consecutive speech frames. In this paper, we present a novel HEQ approach that exploits spatial-temporal feature distribution characteristics for speech feature normalization. All experiments were carried out on the Aurora-2 database and task. The performance of the presented approach was tested and verified by comparison with other HEQ methods. The experimental results show that, for clean-condition training, our method yields a significant word error rate reduction over the baseline system and also considerably outperforms the other HEQ methods compared in this paper.
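For readers unfamiliar with the baseline technique, the following is a minimal sketch of conventional dimension-wise HEQ, the kind of method the abstract contrasts against. It is not the spatial-temporal approach proposed in the paper, whose details the abstract does not give. The function name `heq_dimension_wise`, the standard normal reference distribution, and the rank-based estimate of the empirical CDF are all assumptions made for illustration.

```python
from statistics import NormalDist


def heq_dimension_wise(frames):
    """Map each feature dimension of one utterance onto a standard normal.

    frames: list of equal-length lists, one feature vector per speech frame.
    Each dimension is equalized on its own; neighboring frames and other
    dimensions are ignored, which is the limitation the abstract points out.
    """
    n = len(frames)
    dims = len(frames[0])
    ref = NormalDist(mu=0.0, sigma=1.0)  # assumed reference distribution
    out = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Rank the frames by their value in this dimension only.
        order = sorted(range(n), key=lambda t: frames[t][d])
        for rank, t in enumerate(order):
            # Empirical CDF estimate, offset by 0.5 so it stays inside (0, 1).
            cdf = (rank + 0.5) / n
            # Invert the reference CDF to get the equalized feature value.
            out[t][d] = ref.inv_cdf(cdf)
    return out


# Hypothetical toy utterance: three frames, two feature dimensions.
noisy = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
equalized = heq_dimension_wise(noisy)
```

Because the mapping depends only on the rank of a value within its own dimension, any monotonic distortion of a single feature is undone, but information carried by adjacent frames is never used.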

Original language: English
Pages (from-to): 2004-2007
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2008

Fingerprint

Speech recognition
Noise
Acoustic noise
Experiments
Databases

Keywords

  • Histogram equalization
  • Noise robustness
  • Spatial-temporal distribution characteristics
  • Speech recognition

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Sensory Systems

Cite this

@article{be04bc68faa34c2095a8f76560ff73a2,
title = "Exploiting spatial-temporal feature distribution characteristics for robust speech recognition",
abstract = "Noise robustness is one of the primary challenges facing most automatic speech recognition (ASR) systems. A number of speech feature histogram equalization (HEQ) methods have been developed to compensate for nonlinear noise distortions. However, most current HEQ methods operate in a dimension-wise manner and do not take into account the contextual relationships between consecutive speech frames. In this paper, we present a novel HEQ approach that exploits spatial-temporal feature distribution characteristics for speech feature normalization. All experiments were carried out on the Aurora-2 database and task. The performance of the presented approach was tested and verified by comparison with other HEQ methods. The experimental results show that, for clean-condition training, our method yields a significant word error rate reduction over the baseline system and also considerably outperforms the other HEQ methods compared in this paper.",
keywords = "Histogram equalization, Noise robustness, Spatial-temporal distribution characteristics, Speech recognition",
author = "Chen, {Wei Hau} and Lin, {Shih Hsiang} and Berlin Chen",
year = "2008",
language = "English",
pages = "2004--2007",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY  - JOUR
T1  - Exploiting spatial-temporal feature distribution characteristics for robust speech recognition
AU  - Chen, Wei Hau
AU  - Lin, Shih Hsiang
AU  - Chen, Berlin
PY  - 2008
Y1  - 2008
AB  - Noise robustness is one of the primary challenges facing most automatic speech recognition (ASR) systems. A number of speech feature histogram equalization (HEQ) methods have been developed to compensate for nonlinear noise distortions. However, most current HEQ methods operate in a dimension-wise manner and do not take into account the contextual relationships between consecutive speech frames. In this paper, we present a novel HEQ approach that exploits spatial-temporal feature distribution characteristics for speech feature normalization. All experiments were carried out on the Aurora-2 database and task. The performance of the presented approach was tested and verified by comparison with other HEQ methods. The experimental results show that, for clean-condition training, our method yields a significant word error rate reduction over the baseline system and also considerably outperforms the other HEQ methods compared in this paper.
KW  - Histogram equalization
KW  - Noise robustness
KW  - Spatial-temporal distribution characteristics
KW  - Speech recognition
UR  - http://www.scopus.com/inward/record.url?scp=84867192908&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84867192908&partnerID=8YFLogxK
M3  - Article
SP  - 2004
EP  - 2007
JO  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN  - 2308-457X
ER  -