Robust speech recognition using spatial-temporal feature distribution characteristics

Berlin Chen, Wei Hau Chen, Shih Hsiang Lin, Wen Yi Chu

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Histogram equalization (HEQ) is one of the most efficient and effective techniques that have been used to reduce the mismatch between training and test acoustic conditions. However, most of the current HEQ methods are merely performed in a dimension-wise manner and without allowing for the contextual relationships between consecutive speech frames. In this paper, we present several novel HEQ approaches that exploit spatial-temporal feature distribution characteristics for speech feature normalization. The automatic speech recognition (ASR) experiments were carried out on the Aurora-2 standard noise-robust ASR task. The performance of the presented approaches was thoroughly tested and verified by comparisons with the other popular HEQ methods. The experimental results show that for clean-condition training, our approaches yield a significant word error rate reduction over the baseline system, and also give competitive performance relative to the other HEQ methods compared in this paper.
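For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of conventional dimension-wise HEQ, not the spatial-temporal approaches proposed in the paper. It assumes a standard normal reference distribution and a rank-based estimate of the empirical CDF over a single utterance; the function name `heq_dimension_wise` and the list-of-lists frame layout are hypothetical choices made for illustration.

```python
from statistics import NormalDist


def heq_dimension_wise(frames):
    """Equalize each feature dimension of one utterance to a standard normal.

    frames: list of equal-length lists, one per speech frame (assumed layout).
    Each dimension is treated independently, and frame order plays no role,
    which is the limitation the abstract points out.
    """
    num_frames = len(frames)
    num_dims = len(frames[0])
    reference = NormalDist(mu=0.0, sigma=1.0)  # assumed reference distribution
    equalized = [[0.0] * num_dims for _ in range(num_frames)]

    for d in range(num_dims):
        # Rank the frames by their value in dimension d (ties keep input order).
        order = sorted(range(num_frames), key=lambda t: frames[t][d])
        for rank, t in enumerate(order):
            # Empirical CDF estimate, offset by 0.5 so it stays strictly in (0, 1).
            cdf = (rank + 0.5) / num_frames
            # Map through the inverse CDF of the reference distribution.
            equalized[t][d] = reference.inv_cdf(cdf)
    return equalized


# Toy utterance: 4 frames, 2 feature dimensions (made-up values).
noisy = [[3.0, -1.0], [5.0, 0.5], [4.0, 2.0], [9.0, 0.0]]
normalized = heq_dimension_wise(noisy)
```

Because the mapping depends only on the rank of each value within its own dimension, the ordering of values in a dimension is preserved while its distribution is forced toward the reference, regardless of what neighbouring frames or other dimensions contain.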

Original language: English
Pages (from-to): 919-926
Number of pages: 8
Journal: Pattern Recognition Letters
Volume: 32
Issue number: 7
DOI: 10.1016/j.patrec.2011.01.016
Publication status: Published - 2011 May 1

Fingerprint

Speech recognition
Acoustic noise
Acoustics
Experiments

Keywords

  • Aurora-2
  • Histogram equalization
  • Noise robustness
  • Spatial-temporal distribution characteristics
  • Speech recognition

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

Robust speech recognition using spatial-temporal feature distribution characteristics. / Chen, Berlin; Chen, Wei Hau; Lin, Shih Hsiang; Chu, Wen Yi.

In: Pattern Recognition Letters, Vol. 32, No. 7, 01.05.2011, p. 919-926.

@article{769d0b4476ee48799cf5ec9c53aba5db,
title = "Robust speech recognition using spatial-temporal feature distribution characteristics",
keywords = "Aurora-2, Histogram equalization, Noise robustness, Spatial-temporal distribution characteristics, Speech recognition",
author = "Chen, Berlin and Chen, {Wei Hau} and Lin, {Shih Hsiang} and Chu, {Wen Yi}",
year = "2011",
month = "5",
day = "1",
doi = "10.1016/j.patrec.2011.01.016",
language = "English",
volume = "32",
pages = "919--926",
journal = "Pattern Recognition Letters",
issn = "0167-8655",
publisher = "Elsevier",
number = "7",
}

TY - JOUR
T1 - Robust speech recognition using spatial-temporal feature distribution characteristics
AU - Chen, Berlin
AU - Chen, Wei Hau
AU - Lin, Shih Hsiang
AU - Chu, Wen Yi
PY - 2011/5/1
Y1 - 2011/5/1
AB - Histogram equalization (HEQ) is one of the most efficient and effective techniques that have been used to reduce the mismatch between training and test acoustic conditions. However, most of the current HEQ methods are merely performed in a dimension-wise manner and without allowing for the contextual relationships between consecutive speech frames. In this paper, we present several novel HEQ approaches that exploit spatial-temporal feature distribution characteristics for speech feature normalization. The automatic speech recognition (ASR) experiments were carried out on the Aurora-2 standard noise-robust ASR task. The performance of the presented approaches was thoroughly tested and verified by comparisons with the other popular HEQ methods. The experimental results show that for clean-condition training, our approaches yield a significant word error rate reduction over the baseline system, and also give competitive performance relative to the other HEQ methods compared in this paper.
KW - Aurora-2
KW - Histogram equalization
KW - Noise robustness
KW - Spatial-temporal distribution characteristics
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=79951960110&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79951960110&partnerID=8YFLogxK
U2 - 10.1016/j.patrec.2011.01.016
DO - 10.1016/j.patrec.2011.01.016
M3 - Article
VL - 32
SP - 919
EP - 926
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
SN - 0167-8655
IS - 7
ER -