Histogram equalization of contextual statistics of speech features for robust speech recognition

Hsin Ju Hsieh, Berlin Chen, Jeih weih Hung

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

In the recent past, we have witnessed a flurry of research activity aimed at the development of novel and ingenious robustness methods for automatic speech recognition (ASR). Among them, histogram equalization (HEQ) of speech features constitutes one of the most prominent and successful lines of research due to its inherently neat formulation and remarkable performance. In this paper, we adopt an effective modeling framework for joint equalization of spatial-temporal contextual statistics of speech features. On top of that, we explore various combinations of simple differencing and averaging operations to render the contextual relationships of feature vector components, not only between different dimensions but also between consecutive speech frames, in the HEQ process. Furthermore, several variants of HEQ are investigated and integrated into the proposed modeling framework to efficiently compensate for the effects of noise interference on the feature vector components. In addition, the utilities of the methods deduced from this framework and several existing robustness methods are analyzed and compared extensively. All experiments were carried out on the Aurora-2 database and task, and were further verified on the Aurora-4 database and task. Experimental results suggest that our proposed methods can offer substantial improvements over the baseline system and achieve performance competitive with or better than some of the existing noise robustness methods in speech recognition.
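As a rough illustration of the ideas named in the abstract, the following minimal sketch equalizes a single feature dimension against a standard normal reference and builds averaged and differenced streams from consecutive frames. It is not the authors' implementation: the function names, the choice of N(0, 1) as the reference distribution, the pairing of the first frame with itself, and the recombination by simple summation are all assumptions made for this example.

```python
from statistics import NormalDist


def heq_to_gaussian(values):
    """Equalize one feature dimension (a list of per-frame values) to a
    standard normal reference via its empirical CDF."""
    n = len(values)
    # Frame indices sorted by ascending feature value.
    order = sorted(range(n), key=lambda i: values[i])
    ref = NormalDist()  # assumed reference distribution: N(0, 1)
    out = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the CDF estimate strictly inside (0, 1).
        out[i] = ref.inv_cdf((rank + 0.5) / n)
    return out


def temporal_context(values):
    """Average and half-difference of each frame with its predecessor.
    The first frame is paired with itself, so avg + diff recovers the input."""
    prev = [values[0]] + list(values[:-1])
    avg = [0.5 * (v + p) for v, p in zip(values, prev)]
    diff = [0.5 * (v - p) for v, p in zip(values, prev)]
    return avg, diff


def contextual_heq(values):
    """Hypothetical combination: equalize the averaged and differenced
    streams separately, then add them back together."""
    avg, diff = temporal_context(values)
    return [a + d for a, d in zip(heq_to_gaussian(avg), heq_to_gaussian(diff))]
```

In practice the reference distribution would typically be estimated from clean training data rather than fixed to N(0, 1), and the same averaging and differencing could be applied across feature dimensions as well as across frames.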

Original language: English
Pages (from-to): 6769-6795
Number of pages: 27
Journal: Multimedia Tools and Applications
Volume: 74
Issue number: 17
DOI: 10.1007/s11042-014-1929-y
Publication status: Published - 2015 Sep 1

Fingerprint

Speech recognition
Statistics
Acoustic noise
Experiments

Keywords

  • Automatic speech recognition
  • Feature contextual statistics
  • Histogram equalization
  • Noise robustness

ASJC Scopus subject areas

  • Software
  • Media Technology
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Histogram equalization of contextual statistics of speech features for robust speech recognition. / Hsieh, Hsin Ju; Chen, Berlin; Hung, Jeih weih.

In: Multimedia Tools and Applications, Vol. 74, No. 17, 01.09.2015, p. 6769-6795.

@article{20ba5c65783d4ef78e5d4fac867ffefd,
title = "Histogram equalization of contextual statistics of speech features for robust speech recognition",
abstract = "In the recent past, we have witnessed a flurry of research activity aimed at the development of novel and ingenious robustness methods for automatic speech recognition (ASR). Among them, histogram equalization (HEQ) of speech features constitutes one of the most prominent and successful lines of research due to its inherently neat formulation and remarkable performance. In this paper, we adopt an effective modeling framework for joint equalization of spatial-temporal contextual statistics of speech features. On top of that, we explore various combinations of simple differencing and averaging operations to render the contextual relationships of feature vector components, not only between different dimensions but also between consecutive speech frames, in the HEQ process. Furthermore, several variants of HEQ are investigated and integrated into the proposed modeling framework to efficiently compensate for the effects of noise interference on the feature vector components. In addition, the utilities of the methods deduced from this framework and several existing robustness methods are analyzed and compared extensively. All experiments were carried out on the Aurora-2 database and task, and were further verified on the Aurora-4 database and task. Experimental results suggest that our proposed methods can offer substantial improvements over the baseline system and achieve performance competitive with or better than some of the existing noise robustness methods in speech recognition.",
keywords = "Automatic speech recognition, Feature contextual statistics, Histogram equalization, Noise robustness",
author = "Hsieh, {Hsin Ju} and Chen, Berlin and Hung, {Jeih weih}",
year = "2015",
month = "9",
day = "1",
doi = "10.1007/s11042-014-1929-y",
language = "English",
volume = "74",
pages = "6769--6795",
journal = "Multimedia Tools and Applications",
issn = "1380-7501",
publisher = "Springer Netherlands",
number = "17",
}

TY - JOUR

T1 - Histogram equalization of contextual statistics of speech features for robust speech recognition

AU - Hsieh, Hsin Ju

AU - Chen, Berlin

AU - Hung, Jeih weih

PY - 2015/9/1

Y1 - 2015/9/1

AB - In the recent past, we have witnessed a flurry of research activity aimed at the development of novel and ingenious robustness methods for automatic speech recognition (ASR). Among them, histogram equalization (HEQ) of speech features constitutes one of the most prominent and successful lines of research due to its inherently neat formulation and remarkable performance. In this paper, we adopt an effective modeling framework for joint equalization of spatial-temporal contextual statistics of speech features. On top of that, we explore various combinations of simple differencing and averaging operations to render the contextual relationships of feature vector components, not only between different dimensions but also between consecutive speech frames, in the HEQ process. Furthermore, several variants of HEQ are investigated and integrated into the proposed modeling framework to efficiently compensate for the effects of noise interference on the feature vector components. In addition, the utilities of the methods deduced from this framework and several existing robustness methods are analyzed and compared extensively. All experiments were carried out on the Aurora-2 database and task, and were further verified on the Aurora-4 database and task. Experimental results suggest that our proposed methods can offer substantial improvements over the baseline system and achieve performance competitive with or better than some of the existing noise robustness methods in speech recognition.

KW - Automatic speech recognition

KW - Feature contextual statistics

KW - Histogram equalization

KW - Noise robustness

UR - http://www.scopus.com/inward/record.url?scp=84957728175&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84957728175&partnerID=8YFLogxK

U2 - 10.1007/s11042-014-1929-y

DO - 10.1007/s11042-014-1929-y

M3 - Article

AN - SCOPUS:84957728175

VL - 74

SP - 6769

EP - 6795

JO - Multimedia Tools and Applications

JF - Multimedia Tools and Applications

SN - 1380-7501

IS - 17

ER -