Histogram equalization of contextual statistics of speech features for robust speech recognition

Hsin Ju Hsieh, Berlin Chen*, Jeih weih Hung

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

In the recent past, we have witnessed a flurry of research activity aimed at the development of novel and ingenious robustness methods for automatic speech recognition (ASR). Among them, histogram equalization (HEQ) of speech features constitutes one of the most prominent and successful lines of research, owing to its inherently neat formulation and remarkable performance. In this paper, we adopt an effective modeling framework for joint equalization of spatial-temporal contextual statistics of speech features. On top of that, we explore various combinations of simple differencing and averaging operations to render the contextual relationships of feature vector components, not only between different dimensions but also between consecutive speech frames, in the HEQ process. Furthermore, several variants of HEQ are investigated and integrated into the proposed modeling framework to efficiently compensate for the effects of noise interference on the feature vector components. In addition, the utilities of the methods deduced from this framework and several existing robustness methods are analyzed and compared extensively. All experiments were carried out on the Aurora-2 database and task, and were further verified on the Aurora-4 database and task. Experimental results suggest that our proposed methods can offer substantial improvements over the baseline system and achieve performance competitive with or better than some of the existing noise robustness methods in speech recognition.
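As a rough illustration of the ideas named in the abstract, the following Python sketch equalizes one feature dimension after splitting it into averages and differences of consecutive frames. It is not the authors' implementation: the standard normal reference distribution, the rank-based CDF estimate, the repeated first frame at the boundary, the recombination by summation, and all function names are assumptions made for this example.

```python
from statistics import NormalDist


def heq_to_gaussian(values):
    """Rank-based histogram equalization of one feature stream.

    Each value is replaced by the standard normal quantile of its
    empirical CDF position. The N(0, 1) reference is an assumption.
    """
    n = len(values)
    # Rank every frame's value within the utterance (0 = smallest).
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    ref = NormalDist()
    # (rank + 0.5) / n keeps the CDF estimate strictly inside (0, 1).
    return [ref.inv_cdf((r + 0.5) / n) for r in ranks]


def temporal_average_and_difference(values):
    """Averages and half-differences of each frame with its predecessor.

    The first frame is paired with itself (an assumed boundary rule),
    so avg[t] + diff[t] reproduces values[t] for every t.
    """
    prev = [values[0]] + list(values[:-1])
    avg = [(v + p) / 2 for v, p in zip(values, prev)]
    diff = [(v - p) / 2 for v, p in zip(values, prev)]
    return avg, diff


def contextual_heq(values):
    """Equalize the two contextual streams separately, then recombine.

    Summing the equalized streams mirrors the avg + diff identity above;
    this recombination step is a hypothetical choice for the sketch.
    """
    avg, diff = temporal_average_and_difference(values)
    eq_avg = heq_to_gaussian(avg)
    eq_diff = heq_to_gaussian(diff)
    return [a + d for a, d in zip(eq_avg, eq_diff)]
```

Calling `contextual_heq` on the per-frame values of a single cepstral coefficient returns a compensated sequence of the same length. Applying the same averaging and differencing across feature dimensions instead of across frames would be the spatial counterpart.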

Original language: English
Pages (from-to): 6769-6795
Number of pages: 27
Journal: Multimedia Tools and Applications
Volume: 74
Issue number: 17
Publication status: Published - 2015 Sept 1

Keywords

  • Automatic speech recognition
  • Feature contextual statistics
  • Histogram equalization
  • Noise robustness

ASJC Scopus subject areas

  • Software
  • Media Technology
  • Hardware and Architecture
  • Computer Networks and Communications
