Investigating the use of speech features and their corresponding distribution characteristics for robust speech recognition

Shih Hsiang Lin*, Yao Ming Yeh, Berlin Chen

*Corresponding author for this work

Research output: Contribution to conference, Paper, peer-review

Abstract

The performance of current automatic speech recognition (ASR) systems often deteriorates radically when the input speech is corrupted by various kinds of noise sources. Quite a few techniques have been proposed to improve ASR robustness over the last few decades. Related work reported in the literature can generally be divided into two categories, depending on whether a method operates on the speech features themselves or on their corresponding probability distributions. In this paper, we present a polynomial regression approach that directly characterizes the relationship between the speech features and their corresponding probability distributions in order to compensate for noise effects. Two variants of the proposed approach are also investigated extensively. All experiments are conducted on the Aurora-2 database and task. Experimental results show that, for clean-condition training, our approaches achieve considerable word error rate reductions over the baseline system and also significantly outperform other conventional methods.
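The idea of regressing between feature values and their probability distributions can be shown with a minimal sketch in the spirit of histogram equalization, which is one of the listed keywords. The sketch makes several assumptions. It handles a single feature dimension and estimates each value's cumulative probability from its rank within the utterance. It then fits a least-squares polynomial that maps a cumulative probability back to a feature value of a clean reference distribution. The function names, the polynomial order of 7, and the synthetic data are all hypothetical. This is not the paper's exact formulation, and the two variants are not shown.

```python
import numpy as np

def empirical_cdf(x):
    """Rank-based estimate of the cumulative probability of each sample, in (0, 1)."""
    ranks = np.argsort(np.argsort(x))  # 0-based rank of each sample within x
    return (ranks + 0.5) / len(x)

def fit_inverse_cdf(reference, order=7):
    """Least-squares polynomial mapping a CDF value to a reference feature value.

    `order` is a hypothetical choice, not a value taken from the paper.
    """
    return np.polyfit(empirical_cdf(reference), reference, order)

def equalize(utterance, coeffs):
    """Replace each frame by the reference value at the same cumulative probability."""
    return np.polyval(coeffs, empirical_cdf(utterance))

# Usage with synthetic data (assumption: one feature dimension, e.g. one cepstral coefficient).
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)         # stand-in for clean training features
noisy = 2.0 + 0.5 * rng.normal(0.0, 1.0, 300)  # shifted and compressed by "noise"
coeffs = fit_inverse_cdf(reference)
restored = equalize(noisy, coeffs)
print(round(noisy.mean(), 2), round(restored.mean(), 2))
```

The test frames enter only through their ranks. As a result, the restored values are unaffected by any monotonically increasing distortion of that dimension. One plausible motivation for fitting a polynomial is that it needs only a handful of coefficients per dimension, rather than a stored table of reference quantiles.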

Original language: English
Pages: 87-92
Number of pages: 6
Publication status: Published - 2007
Event: 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007 - Kyoto, Japan
Duration: 2007 Dec 9 - 2007 Dec 13

Other

Other: 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007
Country/Territory: Japan
City: Kyoto
Period: 2007/12/09 - 2007/12/13

Keywords

  • Clustering
  • Histogram equalization
  • Polynomial regression
  • Robustness
  • Speech recognition

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Software
  • Artificial Intelligence
