Distribution-based feature normalization for robust speech recognition leveraging context and dynamics cues

Yu Chen Kao, Berlin Chen

Research output: Contribution to journal (article)

Abstract

Histogram equalization (HEQ) of speech features has recently received considerable attention in robust speech recognition because of its relative simplicity and good empirical performance. In this paper, we present a novel extension of the conventional HEQ approach in two significant respects. First, polynomial regression of various orders is employed to perform feature normalization efficiently, building on the notion of HEQ. Second, not only the contextual distributional statistics but also the dynamics of feature values are taken as inputs to the regression functions for better normalization performance. In this way, we can to some extent relax the dimension-independence and bag-of-frames assumptions made by the conventional HEQ approach. All experiments were carried out on the Aurora-2 database and task and further verified on the Aurora-4 database and task. The results demonstrate that the proposed methods achieve considerable word error rate reductions over the baseline systems and offer additional performance gains for AFE-processed features.
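The general idea of regression-based HEQ can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the rank-based CDF estimate, and the specific choice of regression inputs (powers of the current frame's CDF value, the mean CDF value of the two neighbouring frames as a context cue, and their half-difference as a dynamics cue) are all assumptions made for illustration. The abstract does not specify these details.

```python
import numpy as np

def empirical_cdf(x):
    """Rank-based CDF estimate in (0, 1) for each frame of one feature dimension."""
    ranks = np.argsort(np.argsort(x))          # rank 0..N-1 of every frame
    return (ranks + 0.5) / len(x)

def regression_inputs(x, order=3):
    """Build per-frame regression inputs (an assumed, illustrative choice):
    powers of the CDF value, a context term, and a dynamics term."""
    c = empirical_cdf(x)
    prev = np.concatenate(([c[0]], c[:-1]))    # previous frame, first frame repeated
    nxt = np.concatenate((c[1:], [c[-1]]))     # next frame, last frame repeated
    powers = np.vander(c, order + 1, increasing=True)  # columns 1, c, ..., c^order
    context = 0.5 * (prev + nxt)               # mean CDF value of the neighbours
    dynamics = 0.5 * (nxt - prev)              # slope of the CDF trajectory
    return np.column_stack([powers, context, dynamics])

def fit(clean_train, order=3):
    """Least-squares fit from regression inputs to clean feature values."""
    inputs = regression_inputs(clean_train, order)
    weights, *_ = np.linalg.lstsq(inputs, clean_train, rcond=None)
    return weights

def normalize(weights, x):
    """Map the frames of one utterance through the fitted regression."""
    order = len(weights) - 3                   # two extra columns plus the constant term
    return regression_inputs(x, order) @ weights
```

Because the inputs depend only on the ranks of the frames within an utterance, any strictly increasing distortion of a whole feature stream (for example a gain and offset) leaves the normalized output unchanged in this sketch. Treating each feature dimension separately, as done here, keeps the dimension-independence assumption; only the neighbour-based inputs loosen the bag-of-frames assumption.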

Original language: English
Pages (from-to): 2958-2962
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2013

Keywords

  • Contextual information
  • Histogram equalization
  • Noise robustness
  • Spatial-temporal
  • Speech recognition

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

@article{1aeaafe30291411991f8131829921020,
title = "Distribution-based feature normalization for robust speech recognition leveraging context and dynamics cues",
abstract = "Histogram equalization (HEQ) of speech features has recently received considerable attention in robust speech recognition because of its relative simplicity and good empirical performance. In this paper, we present a novel extension of the conventional HEQ approach in two significant respects. First, polynomial regression of various orders is employed to perform feature normalization efficiently, building on the notion of HEQ. Second, not only the contextual distributional statistics but also the dynamics of feature values are taken as inputs to the regression functions for better normalization performance. In this way, we can to some extent relax the dimension-independence and bag-of-frames assumptions made by the conventional HEQ approach. All experiments were carried out on the Aurora-2 database and task and further verified on the Aurora-4 database and task. The results demonstrate that the proposed methods achieve considerable word error rate reductions over the baseline systems and offer additional performance gains for AFE-processed features.",
keywords = "Contextual information, Histogram equalization, Noise robustness, Spatial-temporal, Speech recognition",
author = "Kao, {Yu Chen} and Chen, Berlin",
year = "2013",
language = "English",
pages = "2958--2962",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR

T1 - Distribution-based feature normalization for robust speech recognition leveraging context and dynamics cues

AU - Kao, Yu Chen

AU - Chen, Berlin

PY - 2013

Y1 - 2013

N2 - Histogram equalization (HEQ) of speech features has recently received considerable attention in robust speech recognition because of its relative simplicity and good empirical performance. In this paper, we present a novel extension of the conventional HEQ approach in two significant respects. First, polynomial regression of various orders is employed to perform feature normalization efficiently, building on the notion of HEQ. Second, not only the contextual distributional statistics but also the dynamics of feature values are taken as inputs to the regression functions for better normalization performance. In this way, we can to some extent relax the dimension-independence and bag-of-frames assumptions made by the conventional HEQ approach. All experiments were carried out on the Aurora-2 database and task and further verified on the Aurora-4 database and task. The results demonstrate that the proposed methods achieve considerable word error rate reductions over the baseline systems and offer additional performance gains for AFE-processed features.

KW - Contextual information

KW - Histogram equalization

KW - Noise robustness

KW - Spatial-temporal

KW - Speech recognition

UR - http://www.scopus.com/inward/record.url?scp=84906269875&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84906269875&partnerID=8YFLogxK

M3 - Article

SP - 2958

EP - 2962

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -