Effective modulation spectrum factorization for robust speech recognition

Yu Chen Kao, Yi Ting Wang, Berlin Chen

Research output: Contribution to journal > Conference article

2 Citations (Scopus)

Abstract

Modulation spectrum processing of acoustic features has received considerable attention in the area of robust speech recognition because of its relative simplicity and good empirical performance. An emerging school of thought is to conduct nonnegative matrix factorization (NMF) on the modulation spectrum domain so as to distill intrinsic and noise-invariant temporal structure characteristics of acoustic features for better robustness. This paper presents a continuation of this general line of research and its main contribution is two-fold. One is to explore the notion of sparsity for NMF so as to ensure the derived basis vectors have sparser and more localized representations of the modulation spectra. The other is to investigate a novel cluster-based NMF processing, in which speech utterances belonging to different clusters will have their own set of cluster-specific basis vectors. As such, the speech utterances can retain more discriminative information in the NMF processed modulation spectra. All experiments were conducted on the Aurora-2 corpus and task. Empirical evidence reveals that our methods can offer substantial improvements over the baseline NMF method and achieve performance competitive to or better than several widely-used robustness methods.
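The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the cost function, the update rules, or the form of the sparsity constraint, so the choices below are assumptions. They are a Euclidean cost, multiplicative updates, an L1 penalty on the basis matrix `W` with weight `l1`, and reuse of the original phase. The function names and the synthetic data are hypothetical. The cluster-based variant would, under the same assumptions, train one `W` per cluster and apply the matching `W` to each utterance.

```python
import numpy as np

def modulation_spectrum(traj, n_fft):
    """Magnitude and phase of the DFT of one feature trajectory
    (one feature dimension followed over time). Assumes len(traj) <= n_fft."""
    spec = np.fft.rfft(traj, n=n_fft)
    return np.abs(spec), np.angle(spec)

def sparse_nmf(V, rank, l1=0.1, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (bins x utterances) by W @ H with non-negative factors.
    Assumed formulation: Euclidean cost plus an L1 penalty on W."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # The l1 term in the denominator shrinks small basis entries toward zero.
        W *= (V @ H.T) / (W @ H @ H.T + l1 + eps)
        # Heuristic rescaling: unit-norm rows of H, so that the penalty on W
        # cannot be evaded by moving scale into H. W @ H is unchanged.
        scale = np.linalg.norm(H, axis=1) + eps
        H /= scale[:, None]
        W *= scale[None, :]
    return W, H

def enhance(traj, W, n_fft, n_iter=100, eps=1e-9):
    """Project one trajectory's magnitude modulation spectrum onto fixed bases W,
    then resynthesize the trajectory using the original phase."""
    mag, phase = modulation_spectrum(traj, n_fft)
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ mag) / (W.T @ W @ h + eps)
    new_mag = W @ h
    return np.fft.irfft(new_mag * np.exp(1j * phase), n=n_fft)[: len(traj)]

# Synthetic stand-in for training trajectories (not Aurora-2 data).
rng = np.random.default_rng(1)
n_fft = 64
train = [rng.standard_normal(50) for _ in range(20)]
V = np.stack([modulation_spectrum(t, n_fft)[0] for t in train], axis=1)
W, H = sparse_nmf(V, rank=5)
out = enhance(train[0], W, n_fft)
```

In this sketch `V` has one column per training trajectory and `n_fft // 2 + 1` rows, and `enhance` returns a trajectory of the same length as its input.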

Original language: English
Pages (from-to): 2724-2728
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2014 Jan 1
Event: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore
Duration: 2014 Sep 14 - 2014 Sep 18

Fingerprint

Robust Speech Recognition
Non-negative Matrix Factorization
Factorization
Speech recognition
Modulation
Acoustics
Robustness
Factorization Method
Matrix Method
Processing
Sparsity
Acoustic noise
Continuation
Baseline
Simplicity
Fold
Invariant
Line
Experiment

Keywords

  • Automatic speech recognition
  • Modulation spectrum
  • Nonnegative matrix factorization
  • Normalization
  • Robustness

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

@article{8048367842ff4ab1be59d96544b24d90,
title = "Effective modulation spectrum factorization for robust speech recognition",
abstract = "Modulation spectrum processing of acoustic features has received considerable attention in the area of robust speech recognition because of its relative simplicity and good empirical performance. An emerging school of thought is to conduct nonnegative matrix factorization (NMF) on the modulation spectrum domain so as to distill intrinsic and noise-invariant temporal structure characteristics of acoustic features for better robustness. This paper presents a continuation of this general line of research and its main contribution is two-fold. One is to explore the notion of sparsity for NMF so as to ensure the derived basis vectors have sparser and more localized representations of the modulation spectra. The other is to investigate a novel cluster-based NMF processing, in which speech utterances belonging to different clusters will have their own set of cluster-specific basis vectors. As such, the speech utterances can retain more discriminative information in the NMF processed modulation spectra. All experiments were conducted on the Aurora-2 corpus and task. Empirical evidence reveals that our methods can offer substantial improvements over the baseline NMF method and achieve performance competitive to or better than several widely-used robustness methods.",
keywords = "Automatic speech recognition, Modulation spectrum, Nonnegative matrix factorization, Normalization, Robustness",
author = "Kao, {Yu Chen} and Wang, {Yi Ting} and Chen, Berlin",
year = "2014",
month = "1",
day = "1",
language = "English",
pages = "2724--2728",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",
}

TY - JOUR

T1 - Effective modulation spectrum factorization for robust speech recognition

AU - Kao, Yu Chen

AU - Wang, Yi Ting

AU - Chen, Berlin

PY - 2014/1/1

Y1 - 2014/1/1

N2 - Modulation spectrum processing of acoustic features has received considerable attention in the area of robust speech recognition because of its relative simplicity and good empirical performance. An emerging school of thought is to conduct nonnegative matrix factorization (NMF) on the modulation spectrum domain so as to distill intrinsic and noise-invariant temporal structure characteristics of acoustic features for better robustness. This paper presents a continuation of this general line of research and its main contribution is two-fold. One is to explore the notion of sparsity for NMF so as to ensure the derived basis vectors have sparser and more localized representations of the modulation spectra. The other is to investigate a novel cluster-based NMF processing, in which speech utterances belonging to different clusters will have their own set of cluster-specific basis vectors. As such, the speech utterances can retain more discriminative information in the NMF processed modulation spectra. All experiments were conducted on the Aurora-2 corpus and task. Empirical evidence reveals that our methods can offer substantial improvements over the baseline NMF method and achieve performance competitive to or better than several widely-used robustness methods.

KW - Automatic speech recognition

KW - Modulation spectrum

KW - Nonnegative matrix factorization

KW - Normalization

KW - Robustness

UR - http://www.scopus.com/inward/record.url?scp=84910049339&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84910049339&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84910049339

SP - 2724

EP - 2728

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -