Modulation spectrum factorization for robust speech recognition

Wen Yi Chu, Jeih Weih Hung, Berlin Chen

Research output: Contribution to conference › Paper

7 Citations (Scopus)

Abstract

This paper presents a novel approach, built on nonnegative matrix factorization (NMF), to improving the noise robustness of speech features. We employ NMF to extract a common set of basis spectral vectors that capture the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The new modulation spectra of the speech features, constructed by mapping the original modulation spectra into the space spanned by these basis vectors, are shown to have good noise robustness. All experiments were conducted on the Aurora-2 database and task. The results show that the proposed NMF-based approach, combined with mean and variance normalization (MVN), achieves average relative error rate reductions of over 65% and 12% compared with the baseline MFCC system and with MVN alone, respectively.
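The processing chain outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the FFT size, NMF rank, cost function, or how phase is handled, so the multiplicative-update NMF with a Euclidean cost, the reuse of the original phase, and all function names and parameter values below (`mvn`, `learn_basis`, `encode`, `enhance`, `n_fft`, `rank`) are assumptions chosen for illustration. The data are synthetic sinusoids, not Aurora-2 features.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def mvn(x):
    """Mean and variance normalization of one feature stream (1-D array)."""
    return (x - x.mean()) / (x.std() + EPS)


def modulation_spectrum(x, n_fft):
    """Complex DFT of a feature time series, zero-padded to n_fft points."""
    return np.fft.rfft(x, n=n_fft)


def learn_basis(V, rank, n_iter=200, seed=0):
    """NMF V ~= W @ H by multiplicative updates (Euclidean cost, assumed)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W


def encode(W, v, n_iter=200):
    """Nonnegative weights h, with W held fixed, such that v ~= W @ h."""
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ v) / (W.T @ W @ h + EPS)
    return h


def enhance(x, W, n_fft):
    """Map the modulation magnitude of x onto the basis W, keep the phase.

    Assumes len(x) <= n_fft so that zero-padding, not truncation, occurs.
    """
    X = modulation_spectrum(mvn(x), n_fft)
    new_mag = W @ encode(W, np.abs(X))
    y = np.fft.irfft(new_mag * np.exp(1j * np.angle(X)), n=n_fft)
    return y[: len(x)]


# Synthetic stand-ins for clean training feature streams (hypothetical data).
rng = np.random.default_rng(0)
n_fft, T = 128, 100
t = np.arange(T) / T
train = [np.sin(2 * np.pi * f * t) + 0.1 * rng.standard_normal(T)
         for f in (1, 2, 3, 4)]

# Columns of V are magnitude modulation spectra of the clean training streams.
V = np.stack([np.abs(modulation_spectrum(mvn(x), n_fft)) for x in train],
             axis=1)
W = learn_basis(V, rank=2)

# Process a noisier version of one stream with the learned basis.
noisy = train[0] + 0.5 * rng.standard_normal(T)
y = enhance(noisy, W, n_fft)
```

In this sketch the basis `W` is learned once from clean data and then held fixed, so a noisy stream can only be represented by nonnegative combinations of clean-speech modulation patterns; that is the intuition behind the mapping step in the abstract, under the assumptions stated above.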

Original language: English
Pages: 1-6
Number of pages: 6
Publication status: Published - 2011 Dec 1
Event: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011, APSIPA ASC 2011 - Xi'an, China
Duration: 2011 Oct 18 to 2011 Oct 21

Other

Other: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011, APSIPA ASC 2011
Country: China
City: Xi'an
Period: 11/10/18 to 11/10/21

Fingerprint

Factorization
Speech recognition
Modulation
Experiments

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing

Cite this

Chu, W. Y., Hung, J. W., & Chen, B. (2011). Modulation spectrum factorization for robust speech recognition. 1-6. Paper presented at Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011, APSIPA ASC 2011, Xi'an, China.

@conference{192ee642bdce4ad1ae93912f4f602afe,
title = "Modulation spectrum factorization for robust speech recognition",
abstract = "This paper presents a novel approach to improving the noise robustness of speech features built on top of nonnegative matrix factorization (NMF). To do this, we employ NMF to extract a common set of basis spectral vectors that cover the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The new modulation spectra of the speech features, constructed by mapping the original modulation spectra into the space spanned by these basis vectors, are demonstrated with good noise-robust capabilities. All experiments were conducted using the Aurora-2 database and task. The results show that the proposed NMF-based approach, together with mean and variance normalization (MVN), can provide average error reduction rates of over 65{\%} and 12{\%} relative as compared with the baseline MFCC system and that using the MVN method alone, respectively.",
author = "Chu, {Wen Yi} and Hung, {Jeih Weih} and Chen, Berlin",
year = "2011",
month = "12",
day = "1",
language = "English",
pages = "1--6",
note = "Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011, APSIPA ASC 2011 ; Conference date: 18-10-2011 Through 21-10-2011",

}

TY  - CONF
T1  - Modulation spectrum factorization for robust speech recognition
AU  - Chu, Wen Yi
AU  - Hung, Jeih Weih
AU  - Chen, Berlin
PY  - 2011/12/1
Y1  - 2011/12/1
AB  - This paper presents a novel approach to improving the noise robustness of speech features built on top of nonnegative matrix factorization (NMF). To do this, we employ NMF to extract a common set of basis spectral vectors that cover the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The new modulation spectra of the speech features, constructed by mapping the original modulation spectra into the space spanned by these basis vectors, are demonstrated with good noise-robust capabilities. All experiments were conducted using the Aurora-2 database and task. The results show that the proposed NMF-based approach, together with mean and variance normalization (MVN), can provide average error reduction rates of over 65% and 12% relative as compared with the baseline MFCC system and that using the MVN method alone, respectively.
UR  - http://www.scopus.com/inward/record.url?scp=84866873512&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84866873512&partnerID=8YFLogxK
M3  - Paper
AN  - SCOPUS:84866873512
SP  - 1
EP  - 6
ER  -