Leveraging distributional characteristics of modulation spectra for robust speech recognition

Yu Chen Kao, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Modulation spectrum processing of speech features has recently become an active area of research in the speech recognition community. For normalization of modulation spectra, spectral histogram equalization (SHE) appears to be one of the most effective techniques for compensating for nonlinear distortion. In this paper, we investigate a novel use of polynomial-fitting techniques for modulation histogram equalization, which requires less storage and computation time than conventional SHE methods. We also investigate the possibility of combining our approach with other temporal feature normalization methods. The automatic speech recognition (ASR) experiments were carried out on the Aurora-2 standard noise-robust ASR task. The proposed approach was tested against other popular modulation spectrum normalization methods, and the comparisons suggest its utility.
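The abstract does not spell out the polynomial-fitting step, so what follows is only a minimal Python sketch of one common way to perform histogram equalization with a fitted polynomial, not the authors' implementation. It assumes that a polynomial is fitted to the inverse cumulative distribution of reference (clean training) magnitudes, so that only the polynomial coefficients need to be stored, and that test magnitudes are mapped through it using their rank-based empirical CDF. The function names, the polynomial degree, and the choice to keep the original phase are all illustrative assumptions.

```python
import numpy as np


def fit_inverse_cdf_polynomial(reference_values, degree=5):
    """Fit a polynomial approximating the inverse CDF of the reference data.

    Assumption: the reference values come from clean training data.
    Only the fitted coefficients need to be kept, not a full histogram table.
    """
    ref = np.sort(np.asarray(reference_values, dtype=float))
    n = ref.size
    # Empirical CDF value of each sorted sample, kept strictly inside (0, 1).
    cdf = (np.arange(1, n + 1) - 0.5) / n
    return np.polynomial.Polynomial.fit(cdf, ref, deg=degree)


def equalize(values, inverse_cdf_poly):
    """Map each value to the reference distribution via its rank-based CDF."""
    x = np.asarray(values, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x))  # 0-based rank of every element
    cdf = (ranks + 0.5) / n
    return inverse_cdf_poly(cdf)


def normalize_modulation_spectrum(trajectory, inverse_cdf_poly):
    """Equalize the modulation-spectrum magnitudes of one feature trajectory.

    Assumption: the modulation spectrum is the DFT of the feature's time
    trajectory; magnitudes are equalized and the original phase is reused.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    spectrum = np.fft.rfft(trajectory)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    # A fitted polynomial can dip below zero, so clip to valid magnitudes.
    new_magnitude = np.maximum(equalize(magnitude, inverse_cdf_poly), 0.0)
    new_spectrum = new_magnitude * np.exp(1j * phase)
    return np.fft.irfft(new_spectrum, n=trajectory.size)
```

In this sketch the polynomial would be fitted once on magnitudes pooled from clean training trajectories and then reused for every test utterance, which is where the storage and time savings over a table-based equalizer would come from under these assumptions.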

Original language: English
Title of host publication: 2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012
Pages: 120-125
Number of pages: 6
DOI: https://doi.org/10.1109/ISSPA.2012.6310476
Publication status: Published - 2012 Nov 12
Event: 2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012 - Montreal, QC, Canada
Duration: 2012 Jul 2 to 2012 Jul 5

Publication series

Name: 2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012

Other

Other: 2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012
Country: Canada
City: Montreal, QC
Period: 12/7/2 to 12/7/5

Fingerprint

Speech recognition
Modulation
Nonlinear distortion
Acoustic noise
Polynomials
Processing
Experiments

Keywords

  • modulation spectrum
  • robust speech recognition
  • spectral histogram equalization
  • temporal average

ASJC Scopus subject areas

  • Computer Science Applications
  • Signal Processing

Cite this

Kao, Y. C., & Chen, B. (2012). Leveraging distributional characteristics of modulation spectra for robust speech recognition. In 2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012 (pp. 120-125). [6310476] (2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012). https://doi.org/10.1109/ISSPA.2012.6310476

@inproceedings{b75f1a45b4a246388a9378276e935dc0,
title = "Leveraging distributional characteristics of modulation spectra for robust speech recognition",
abstract = "Modulation spectrum processing of speech features has recently become an active area of research in the speech recognition community. For normalization of modulation spectra, spectral histogram equalization (SHE) appears to be one of the most effective techniques for compensating for nonlinear distortion. In this paper, we investigate a novel use of polynomial-fitting techniques for modulation histogram equalization, which requires less storage and computation time than conventional SHE methods. We also investigate the possibility of combining our approach with other temporal feature normalization methods. The automatic speech recognition (ASR) experiments were carried out on the Aurora-2 standard noise-robust ASR task. The proposed approach was tested against other popular modulation spectrum normalization methods, and the comparisons suggest its utility.",
keywords = "modulation spectrum, robust speech recognition, spectral histogram equalization, temporal average",
author = "Kao, {Yu Chen} and Chen, Berlin",
year = "2012",
month = "11",
day = "12",
doi = "10.1109/ISSPA.2012.6310476",
language = "English",
isbn = "9781467303828",
series = "2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012",
pages = "120--125",
booktitle = "2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012",

}

TY - GEN

T1 - Leveraging distributional characteristics of modulation spectra for robust speech recognition

AU - Kao, Yu Chen

AU - Chen, Berlin

PY - 2012/11/12

Y1 - 2012/11/12

N2 - Modulation spectrum processing of speech features has recently become an active area of research in the speech recognition community. For normalization of modulation spectra, spectral histogram equalization (SHE) appears to be one of the most effective techniques for compensating for nonlinear distortion. In this paper, we investigate a novel use of polynomial-fitting techniques for modulation histogram equalization, which requires less storage and computation time than conventional SHE methods. We also investigate the possibility of combining our approach with other temporal feature normalization methods. The automatic speech recognition (ASR) experiments were carried out on the Aurora-2 standard noise-robust ASR task. The proposed approach was tested against other popular modulation spectrum normalization methods, and the comparisons suggest its utility.

KW - modulation spectrum

KW - robust speech recognition

KW - spectral histogram equalization

KW - temporal average

UR - http://www.scopus.com/inward/record.url?scp=84868600509&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84868600509&partnerID=8YFLogxK

U2 - 10.1109/ISSPA.2012.6310476

DO - 10.1109/ISSPA.2012.6310476

M3 - Conference contribution

AN - SCOPUS:84868600509

SN - 9781467303828

T3 - 2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012

SP - 120

EP - 125

BT - 2012 11th International Conference on Information Science, Signal Processing and their Applications, ISSPA 2012

ER -