Robust speech recognition via enhancing the complex-valued acoustic spectrum in modulation domain

Jeih Weih Hung, Hsin Ju Hsieh, Berlin Chen

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

The purpose of this paper is to develop a novel speech feature extraction framework that independently compensates the real and imaginary acoustic spectra of speech signals in the modulation domain using histogram equalization (HEQ) and non-negative matrix factorization (NMF). By doing so, we can enhance not only the magnitude but also the phase components of the acoustic spectra, thereby creating noise-robust speech features. More specifically, the proposed framework makes three major contributions. First, via either the HEQ or the NMF operation, the long-term cross-frame correlation among the acoustic spectra at the same frequency can be captured to compensate for the spectral distortion caused by noise. Second, the noise effect can be handled at a high acoustic frequency resolution. Finally, the distortion residing in the acoustic spectra can be mitigated more extensively because the real and imaginary parts are processed independently. The evaluation experiments were carried out on the Aurora-2 and Aurora-4 benchmark tasks, and the results suggest that the proposed methods achieve speech recognition performance competitive with or better than that of many widely used noise robustness methods, including the well-known advanced front-end (AFE) extraction scheme.
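As a rough illustration of the HEQ branch of this idea (not the authors' implementation), the sketch below treats the real and imaginary parts of a short-time spectrum separately. For each acoustic frequency bin it takes a Fourier transform along the frame axis to obtain a modulation spectrum, equalizes the modulation magnitudes toward a reference distribution, and transforms back while keeping the modulation phase. The function names (`heq`, `enhance_part`, `enhance_spectrogram`), the rank-based quantile mapping, the use of one pooled reference sample for all bins, and the choice to equalize only the modulation magnitude are all assumptions made for this example.

```python
import numpy as np


def heq(values, reference):
    """Rank-based histogram equalization (hypothetical helper).

    Maps `values` so that their empirical distribution follows that of
    `reference`, while preserving the ordering of `values`.
    """
    ranks = np.argsort(np.argsort(values))      # rank of each value, 0..n-1
    cdf = (ranks + 0.5) / values.size           # empirical CDF, strictly in (0, 1)
    return np.quantile(reference, cdf)          # inverse CDF of the reference


def enhance_part(part, reference_mag):
    """Compensate one real-valued part (Re or Im) of a spectrogram.

    part: array of shape (n_bins, n_frames).
    reference_mag: 1-D sample of modulation magnitudes, assumed here to
        come from clean training speech.
    """
    n_frames = part.shape[1]
    mod = np.fft.rfft(part, axis=1)             # modulation spectrum per acoustic bin
    mag, phase = np.abs(mod), np.angle(mod)
    new_mag = np.vstack([heq(m, reference_mag) for m in mag])
    # Recombine equalized magnitude with the original modulation phase.
    return np.fft.irfft(new_mag * np.exp(1j * phase), n=n_frames, axis=1)


def enhance_spectrogram(stft, ref_mag_real, ref_mag_imag):
    """Process the real and imaginary parts independently and recombine."""
    real = enhance_part(stft.real, ref_mag_real)
    imag = enhance_part(stft.imag, ref_mag_imag)
    return real + 1j * imag
```

Under these assumptions, `enhance_spectrogram(stft, ref_real, ref_imag)` returns a complex array of the same shape as `stft`. Because both parts are modified, the magnitude and the phase of the acoustic spectrum can both change, which is the property the abstract emphasizes.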

Original language: English
Pages (from-to): 236-251
Number of pages: 16
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 24
Issue number: 2
DOIs: 10.1109/TASLP.2015.2504781
Publication status: Published - 2016 Feb 1

Fingerprint

Robust Speech Recognition
Speech recognition
Acoustics
Noise
Modulation
Histogram Equalization
histograms
Factorization
Acoustic noise
Speech Acoustics
Benchmarking
acoustic frequencies
dwell
Noise Robustness
matrices
Non-negative Matrix Factorization

Keywords

  • Automatic speech recognition (ASR)
  • Feature extraction
  • Histogram equalization (HEQ)
  • Modulation spectrum
  • Noise robustness
  • Non-negative matrix factorization (NMF)

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering

Cite this

Robust speech recognition via enhancing the complex-valued acoustic spectrum in modulation domain. / Hung, Jeih Weih; Hsieh, Hsin Ju; Chen, Berlin.

In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 24, No. 2, 01.02.2016, p. 236-251.


@article{39dd2621e74c429a82e49e81914ddc9f,
title = "Robust speech recognition via enhancing the complex-valued acoustic spectrum in modulation domain",
abstract = "The purpose of this paper is to develop a novel speech feature extraction framework that independently compensates the real and imaginary acoustic spectra of speech signals in the modulation domain using histogram equalization (HEQ) and non-negative matrix factorization (NMF). By doing so, we can enhance not only the magnitude but also the phase components of the acoustic spectra, thereby creating noise-robust speech features. More specifically, the proposed framework makes three major contributions. First, via either the HEQ or the NMF operation, the long-term cross-frame correlation among the acoustic spectra at the same frequency can be captured to compensate for the spectral distortion caused by noise. Second, the noise effect can be handled at a high acoustic frequency resolution. Finally, the distortion residing in the acoustic spectra can be mitigated more extensively because the real and imaginary parts are processed independently. The evaluation experiments were carried out on the Aurora-2 and Aurora-4 benchmark tasks, and the results suggest that the proposed methods achieve speech recognition performance competitive with or better than that of many widely used noise robustness methods, including the well-known advanced front-end (AFE) extraction scheme.",
keywords = "Automatic speech recognition (ASR), Feature extraction, Histogram equalization (HEQ), Modulation spectrum, Noise robustness, Non-negative matrix factorization (NMF)",
author = "Hung, {Jeih Weih} and Hsieh, {Hsin Ju} and Chen, Berlin",
year = "2016",
month = "2",
day = "1",
doi = "10.1109/TASLP.2015.2504781",
language = "English",
volume = "24",
pages = "236--251",
journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",
issn = "2329-9290",
publisher = "IEEE",
number = "2",

}

TY - JOUR
T1 - Robust speech recognition via enhancing the complex-valued acoustic spectrum in modulation domain
AU - Hung, Jeih Weih
AU - Hsieh, Hsin Ju
AU - Chen, Berlin
PY - 2016/2/1
Y1 - 2016/2/1
AB - The purpose of this paper is to develop a novel speech feature extraction framework that independently compensates the real and imaginary acoustic spectra of speech signals in the modulation domain using histogram equalization (HEQ) and non-negative matrix factorization (NMF). By doing so, we can enhance not only the magnitude but also the phase components of the acoustic spectra, thereby creating noise-robust speech features. More specifically, the proposed framework makes three major contributions. First, via either the HEQ or the NMF operation, the long-term cross-frame correlation among the acoustic spectra at the same frequency can be captured to compensate for the spectral distortion caused by noise. Second, the noise effect can be handled at a high acoustic frequency resolution. Finally, the distortion residing in the acoustic spectra can be mitigated more extensively because the real and imaginary parts are processed independently. The evaluation experiments were carried out on the Aurora-2 and Aurora-4 benchmark tasks, and the results suggest that the proposed methods achieve speech recognition performance competitive with or better than that of many widely used noise robustness methods, including the well-known advanced front-end (AFE) extraction scheme.
KW - Automatic speech recognition (ASR)
KW - Feature extraction
KW - Histogram equalization (HEQ)
KW - Modulation spectrum
KW - Noise robustness
KW - Non-negative matrix factorization (NMF)
UR - http://www.scopus.com/inward/record.url?scp=84962886232&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84962886232&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2015.2504781
DO - 10.1109/TASLP.2015.2504781
M3 - Article
VL - 24
SP - 236
EP - 251
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
SN - 2329-9290
IS - 2
ER -