Enhancing the complex-valued acoustic spectrograms in modulation domain for creating noise-robust features in speech recognition

Hsin Ju Hsieh, Berlin Chen, Jeih Weih Hung

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

In this paper, we propose a speech enhancement technique that compensates the real and imaginary acoustic spectrograms separately. The technique uses principal component analysis (PCA) to highlight the clean-speech components of the modulation spectra of noise-corrupted acoustic spectrograms. In this way, it enhances not only the magnitude but also the phase of the complex-valued acoustic spectrogram, thereby creating noise-robust speech features. The proposed technique has two explicit merits. First, by operating in the modulation domain, it captures the long-term cross-time correlation within the acoustic spectrogram and uses it to compensate for the spectral distortion caused by noise. Second, because the real and imaginary acoustic spectrograms are processed individually, the method avoids the speech-noise cross-term problem that commonly arises in conventional acoustic spectral enhancement methods, especially when noise reduction is unavoidable. All evaluation experiments are conducted on the Aurora-2 and Aurora-4 databases and tasks. The results show that, under the clean-condition training setting, the proposed method achieves speech recognition performance competitive with or better than many widely used noise-robustness methods, including the well-known advanced front-end (AFE).
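
The pipeline outlined in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the sketch assumes that the modulation spectrum is taken as a DFT along the frame axis of each frequency bin, that PCA is trained on modulation magnitudes of clean speech, and that every utterance has the same number of frames. The function names (`fit_pca`, `enhance_part`, `enhance_spectrogram`) are hypothetical.

```python
import numpy as np

def modulation_magnitude_phase(part):
    """DFT along the frame axis of a real-valued (bins x frames) spectrogram part."""
    mod = np.fft.rfft(part, axis=1)
    return np.abs(mod), np.angle(mod)

def fit_pca(samples, n_components):
    """samples: (num_samples x dim) clean modulation magnitudes (assumed training data).
    Returns the sample mean and the top principal axes as rows."""
    mean = samples.mean(axis=0)
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_components]

def enhance_part(part, mean, basis):
    """Project each bin's modulation magnitude onto the PCA subspace,
    keep the original modulation phase, and return to the frame domain."""
    n_frames = part.shape[1]
    mag, phase = modulation_magnitude_phase(part)
    projected = (mag - mean) @ basis.T @ basis + mean
    projected = np.maximum(projected, 0.0)  # magnitudes cannot be negative
    return np.fft.irfft(projected * np.exp(1j * phase), n=n_frames, axis=1)

def enhance_spectrogram(spec, real_model, imag_model):
    """Process the real and imaginary parts separately, then recombine them."""
    real = enhance_part(spec.real, *real_model)
    imag = enhance_part(spec.imag, *imag_model)
    return real + 1j * imag
```

Here `spec` would be a complex STFT matrix of shape (frequency bins x frames), and `real_model` and `imag_model` would be the `(mean, basis)` pairs returned by `fit_pca` for each part. Because the real and imaginary parts are handled as two real-valued signals, no squared-magnitude operation is applied, which is one way to read the abstract's remark about avoiding the speech-noise cross-term.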

Original language: English
Title of host publication: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 303-307
Number of pages: 5
ISBN (Electronic): 9789881476807
DOIs: https://doi.org/10.1109/APSIPA.2015.7415526
Publication status: Published - 2016 Feb 19
Event: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015 - Hong Kong, Hong Kong
Duration: 2015 Dec 16 to 2015 Dec 19

Publication series

Name: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015

Other

Other: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
Country: Hong Kong
City: Hong Kong
Period: 15/12/16 to 15/12/19

Fingerprint

Spectrogram
Speech Recognition
Acoustic noise
Acoustics
Modulation
Speech enhancement
Noise Robustness
Noise abatement
Principal component analysis
Noise Reduction
Leverage
Enhancement
Processing
Evaluation
Term
Demonstrate
Experiments

ASJC Scopus subject areas

  • Artificial Intelligence
  • Modelling and Simulation
  • Signal Processing

Cite this

Hsieh, H. J., Chen, B., & Hung, J. W. (2016). Enhancing the complex-valued acoustic spectrograms in modulation domain for creating noise-robust features in speech recognition. In 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015 (pp. 303-307). [7415526] (2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPA.2015.7415526

Enhancing the complex-valued acoustic spectrograms in modulation domain for creating noise-robust features in speech recognition. / Hsieh, Hsin Ju; Chen, Berlin; Hung, Jeih Weih.

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015. Institute of Electrical and Electronics Engineers Inc., 2016. p. 303-307 7415526 (2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015).

Hsieh, HJ, Chen, B & Hung, JW 2016, Enhancing the complex-valued acoustic spectrograms in modulation domain for creating noise-robust features in speech recognition. In 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015, 7415526, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015, Institute of Electrical and Electronics Engineers Inc., pp. 303-307, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015, Hong Kong, Hong Kong, 15/12/16. https://doi.org/10.1109/APSIPA.2015.7415526
Hsieh HJ, Chen B, Hung JW. Enhancing the complex-valued acoustic spectrograms in modulation domain for creating noise-robust features in speech recognition. In 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015. Institute of Electrical and Electronics Engineers Inc. 2016. p. 303-307. 7415526. (2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015). https://doi.org/10.1109/APSIPA.2015.7415526
Hsieh, Hsin Ju ; Chen, Berlin ; Hung, Jeih Weih. / Enhancing the complex-valued acoustic spectrograms in modulation domain for creating noise-robust features in speech recognition. 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 303-307 (2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015).
@inproceedings{d067736ede47424d8a4593c89d5b3daf,
title = "Enhancing the complex-valued acoustic spectrograms in modulation domain for creating noise-robust features in speech recognition",
author = "Hsieh, {Hsin Ju} and Chen, Berlin and Hung, {Jeih Weih}",
year = "2016",
month = "2",
day = "19",
doi = "10.1109/APSIPA.2015.7415526",
language = "English",
series = "2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "303--307",
booktitle = "2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015",

}

TY - GEN

T1 - Enhancing the complex-valued acoustic spectrograms in modulation domain for creating noise-robust features in speech recognition

AU - Hsieh, Hsin Ju

AU - Chen, Berlin

AU - Hung, Jeih Weih

PY - 2016/2/19

Y1 - 2016/2/19

N2 - In this paper, we propose a speech enhancement technique which compensates for the real and imaginary acoustic spectrograms separately. This technique leverages principal component analysis (PCA) to highlight the clean speech components of the modulation spectra for noise-corrupted acoustic spectrograms. By doing so, we can enhance not only the magnitude but also the phase portions of the complex-valued acoustic spectrogram, thereby creating noise-robust speech features. More particularly, the proposed technique possesses two explicit merits. First, via the operation on modulation domain, the long-term cross-time correlation among the acoustic spectrogram can be captured and subsequently employed to compensate for the spectral distortion caused by noise. Next, due to the individual processing of real and imaginary acoustic spectrograms, the proposed method will not encounter a knotty problem of speech-noise cross-term that usually exists in the conventional acoustic spectral enhancement methods especially when the noise reduction process is inevitable. All of the evaluation experiments are conducted on the Aurora-2 and Aurora-4 databases and tasks. The corresponding results demonstrate that under the clean-condition training setting, our proposed method can achieve performance competitive to or better than many widely used noise robustness methods, including the well-known advanced front-end (AFE), in speech recognition.

UR - http://www.scopus.com/inward/record.url?scp=84986199905&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84986199905&partnerID=8YFLogxK

U2 - 10.1109/APSIPA.2015.7415526

DO - 10.1109/APSIPA.2015.7415526

M3 - Conference contribution

AN - SCOPUS:84986199905

T3 - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015

SP - 303

EP - 307

BT - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -