TY - JOUR
T1 - Robust speech recognition via enhancing the complex-valued acoustic spectrum in modulation domain
AU - Hung, Jeih Weih
AU - Hsieh, Hsin Ju
AU - Chen, Berlin
N1 - Publisher Copyright: ©2015 IEEE.
PY - 2016/2
Y1 - 2016/2
AB - This paper develops a novel speech feature extraction framework that independently compensates the real and imaginary acoustic spectra of speech signals in the modulation domain, using histogram equalization (HEQ) and non-negative matrix factorization (NMF). Doing so enhances not only the magnitude but also the phase components of the acoustic spectra, thereby creating noise-robust speech features. More specifically, the proposed framework makes three major contributions. First, via either the HEQ or the NMF operation, the long-term cross-frame correlation among the acoustic spectra at the same frequency can be captured to compensate for the spectral distortion caused by noise. Second, the noise effect can be handled at a high acoustic frequency resolution. Finally, the distortion residing in the acoustic spectra can be mitigated more extensively because the real and imaginary parts are processed independently. The evaluation experiments were carried out on the Aurora-2 and Aurora-4 benchmark tasks, and the results suggest that the proposed methods achieve speech recognition performance competitive with or better than many widely used noise robustness methods, including the well-known advanced front-end (AFE) extraction scheme.
KW - Automatic speech recognition (ASR)
KW - Feature extraction
KW - Histogram equalization (HEQ)
KW - Modulation spectrum
KW - Noise robustness
KW - Non-negative matrix factorization (NMF)
UR - http://www.scopus.com/inward/record.url?scp=84962886232&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84962886232&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2015.2504781
DO - 10.1109/TASLP.2015.2504781
M3 - Article
AN - SCOPUS:84962886232
SN - 2329-9290
VL - 24
SP - 236
EP - 251
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
IS - 2
ER -