TY - GEN
T1 - Enhancing the complex-valued acoustic spectrograms in modulation domain for creating noise-robust features in speech recognition
AU - Hsieh, Hsin Ju
AU - Chen, Berlin
AU - Hung, Jeih Weih
N1 - Publisher Copyright:
© 2015 Asia-Pacific Signal and Information Processing Association.
PY - 2016/2/19
Y1 - 2016/2/19
AB - In this paper, we propose a speech enhancement technique that compensates for the real and imaginary acoustic spectrograms separately. This technique leverages principal component analysis (PCA) to highlight the clean speech components of the modulation spectra of noise-corrupted acoustic spectrograms. By doing so, we can enhance not only the magnitude but also the phase portions of the complex-valued acoustic spectrogram, thereby creating noise-robust speech features. More particularly, the proposed technique has two explicit merits. First, by operating in the modulation domain, it captures the long-term cross-time correlation within the acoustic spectrogram and subsequently employs it to compensate for the spectral distortion caused by noise. Second, because the real and imaginary acoustic spectrograms are processed individually, the proposed method does not encounter the troublesome speech-noise cross-term problem that usually exists in conventional acoustic spectral enhancement methods, especially when the noise reduction process is inevitable. All of the evaluation experiments are conducted on the Aurora-2 and Aurora-4 databases and tasks. The corresponding results demonstrate that under the clean-condition training setting, our proposed method can achieve performance competitive with or better than many widely used noise robustness methods, including the well-known advanced front-end (AFE), in speech recognition.
UR - http://www.scopus.com/inward/record.url?scp=84986199905&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84986199905&partnerID=8YFLogxK
U2 - 10.1109/APSIPA.2015.7415526
DO - 10.1109/APSIPA.2015.7415526
M3 - Conference contribution
AN - SCOPUS:84986199905
T3 - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
SP - 303
EP - 307
BT - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
Y2 - 16 December 2015 through 19 December 2015
ER -