TY - GEN
T1 - Employing median filtering to enhance the complex-valued acoustic spectrograms in modulation domain for noise-robust speech recognition
AU - Hsieh, Hsin Ju
AU - Chen, Berlin
AU - Hung, Jeih Weih
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2017/5/2
Y1 - 2017/5/2
N2 - In this paper, we propose applying median filtering (MF) in the modulation domain of the complex-valued acoustic spectrogram in order to alleviate the effect of noise on speech signals and thereby improve noise robustness. Median filtering is well known for its excellent capability of removing speckle noise from data while preserving the embedded sharp contrasts. When median filtering is applied to the temporal modulation spectrum, which is the Fourier transform of either the real or the imaginary acoustic spectrogram along the time axis, we find that the mismatch caused by noise is significantly reduced, and the resulting speech features are more noise-robust and provide better speech recognition accuracy than the original unprocessed features. More specifically, the proposed method has three explicit merits. First, the median filtering operation substantially alleviates the outliers in the modulation spectrum that are probably caused by noise. Second, by processing the real and imaginary acoustic spectrograms individually, the proposed method avoids the knotty problem of the speech-noise cross-term that usually arises in conventional acoustic spectral enhancement methods, especially when the noise reduction process is inevitable. Third, the median filtering process is completely unsupervised and requires no prior information about the clean speech or the noise. All evaluation experiments are conducted on two databases: the connected-digit Aurora-2 database and the medium-vocabulary Aurora-4 database. The recognition results demonstrate that the proposed MF-based method achieves performance competitive with or better than many state-of-the-art noise robustness methods, including histogram equalization (HEQ), mean and variance normalization (MVN), MVN plus ARMA filtering (MVA), temporal structure normalization (TSN) and advanced front-end (AFE).
AB - In this paper, we propose applying median filtering (MF) in the modulation domain of the complex-valued acoustic spectrogram in order to alleviate the effect of noise on speech signals and thereby improve noise robustness. Median filtering is well known for its excellent capability of removing speckle noise from data while preserving the embedded sharp contrasts. When median filtering is applied to the temporal modulation spectrum, which is the Fourier transform of either the real or the imaginary acoustic spectrogram along the time axis, we find that the mismatch caused by noise is significantly reduced, and the resulting speech features are more noise-robust and provide better speech recognition accuracy than the original unprocessed features. More specifically, the proposed method has three explicit merits. First, the median filtering operation substantially alleviates the outliers in the modulation spectrum that are probably caused by noise. Second, by processing the real and imaginary acoustic spectrograms individually, the proposed method avoids the knotty problem of the speech-noise cross-term that usually arises in conventional acoustic spectral enhancement methods, especially when the noise reduction process is inevitable. Third, the median filtering process is completely unsupervised and requires no prior information about the clean speech or the noise. All evaluation experiments are conducted on two databases: the connected-digit Aurora-2 database and the medium-vocabulary Aurora-4 database. The recognition results demonstrate that the proposed MF-based method achieves performance competitive with or better than many state-of-the-art noise robustness methods, including histogram equalization (HEQ), mean and variance normalization (MVN), MVN plus ARMA filtering (MVA), temporal structure normalization (TSN) and advanced front-end (AFE).
KW - Automatic speech recognition
KW - Feature extraction
KW - Median filter
KW - Modulation spectrum
KW - Noise robustness
KW - Principal component analysis
UR - http://www.scopus.com/inward/record.url?scp=85020176025&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020176025&partnerID=8YFLogxK
U2 - 10.1109/ISCSLP.2016.7918403
DO - 10.1109/ISCSLP.2016.7918403
M3 - Conference contribution
AN - SCOPUS:85020176025
T3 - Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
BT - Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
A2 - Wang, Hsin-Min
A2 - Hou, Qingzhi
A2 - Wei, Yuan
A2 - Lee, Tan
A2 - Wei, Jianguo
A2 - Xie, Lei
A2 - Feng, Hui
A2 - Dang, Jianwu
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
Y2 - 17 October 2016 through 20 October 2016
ER -