TY - GEN
T1 - Exploring Feature Enhancement in the Modulation Spectrum Domain via Ideal Ratio Mask for Robust Speech Recognition
AU - Yan, Bi Cheng
AU - Wu, Meng Che
AU - Chen, Berlin
N1 - Publisher Copyright: © 2020 APSIPA.
PY - 2020/12/7
Y1 - 2020/12/7
AB - Development of robustness techniques is of paramount importance to the success of automatic speech recognition (ASR) systems. In this paper, we present a novel use of the ideal ratio mask (IRM) method to improve ASR robustness. IRM was originally proposed for time-frequency (T-F) masking-based speech enhancement and has shown considerable promise in preserving the intelligibility of a noisy mixture signal. Further, IRM is alternatively used to normalize the intermediate representations of speech feature vector sequences, in a holistic manner, for both training and test utterances. Finally, we instead treat IRM as a data augmentation method, conducted on speech feature vectors of training utterances or their intermediate representations, to generate additional augmented data for increasing the diversity of training data. A series of experiments carried out on the standard Aurora-4 database and task confirm the effectiveness of our methods.
UR - http://www.scopus.com/inward/record.url?scp=85100940220&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100940220&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85100940220
T3 - 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings
SP - 759
EP - 763
BT - 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020
Y2 - 7 December 2020 through 10 December 2020
ER -