TY - GEN
T1 - Exploiting Discrete Cosine Transform Features in Speech Enhancement Technique FullSubNet+
AU - Tsao, Yu Sheng
AU - Chen, Berlin
AU - Hung, Jeih Weih
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - The highly effective deep learning-based technique FullSubNet+ employs a full-band and sub-band fusion model to perform speech enhancement. FullSubNet+ exploits the short-time magnitude spectrogram together with the real and imaginary parts of the complex-valued spectrogram to train a deep neural network that mainly comprises multi-scale time-sensitive channel attention (MulCA) modules and stacked temporal convolutional network (TCN) blocks. To capture the phase information of input time-domain signals more simply, we propose using the short-time discrete cosine transform (STDCT) spectrogram in place of the real and imaginary spectrograms as the input source for learning the FullSubNet+ framework. Preliminary experiments conducted on the VoiceBank-DEMAND task indicate that exploiting STDCT spectrograms in FullSubNet+ achieves higher objective speech quality and intelligibility, in terms of PESQ and STOI scores respectively, on the test set compared with the original FullSubNet+ arrangement. In addition, the STDCT-wise FullSubNet+ obtains a real-time factor (RTF) of 0.229, lower than the 0.260 RTF of the original FullSubNet+.
AB - The highly effective deep learning-based technique FullSubNet+ employs a full-band and sub-band fusion model to perform speech enhancement. FullSubNet+ exploits the short-time magnitude spectrogram together with the real and imaginary parts of the complex-valued spectrogram to train a deep neural network that mainly comprises multi-scale time-sensitive channel attention (MulCA) modules and stacked temporal convolutional network (TCN) blocks. To capture the phase information of input time-domain signals more simply, we propose using the short-time discrete cosine transform (STDCT) spectrogram in place of the real and imaginary spectrograms as the input source for learning the FullSubNet+ framework. Preliminary experiments conducted on the VoiceBank-DEMAND task indicate that exploiting STDCT spectrograms in FullSubNet+ achieves higher objective speech quality and intelligibility, in terms of PESQ and STOI scores respectively, on the test set compared with the original FullSubNet+ arrangement. In addition, the STDCT-wise FullSubNet+ obtains a real-time factor (RTF) of 0.229, lower than the 0.260 RTF of the original FullSubNet+.
KW - FullSubNet
KW - FullSubNet+
KW - deep learning
KW - discrete cosine transform
KW - speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85145353355&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85145353355&partnerID=8YFLogxK
U2 - 10.1109/IET-ICETA56553.2022.9971683
DO - 10.1109/IET-ICETA56553.2022.9971683
M3 - Conference contribution
AN - SCOPUS:85145353355
T3 - Proceedings - 2022 IET International Conference on Engineering Technologies and Applications, IET-ICETA 2022
BT - Proceedings - 2022 IET International Conference on Engineering Technologies and Applications, IET-ICETA 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IET International Conference on Engineering Technologies and Applications, IET-ICETA 2022
Y2 - 14 October 2022 through 16 October 2022
ER -