Modulation spectrum augmentation for robust speech recognition

Bi Cheng Yan*, Shih Hung Liu, Berlin Chen


Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Data augmentation is a crucial mechanism for increasing the diversity of training data, which helps avoid overfitting and improves the robustness of statistical models in many applications. In automatic speech recognition (ASR), a recent trend has been to augment training speech data by warping or masking utterances in the waveform or spectrogram domain. Extending this line of research, we explore novel ways to generate augmented training speech data and compare them with existing state-of-the-art approaches. The contributions of this paper are at least two-fold. First, we propose to warp an intermediate representation of the cepstral feature vector sequence of an utterance in a holistic manner. This intermediate representation can be embodied in different modulation domains by performing the discrete Fourier transform (DFT) along either the time axis or the component axis of a cepstral feature vector sequence. Second, we develop a two-stage augmentation approach that successively performs perturbation in the waveform domain and warping in the different modulation domains of cepstral feature vector sequences, to further enhance robustness. A series of experiments is carried out on the Aurora-4 database and task with a typical DNN-HMM based ASR system. The proposed method that warps the component-axis modulation domain of cepstral feature vector sequences yields word error rate reductions (WERR) of 17.6% and 0.69% for the clean- and multi-condition training settings, respectively. In addition, the proposed two-stage augmentation method achieves a WERR of up to 1.13% under the multi-condition training setup.
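To make the idea of "warping in a modulation domain" more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the warp is a linear rescaling of the modulation-frequency axis (a DFT bin at index k is moved to roughly k·alpha, with real and imaginary parts linearly interpolated) followed by an inverse DFT back to the cepstral domain. The function name `warp_modulation_spectrum`, the warp factor `alpha`, and the choice of linear interpolation are illustrative assumptions; the abstract does not specify the exact warping function.

```python
import numpy as np


def warp_modulation_spectrum(feats, alpha, axis=0):
    """Hypothetical modulation-domain warp of a cepstral feature matrix.

    feats: array of shape (T, D), T frames by D cepstral components.
    axis=0 warps the time-axis modulation spectrum; axis=1 warps the
    component-axis modulation spectrum. alpha > 1 stretches the spectrum
    toward higher modulation frequencies, alpha < 1 compresses it.
    """
    feats = np.asarray(feats, dtype=float)
    n = feats.shape[axis]

    # DFT along the chosen axis gives the modulation spectrum (one-sided).
    spec = np.moveaxis(np.fft.rfft(feats, axis=axis), axis, -1)
    n_bins = spec.shape[-1]
    bins = np.arange(n_bins)

    # Assumed linear warp: output bin k reads from source position k / alpha.
    # np.interp clamps positions past the last bin to the edge value.
    src = bins / alpha

    warped = np.empty_like(spec)
    for idx in np.ndindex(spec.shape[:-1]):
        row = spec[idx]
        # Interpolate real and imaginary parts separately.
        warped[idx] = np.interp(src, bins, row.real) + 1j * np.interp(src, bins, row.imag)

    # Inverse DFT back to the cepstral domain, keeping the original length.
    warped = np.moveaxis(warped, -1, axis)
    return np.fft.irfft(warped, n=n, axis=axis)


# Usage on a random stand-in for a (200 frames x 39 dims) feature matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 39))
X_time = warp_modulation_spectrum(X, alpha=1.1, axis=0)  # time-axis domain
X_comp = warp_modulation_spectrum(X, alpha=0.9, axis=1)  # component-axis domain
print(X_comp.shape)  # (200, 39)
```

With `alpha=1.0` the sketch reduces to a DFT followed by its inverse and returns the input unchanged (up to floating-point error), which is a convenient sanity check. In an augmentation pipeline one would presumably draw `alpha` randomly per utterance; the sampling range is not stated in the abstract.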

Host publication title: Proceedings of the International Conference on Advanced Information Science and System, AISS 2019
Publisher: Association for Computing Machinery
Publication status: Published - 15 Nov 2019
Event: 2019 International Conference on Advanced Information Science and System, AISS 2019 - Singapore, Singapore
Duration: 15 Nov 2019 to 17 Nov 2019


Name: ACM International Conference Proceeding Series


Conference: 2019 International Conference on Advanced Information Science and System, AISS 2019

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

