Modulation spectrum augmentation for robust speech recognition

Bi Cheng Yan, Shih Hung Liu, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Data augmentation is a crucial mechanism employed to increase the diversity of training data in order to avoid overfitting and improve the robustness of statistical models in various applications. In the context of automatic speech recognition (ASR), a recent trend has been to develop effective methods to augment training speech data by warping or masking utterances based on their waveforms or spectrograms. Extending this line of research, we explore novel ways to generate augmented training speech data and compare them with existing state-of-the-art approaches. The main contributions of this paper are at least two-fold. First, we propose to warp the intermediate representation of the cepstral feature vector sequence of an utterance in a holistic manner. This intermediate representation can be embodied in different modulation domains by performing the discrete Fourier transform (DFT) along either the time axis or the component axis of a cepstral feature vector sequence. Second, we also develop a two-stage augmentation approach, which successively conducts perturbation in the waveform domain and warping in different modulation domains of cepstral speech feature vector sequences, to further enhance robustness. A series of experiments is carried out on the Aurora-4 database and task, in conjunction with a typical DNN-HMM based ASR system. The proposed augmentation method that conducts warping in the component-axis modulation domain of cepstral feature vector sequences can yield a word error rate reduction (WERR) of 17.6% and 0.69%, respectively, for the clean- and multi-condition training settings. In addition, the proposed two-stage augmentation method can at best achieve a WERR of 1.13% when using the multi-condition training setup.
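The idea of warping in a modulation domain can be illustrated with a minimal sketch. The abstract does not specify the warping function, so everything below is an assumption made for illustration: the function name `warp_modulation_spectrum`, the linear warping factor `alpha`, and the choice to warp only the magnitude spectrum while keeping the original phase are all hypothetical and are not taken from the paper. The sketch takes a cepstral feature matrix of shape (frames, coefficients), applies the DFT along either the time axis (`axis=0`) or the component axis (`axis=1`), stretches the magnitude spectrum along the modulation-frequency axis, and transforms back.

```python
import numpy as np

def warp_modulation_spectrum(features, axis=0, alpha=1.1):
    """Hypothetical illustration, not the authors' method.

    features: real array of shape (num_frames, num_coeffs).
    axis=0 warps the time-axis modulation spectrum,
    axis=1 warps the component-axis modulation spectrum.
    alpha: assumed linear warping factor (1.0 leaves the input unchanged).
    """
    # DFT along the chosen axis gives the modulation spectrum.
    spectrum = np.fft.rfft(features, axis=axis)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Output bin k reads the magnitude at source position k / alpha.
    num_bins = magnitude.shape[axis]
    bins = np.arange(num_bins)
    source = np.clip(bins / alpha, 0, num_bins - 1)

    # Linearly interpolate each 1-D magnitude slice at the warped positions.
    warped = np.apply_along_axis(
        lambda m: np.interp(source, bins, m), axis, magnitude
    )

    # Recombine with the original phase and return to the feature domain.
    warped_spectrum = warped * np.exp(1j * phase)
    return np.fft.irfft(warped_spectrum, n=features.shape[axis], axis=axis)

# Example: 50 frames of 13-dimensional cepstral features (random stand-in data).
rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 13))
augmented = warp_modulation_spectrum(feats, axis=1, alpha=1.2)
```

In an augmentation pipeline one would typically draw `alpha` at random per utterance; the range to draw from is a tuning choice that the abstract does not state.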

Original language: English
Title of host publication: Proceedings of the International Conference on Advanced Information Science and System, AISS 2019
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450372916
DOI: https://doi.org/10.1145/3373477.3373695
Publication status: Published - 2019 Nov 15
Event: 2019 International Conference on Advanced Information Science and System, AISS 2019 - Singapore, Singapore
Duration: 2019 Nov 15 to 2019 Nov 17

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2019 International Conference on Advanced Information Science and System, AISS 2019
Country: Singapore
City: Singapore
Period: 19/11/15 to 19/11/17

Keywords

  • Data augmentation
  • Modulation spectra
  • Robustness
  • Speech recognition

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

    Yan, B. C., Liu, S. H., & Chen, B. (2019). Modulation spectrum augmentation for robust speech recognition. In Proceedings of the International Conference on Advanced Information Science and System, AISS 2019 (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3373477.3373695