An Empirical Study on Transformer-Based End-to-End Speech Recognition with Novel Decoder Masking

Shi Yan Weng, Hsuan Sheng Chiu, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The attention-based encoder-decoder modeling paradigm has achieved impressive success on a wide variety of speech and language processing tasks. For automatic speech recognition (ASR), this paradigm takes advantage of the innate ability of neural networks to learn a direct and streamlined mapping from an input sequence to an output sequence, without any prior knowledge such as audio-text alignments or pronunciation lexicons. An ASR model built on this paradigm, however, inevitably faces the issue of inadequate generalization, especially when the model is not trained with huge amounts of speech data. In view of this, we propose in this paper a decoder-masking-based training approach for end-to-end (E2E) ASR models, taking inspiration from the celebrated speech input augmentation (viz. SpecAugment) and masked language modeling (viz. BERT). During the training phase, we randomly replace some portions of the decoder's historical input with the symbol [mask] to encourage the decoder to robustly output a correct token even when parts of its decoding history are masked. The proposed approach is instantiated with the top-of-the-line transformer-based E2E ASR model. Extensive experiments conducted on two benchmark datasets (viz. Librispeech960h and TedLium2) seem to demonstrate the efficacy of our approach in relation to some existing E2E ASR systems.
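The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the masking rate, whether tokens are masked individually or in contiguous spans, or how special symbols are handled. The sketch assumes independent per-token masking, and the names `mask_decoder_history`, `MASK_ID`, `SOS_ID`, and the value of `mask_prob` are hypothetical.

```python
import random
from typing import List, Optional

MASK_ID = 0  # hypothetical id reserved for the [mask] symbol
SOS_ID = 1   # hypothetical start-of-sequence id; never masked in this sketch


def mask_decoder_history(
    history: List[int],
    mask_prob: float = 0.15,  # assumed rate; the abstract gives no value
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return a copy of the decoder input with some tokens replaced by MASK_ID."""
    if not 0.0 <= mask_prob <= 1.0:
        raise ValueError("mask_prob must be in [0, 1]")
    rng = rng or random.Random()
    masked = []
    for token in history:
        # Assumption: each non-SOS token is masked independently.
        if token != SOS_ID and rng.random() < mask_prob:
            masked.append(MASK_ID)
        else:
            masked.append(token)
    return masked


# Teacher-forced training pair: the decoder reads the masked history,
# while the prediction targets stay unmasked.
targets = [17, 42, 8, 99, 2]  # hypothetical token ids; 2 stands for end-of-sequence
decoder_input = [SOS_ID] + targets[:-1]
masked_input = mask_decoder_history(
    decoder_input, mask_prob=0.3, rng=random.Random(0)
)
```

Only the decoder's input history is corrupted here; the loss would still be computed against the unmasked `targets`, which is what pushes the decoder to predict the correct token from an incomplete history. Masking would be applied during training only, as the abstract states.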

Original language: English
Title of host publication: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 518-522
Number of pages: 5
ISBN (Electronic): 9789881476890
Publication status: Published - 2021
Event: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Tokyo, Japan
Duration: 14 Dec 2021 to 17 Dec 2021

Publication series

Name: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

Conference

Conference: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Country/Territory: Japan
City: Tokyo
Period: 2021/12/14 to 2021/12/17

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Instrumentation
