The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge

Tien Hong Lo, Fu An Chao, Shi Yan Weng, Berlin Chen

Research output: Contribution to journal, conference article, peer-reviewed

7 citations (Scopus)

Abstract

This paper describes the NTNU ASR system that participated in the Interspeech 2020 Non-Native Children's Speech ASR Challenge, supported by the SIG-CHILD group of ISCA. The shared task is especially challenging because the characteristics of non-native speech and of children's speech occur together. In the closed-track evaluation, participants were restricted to developing their systems using only the speech and text corpora provided by the organizer. To work around this under-resourced condition, we built our ASR system on CNN-TDNNF-based acoustic models and combined several data augmentation strategies, including utterance- and word-level speed perturbation and spectrogram augmentation, with a simple yet effective data-cleansing approach. All variants of our ASR system employed an RNN-based language model, trained solely on the text dataset released by the organizer, to rescore the first-pass recognition hypotheses. Our best configuration placed second with a word error rate (WER) of 17.59%, while the top-performing, second runner-up, and official baseline systems achieved 15.67%, 18.71%, and 35.09%, respectively.
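The WER figures quoted in the abstract can be made concrete with a small sketch. The Python below is an illustration only, not the challenge's official scoring tool: it assumes whitespace tokenization and no text normalization, and the function names `word_error_rate` and `relative_reduction` are hypothetical. It computes WER as the word-level edit distance divided by the number of reference words, and also shows the relative WER reduction implied by the reported numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: words are split on whitespace with no normalization;
    an official scorer may tokenize and normalize differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Fraction of the baseline WER removed by the system."""
    return (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Reduction of the reported 17.59% relative to the 35.09% baseline.
    print(round(relative_reduction(35.09, 17.59), 4))
```

With the figures reported above, the relative reduction over the official baseline works out to roughly 49.9%. Corpus-level WER is normally obtained by summing errors and reference words over all utterances rather than by averaging per-utterance rates, so this per-sentence function would need to be accumulated accordingly.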

Original language: English
Pages (from-to): 250-254
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
Publication status: Published - 2020
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 October 2020 to 29 October 2020

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
