融合多種深層類神經網路聲學模型與分類技術於華語錯誤發音檢測之研究

Translated title of the contribution: Exploring combinations of various deep neural network based acoustic models and classification techniques for Mandarin mispronunciation detection

Yao Chi Hsu, Ming Han Yang, Hsiao Tsung Hung, Yuwen Hsiung, Yao Ting Sung, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automatic mispronunciation detection plays a crucial role in computer-assisted pronunciation training (CAPT) systems. Its main purpose is to judge whether the pronunciations of a non-native speaker are correct. In general, the process can be divided into two parts: 1) a front-end feature extraction module that generates pronunciation detection features from an input speech segment and its associated reference acoustic models; and 2) a back-end classification module that determines whether the segment is pronounced correctly from the output of a classifier that takes those features as input. The contributions of this work are three-fold. First, we investigate two state-of-the-art acoustic models, based respectively on deep neural networks (DNN) and convolutional neural networks (CNN), and compare their effectiveness for extracting discriminative pronunciation detection features. Second, we experiment with different types of classification methods and propose a novel integration of DNN- and CNN-based decision scores at the back-end. Third, we provide an extensive set of empirical evaluations of the two modules and their associated methods on a recently compiled corpus for learning Mandarin Chinese as a second language. The experimental results demonstrate the utility of our approach relative to several existing baselines.
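The two-module pipeline can be illustrated with a minimal sketch. The abstract does not specify the detection features, the classifier, or the fusion rule, so everything below is an assumption made for illustration: `segment_score` stands in for the front-end as an average log posterior of the target phone (a goodness-of-pronunciation-style feature), `fuse_scores` linearly interpolates the DNN- and CNN-based scores, and `is_mispronounced` applies a fixed threshold as a stand-in for the back-end classifier. The function names, the weight, and the threshold are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def segment_score(frame_posteriors: Sequence[float]) -> float:
    """Front-end (assumed): average log posterior of the target phone
    over the frames of one speech segment."""
    floor = 1e-10  # avoid log(0) for frames with zero posterior
    total = sum(math.log(max(p, floor)) for p in frame_posteriors)
    return total / len(frame_posteriors)


def fuse_scores(dnn_score: float, cnn_score: float, weight: float = 0.5) -> float:
    """Back-end fusion (assumed): linear interpolation of the two
    model-specific scores; weight is the share given to the DNN score."""
    return weight * dnn_score + (1.0 - weight) * cnn_score


def is_mispronounced(
    dnn_posteriors: Sequence[float],
    cnn_posteriors: Sequence[float],
    weight: float = 0.5,
    threshold: float = math.log(0.5),  # hypothetical decision threshold
) -> bool:
    """Flag a segment as mispronounced when the fused score is below the threshold."""
    fused = fuse_scores(
        segment_score(dnn_posteriors),
        segment_score(cnn_posteriors),
        weight,
    )
    return fused < threshold


if __name__ == "__main__":
    # Hypothetical per-frame posteriors of the target phone from each model.
    print(is_mispronounced([0.9, 0.8, 0.95], [0.85, 0.9, 0.9]))
    print(is_mispronounced([0.1, 0.2, 0.05], [0.3, 0.1, 0.2]))
```

In practice the interpolation weight and the threshold would be tuned on held-out data, and the fixed threshold could be replaced by a trained classifier that takes both scores as input.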

Translated title of the contribution: Exploring combinations of various deep neural network based acoustic models and classification techniques for Mandarin mispronunciation detection
Original language: Chinese
Title of host publication: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Editors: Sin-Horng Chen, Hsin-Min Wang, Jen-Tzung Chien, Hung-Yu Kao, Wen-Whei Chang, Yih-Ru Wang, Shih-Hung Wu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 103-120
Number of pages: 18
ISBN (Electronic): 9789573079286
Publication status: Published - 2015 Oct 1
Event: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015 - Hsinchu, Taiwan
Duration: 2015 Oct 1 → 2015 Oct 2

Publication series

Name: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015

Conference

Conference: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Country: Taiwan
City: Hsinchu
Period: 15/10/1 → 15/10/2

ASJC Scopus subject areas

  • Speech and Hearing
  • Language and Linguistics


  • Cite this

    Hsu, Y. C., Yang, M. H., Hung, H. T., Hsiung, Y., Sung, Y. T., & Chen, B. (2015). 融合多種深層類神經網路聲學模型與分類技術於華語錯誤發音檢測之研究. In S-H. Chen, H-M. Wang, J-T. Chien, H-Y. Kao, W-W. Chang, Y-R. Wang, & S-H. Wu (Eds.), Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015 (pp. 103-120). (Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015). The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).