Mispronunciation detection leveraging maximum performance criterion training of acoustic models and decision functions

Yao Chi Hsu, Ming Han Yang, Hsiao Tsung Hung, Berlin Chen

Research output: Contribution to journal, Conference article

4 Citations (Scopus)

Abstract

Mispronunciation detection is an integral part of a computer-assisted pronunciation training (CAPT) system, helping second-language (L2) learners pinpoint erroneous pronunciations in a given utterance so as to improve their spoken proficiency. This paper continues this general line of research, and its major contributions are twofold. First, we present an effective training approach that estimates the deep neural network based acoustic models involved in the mispronunciation detection process by optimizing an objective directly linked to the ultimate evaluation metric. Second, in the same vein, two distinct logistic sigmoid based decision functions, with either phone- or senone-dependent parameterization, are also inferred and used for enhanced mispronunciation detection. A series of experiments on a Mandarin mispronunciation detection task seems to show the performance merits of the proposed method.
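
The abstract does not give the exact form of the decision functions, so the following is only a minimal sketch of what a logistic sigmoid decision function with unit-dependent parameters might look like. The input score (assumed here to be a per-segment pronunciation score where higher means better), the names `SigmoidParams`, `slope`, `offset`, and `detect`, and all numeric values are hypothetical and not taken from the paper. The sketch does not cover how the parameters or the acoustic models are trained.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SigmoidParams:
    """Per-unit decision parameters (names are hypothetical)."""
    slope: float   # how sharply the decision changes around the offset
    offset: float  # score at which the probability equals 0.5


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mispronunciation_probability(score: float, params: SigmoidParams) -> float:
    """Map a pronunciation score to a probability of mispronunciation.

    Assumes higher scores mean better pronunciation, so scores below
    the offset give probabilities above 0.5 (for a positive slope).
    """
    return sigmoid(params.slope * (params.offset - score))


def detect(segments, unit_params, default_params, threshold=0.5):
    """Flag each (unit, score) segment whose probability exceeds threshold.

    unit_params maps a unit label (a phone here; a senone ID would work
    the same way) to its own SigmoidParams; unseen units use default_params.
    """
    results = []
    for unit, score in segments:
        params = unit_params.get(unit, default_params)
        prob = mispronunciation_probability(score, params)
        results.append((unit, prob, prob > threshold))
    return results


if __name__ == "__main__":
    # Hand-picked illustrative values, not taken from the paper.
    unit_params = {"a": SigmoidParams(2.0, -1.0), "sh": SigmoidParams(1.0, -2.5)}
    default = SigmoidParams(1.0, -1.5)
    utterance = [("a", -0.2), ("sh", -4.0), ("n", -1.5)]
    for unit, prob, flagged in detect(utterance, unit_params, default):
        print(f"{unit}: p={prob:.3f} flagged={flagged}")
```

Keeping a separate slope and offset per unit is what makes the parameterization unit-dependent in this sketch: the same raw score can lead to different decisions for different phones or senones.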

Original language: English
Pages (from-to): 2646-2650
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 08-12-September-2016
DOIs: 10.21437/Interspeech.2016-1602
Publication status: Published - 2016 Jan 1
Event: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: 2016 Sep 8 to 2016 Sep 16

Keywords

  • Computer assisted pronunciation training
  • Deep neural networks
  • Discriminative training
  • Mispronunciation detection

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

Mispronunciation detection leveraging maximum performance criterion training of acoustic models and decision functions. / Hsu, Yao Chi; Yang, Ming Han; Hung, Hsiao Tsung; Chen, Berlin.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 08-12-September-2016, 01.01.2016, p. 2646-2650.

@article{0055a6bd67bd4f6495c62432bc70aea2,
title = "Mispronunciation detection leveraging maximum performance criterion training of acoustic models and decision functions",
abstract = "Mispronunciation detection is part and parcel of a computer assisted pronunciation training (CAPT) system, facilitating second-language (L2) learners to pinpoint erroneous pronunciations in a given utterance so as to improve their spoken proficiency. This paper presents a continuation of such a general line of research and the major contributions are twofold. First, we present an effective training approach that estimates the deep neural network based acoustic models involved in the mispronunciation detection process by optimizing an objective directly linked to the ultimate evaluation metric. Second, along the same vein, two disparate logistic sigmoid based decision functions with either phone- or senone-dependent parameterization are also inferred and used for enhanced mispronunciation detection. A series of experiments on a Mandarin mispronunciation detection task seem to show the performance merits of the proposed method.",
keywords = "Computer assisted pronunciation training, Deep neural networks, Discriminative training, Mispronunciation detection",
author = "Hsu, {Yao Chi} and Yang, {Ming Han} and Hung, {Hsiao Tsung} and Chen, Berlin",
year = "2016",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2016-1602",
language = "English",
volume = "08-12-September-2016",
pages = "2646--2650",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR
T1 - Mispronunciation detection leveraging maximum performance criterion training of acoustic models and decision functions
AU - Hsu, Yao Chi
AU - Yang, Ming Han
AU - Hung, Hsiao Tsung
AU - Chen, Berlin
PY - 2016/1/1
Y1 - 2016/1/1

N2 - Mispronunciation detection is part and parcel of a computer assisted pronunciation training (CAPT) system, facilitating second-language (L2) learners to pinpoint erroneous pronunciations in a given utterance so as to improve their spoken proficiency. This paper presents a continuation of such a general line of research and the major contributions are twofold. First, we present an effective training approach that estimates the deep neural network based acoustic models involved in the mispronunciation detection process by optimizing an objective directly linked to the ultimate evaluation metric. Second, along the same vein, two disparate logistic sigmoid based decision functions with either phone- or senone-dependent parameterization are also inferred and used for enhanced mispronunciation detection. A series of experiments on a Mandarin mispronunciation detection task seem to show the performance merits of the proposed method.

KW - Computer assisted pronunciation training
KW - Deep neural networks
KW - Discriminative training
KW - Mispronunciation detection
UR - http://www.scopus.com/inward/record.url?scp=84994381532&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84994381532&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2016-1602
DO - 10.21437/Interspeech.2016-1602
M3 - Conference article
AN - SCOPUS:84994381532
VL - 08-12-September-2016
SP - 2646
EP - 2650
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN - 2308-457X
ER -