Training data selection for improving discriminative training of acoustic models

Shih Hung Liu, Fang Hui Chu, Shih Hsiang Lin, Hung Shin Lee, Berlin Chen

Research output: Contribution to conference › Paper

11 Citations (Scopus)

Abstract

This paper considers training data selection for discriminative training of acoustic models for broadcast news speech recognition. Three novel data selection approaches were proposed. First, the average phone accuracy over all hypothesized word sequences in the word lattice of a training utterance was utilized for utterance-level data selection. Second, phone-level data selection based on the difference between the expected accuracy of a phone arc and the average phone accuracy of the word lattice was investigated. Finally, frame-level data selection based on the normalized frame-level entropy of Gaussian posterior probabilities obtained from the word lattice was explored. The underlying characteristics of the presented approaches were extensively investigated, and their performance was verified by comparison with the standard discriminative training approaches. Experiments conducted on Mandarin broadcast news collected in Taiwan showed that both phone- and frame-level data selection could achieve slight but consistent improvements over the baseline systems at lower training iterations.
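
As a rough illustration of the frame-level criterion, the sketch below computes a normalized entropy over a frame's Gaussian posterior probabilities and keeps the frames whose value exceeds a threshold. The abstract does not give the paper's formula, threshold, or selection direction, so the function names, the division by log(N), the default threshold of 0.5, and the choice to keep high-entropy frames are all assumptions made for illustration, not the authors' method.

```python
import math


def normalized_entropy(posteriors):
    """Entropy of one frame's posteriors, divided by log(N) so it lies in [0, 1].

    `posteriors` is a list of non-negative weights, one per Gaussian
    (hypothetical input layout; they are renormalized to sum to 1 here).
    """
    n = len(posteriors)
    if n < 2:
        return 0.0  # a single Gaussian carries no uncertainty
    total = sum(posteriors)
    if total <= 0.0:
        raise ValueError("posteriors must have a positive sum")
    entropy = 0.0
    for p in posteriors:
        q = p / total
        if q > 0.0:  # treat 0 * log(0) as 0
            entropy -= q * math.log(q)
    return entropy / math.log(n)


def select_frames(frame_posteriors, threshold=0.5):
    """Return indices of frames whose normalized entropy exceeds `threshold`.

    Assumption: high-entropy (more confusable) frames are the ones kept.
    """
    return [
        t
        for t, posteriors in enumerate(frame_posteriors)
        if normalized_entropy(posteriors) > threshold
    ]


if __name__ == "__main__":
    frames = [[0.5, 0.5], [1.0, 0.0], [0.9, 0.1]]
    print(select_frames(frames))
```

Dividing by log(N) makes the score comparable across frames with different numbers of active Gaussians, which is one plausible reading of "normalized" in the abstract.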

Original language: English
Pages: 284-289
Number of pages: 6
Publication status: Published - 2007 Dec 1
Event: 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007 - Kyoto, Japan
Duration: 2007 Dec 9 to 2007 Dec 13

Fingerprint

Speech recognition
Entropy
Acoustics
Experiments

Keywords

  • Acoustic models
  • Data selection
  • Discriminative training
  • Entropy
  • Speech recognition

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Software
  • Artificial Intelligence

Cite this

Liu, S. H., Chu, F. H., Lin, S. H., Lee, H. S., & Chen, B. (2007). Training data selection for improving discriminative training of acoustic models. 284-289. Paper presented at 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007, Kyoto, Japan.

@conference{c7d4fff6f6f54216bbc6718e753a7929,
title = "Training data selection for improving discriminative training of acoustic models",
abstract = "This paper considers training data selection for discriminative training of acoustic models for broadcast news speech recognition. Three novel data selection approaches were proposed. First, the average phone accuracy over all hypothesized word sequences in the word lattice of a training utterance was utilized for utterance-level data selection. Second, phone-level data selection based on the difference between the expected accuracy of a phone arc and the average phone accuracy of the word lattice was investigated. Finally, frame-level data selection based on the normalized frame-level entropy of Gaussian posterior probabilities obtained from the word lattice was explored. The underlying characteristics of the presented approaches were extensively investigated, and their performance was verified by comparison with the standard discriminative training approaches. Experiments conducted on Mandarin broadcast news collected in Taiwan showed that both phone- and frame-level data selection could achieve slight but consistent improvements over the baseline systems at lower training iterations.",
keywords = "Acoustic models, Data selection, Discriminative training, Entropy, Speech recognition",
author = "Liu, {Shih Hung} and Chu, {Fang Hui} and Lin, {Shih Hsiang} and Lee, {Hung Shin} and Berlin Chen",
year = "2007",
month = "12",
day = "1",
language = "English",
pages = "284--289",
note = "2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007 ; Conference date: 09-12-2007 Through 13-12-2007",

}

TY - CONF
T1 - Training data selection for improving discriminative training of acoustic models
AU - Liu, Shih Hung
AU - Chu, Fang Hui
AU - Lin, Shih Hsiang
AU - Lee, Hung Shin
AU - Chen, Berlin
PY - 2007/12/1
Y1 - 2007/12/1
AB - This paper considers training data selection for discriminative training of acoustic models for broadcast news speech recognition. Three novel data selection approaches were proposed. First, the average phone accuracy over all hypothesized word sequences in the word lattice of a training utterance was utilized for utterance-level data selection. Second, phone-level data selection based on the difference between the expected accuracy of a phone arc and the average phone accuracy of the word lattice was investigated. Finally, frame-level data selection based on the normalized frame-level entropy of Gaussian posterior probabilities obtained from the word lattice was explored. The underlying characteristics of the presented approaches were extensively investigated, and their performance was verified by comparison with the standard discriminative training approaches. Experiments conducted on Mandarin broadcast news collected in Taiwan showed that both phone- and frame-level data selection could achieve slight but consistent improvements over the baseline systems at lower training iterations.
KW - Acoustic models
KW - Data selection
KW - Discriminative training
KW - Entropy
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=44849114709&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=44849114709&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:44849114709
SP - 284
EP - 289
ER -