Training data selection for improving discriminative training of acoustic models

Shih Hung Liu, Fang Hui Chu, Shih Hsiang Lin, Hung Shin Lee, Berlin Chen

Research output: Contribution to conference › Paper › peer-review

12 Citations (Scopus)

Abstract

This paper considers training data selection for discriminative training of acoustic models for broadcast news speech recognition. Three novel data selection approaches are proposed. First, the average phone accuracy over all hypothesized word sequences in the word lattice of a training utterance is used for utterance-level data selection. Second, phone-level data selection based on the difference between the expected accuracy of a phone arc and the average phone accuracy of the word lattice is investigated. Finally, frame-level data selection based on the normalized frame-level entropy of Gaussian posterior probabilities obtained from the word lattice is explored. The underlying characteristics of the presented approaches were extensively investigated, and their performance was verified by comparison with standard discriminative training approaches. Experiments conducted on Mandarin broadcast news collected in Taiwan showed that both phone- and frame-level data selection achieved slight but consistent improvements over the baseline systems when fewer training iterations were used.
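The three selection criteria described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the lattice statistics (hypothesis posteriors and phone accuracies, per-arc expected accuracies c(q), and per-frame Gaussian posteriors) have already been computed by a forward-backward pass, and the threshold rules and all function names (`select_utterance`, `select_phone_arcs`, `select_frames`, etc.) are hypothetical, since the abstract does not state the exact decision rules or threshold values.

```python
import math
from typing import List, Sequence


def average_phone_accuracy(hyp_posteriors: Sequence[float],
                           hyp_accuracies: Sequence[float]) -> float:
    """c_avg: posterior-weighted phone accuracy over all lattice hypotheses.

    Posteriors are renormalized here in case they do not sum exactly to 1.
    """
    total = sum(hyp_posteriors)
    return sum(p * a for p, a in zip(hyp_posteriors, hyp_accuracies)) / total


def select_utterance(c_avg: float, threshold: float) -> bool:
    """Utterance level (hypothetical rule): keep utterances whose average
    phone accuracy falls below a threshold, i.e. poorly recognized ones."""
    return c_avg < threshold


def select_phone_arcs(expected_accs: Sequence[float], c_avg: float,
                      margin: float) -> List[int]:
    """Phone level (hypothetical rule): keep arcs q whose expected accuracy
    c(q) differs from the lattice average c_avg by more than `margin`."""
    return [q for q, c_q in enumerate(expected_accs) if abs(c_q - c_avg) > margin]


def normalized_entropy(gauss_posteriors: Sequence[float]) -> float:
    """Entropy of one frame's Gaussian posteriors divided by log(M),
    so the result lies in [0, 1] for M Gaussians."""
    m = len(gauss_posteriors)
    if m < 2:
        return 0.0
    total = sum(gauss_posteriors)
    probs = [g / total for g in gauss_posteriors if g > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(m)


def select_frames(frames: List[Sequence[float]], threshold: float) -> List[int]:
    """Frame level (hypothetical rule): keep frames whose normalized entropy
    exceeds a threshold, i.e. frames where the Gaussians are most confusable."""
    return [t for t, post in enumerate(frames) if normalized_entropy(post) > threshold]


if __name__ == "__main__":
    c = average_phone_accuracy([0.5, 0.3, 0.2], [0.9, 0.7, 0.5])
    print(select_utterance(c, 0.8))
    print(select_phone_arcs([0.9, 0.75, 0.4], c, 0.1))
    print(select_frames([[0.25] * 4, [1.0, 0.0, 0.0, 0.0]], 0.5))
```

Dividing by log(M) makes entropies comparable across frames whose posteriors are spread over different numbers of Gaussians: a uniform posterior scores 1 and a single dominant Gaussian scores 0. Whether selection keeps high- or low-scoring units is a design choice assumed here for illustration only.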

Original language: English
Pages: 284-289
Number of pages: 6
Publication status: Published - 2007 Dec 1
Event: 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007 - Kyoto, Japan
Duration: 2007 Dec 9 to 2007 Dec 13

Other

Other: 2007 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2007
Country: Japan
City: Kyoto
Period: 07/12/9 to 07/12/13

Keywords

  • Acoustic models
  • Data selection
  • Discriminative training
  • Entropy
  • Speech recognition

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Software
  • Artificial Intelligence
