TY - GEN
T1 - Investigating data selection for minimum phone error training of acoustic models
AU - Liu, Shih Hung
AU - Chu, Fang Hui
AU - Lin, Shih Hsiang
AU - Chen, Berlin
PY - 2007
Y1 - 2007
N2 - This paper considers minimum phone error (MPE) based discriminative training of acoustic models for Mandarin broadcast news recognition. A novel data selection approach based on the normalized frame-level entropy of Gaussian posterior probabilities obtained from the word lattice of the training utterance was explored. It has the merit of making the training algorithm focus much more on the training statistics of those frame samples that center nearly around the decision boundary for better discrimination. Moreover, we presented a new phone accuracy function based on the frame-level accuracy of hypothesized phone arcs instead of using the raw phone accuracy function of MPE training. The underlying characteristics of the presented approaches were extensively investigated and their performance was verified by comparison with the original MPE training approach. Experiments conducted on the broadcast news collected in Taiwan showed that the integration of the frame-level data selection and accuracy calculation could achieve slight but consistent improvements over the baseline system.
AB - This paper considers minimum phone error (MPE) based discriminative training of acoustic models for Mandarin broadcast news recognition. A novel data selection approach based on the normalized frame-level entropy of Gaussian posterior probabilities obtained from the word lattice of the training utterance was explored. It has the merit of making the training algorithm focus much more on the training statistics of those frame samples that center nearly around the decision boundary for better discrimination. Moreover, we presented a new phone accuracy function based on the frame-level accuracy of hypothesized phone arcs instead of using the raw phone accuracy function of MPE training. The underlying characteristics of the presented approaches were extensively investigated and their performance was verified by comparison with the original MPE training approach. Experiments conducted on the broadcast news collected in Taiwan showed that the integration of the frame-level data selection and accuracy calculation could achieve slight but consistent improvements over the baseline system.
UR - http://www.scopus.com/inward/record.url?scp=46449108334&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=46449108334&partnerID=8YFLogxK
U2 - 10.1109/icme.2007.4284658
DO - 10.1109/icme.2007.4284658
M3 - Conference contribution
AN - SCOPUS:46449108334
SN - 1424410177
SN - 9781424410170
T3 - Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, ICME 2007
SP - 348
EP - 351
BT - Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, ICME 2007
PB - IEEE Computer Society
T2 - IEEE International Conference on Multimedia and Expo, ICME 2007
Y2 - 2 July 2007 through 5 July 2007
ER -