TY - GEN
T1 - A unified probabilistic generative framework for extractive spoken document summarization
AU - Chen, Yi Ting
AU - Chiu, Hsuan Sheng
AU - Wang, Hsin Min
AU - Chen, Berlin
PY - 2007
Y1 - 2007
N2 - In this paper, we consider extractive summarization of Chinese broadcast news speech. A unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking is proposed. Each sentence of a spoken document to be summarized is treated as a probabilistic generative model for predicting the document. Two matching strategies, namely literal term matching and concept matching, are extensively investigated: we explore the use of the hidden Markov model (HMM) and the relevance model (RM) for literal term matching, and the word topical mixture model (WTMM) for concept matching. In addition, confidence scores, structural features, and a set of prosodic features are incorporated via the whole sentence maximum entropy model (WSME) to estimate the sentence prior probability. Experiments conducted on Chinese broadcast news collected in Taiwan yielded very promising initial results.
AB - In this paper, we consider extractive summarization of Chinese broadcast news speech. A unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking is proposed. Each sentence of a spoken document to be summarized is treated as a probabilistic generative model for predicting the document. Two matching strategies, namely literal term matching and concept matching, are extensively investigated: we explore the use of the hidden Markov model (HMM) and the relevance model (RM) for literal term matching, and the word topical mixture model (WTMM) for concept matching. In addition, confidence scores, structural features, and a set of prosodic features are incorporated via the whole sentence maximum entropy model (WSME) to estimate the sentence prior probability. Experiments conducted on Chinese broadcast news collected in Taiwan yielded very promising initial results.
KW - Hidden Markov model
KW - Relevance model
KW - Spoken document summarization
KW - Whole sentence maximum entropy model
KW - Word topical mixture model
UR - http://www.scopus.com/inward/record.url?scp=56149127090&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=56149127090&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:56149127090
SN - 9781605603162
T3 - International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
SP - 2528
EP - 2531
BT - International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
T2 - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Y2 - 27 August 2007 through 31 August 2007
ER -