TY - JOUR
T1 - A comparative study of probabilistic ranking models for Chinese spoken document summarization
AU - Lin, Shih Hsiang
AU - Chen, Berlin
AU - Wang, Hsin Min
PY - 2009/3/1
Y1 - 2009/3/1
N2 - Extractive document summarization automatically selects a number of indicative sentences, passages, or paragraphs from an original document according to a target summarization ratio, and sequences them to form a concise summary. In this article, we present a comparative study of various probabilistic ranking models for spoken document summarization, including supervised classification-based summarizers and unsupervised probabilistic generative summarizers. We also investigate the use of unsupervised summarizers to improve the performance of supervised summarizers when manual labels are not available for training the latter. A novel training data selection approach that leverages the relevance information of spoken sentences to select reliable document-summary pairs derived by the probabilistic generative summarizers is explored for training the classification-based summarizers. Encouraging initial results on Mandarin Chinese broadcast news data are demonstrated.
KW - Extractive summarization
KW - Probabilistic ranking models
KW - Relevance information
KW - Spoken document summarization
UR - http://www.scopus.com/inward/record.url?scp=67149093166&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67149093166&partnerID=8YFLogxK
U2 - 10.1145/1482343.1482346
DO - 10.1145/1482343.1482346
M3 - Article
AN - SCOPUS:67149093166
SN - 1530-0226
VL - 8
JO - ACM Transactions on Asian Language Information Processing
JF - ACM Transactions on Asian Language Information Processing
IS - 1
M1 - 3
ER -