A comparative study of probabilistic ranking models for Chinese spoken document summarization

Shih Hsiang Lin, Berlin Chen*, Hsin Min Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

27 Citations (Scopus)

Abstract

Extractive document summarization automatically selects a number of indicative sentences, passages, or paragraphs from an original document according to a target summarization ratio, and sequences them to form a concise summary. In this article, we present a comparative study of various probabilistic ranking models for spoken document summarization, including supervised classification-based summarizers and unsupervised probabilistic generative summarizers. We also investigate the use of unsupervised summarizers to improve the performance of supervised summarizers when manual labels are not available for training the latter. A novel training data selection approach that leverages the relevance information of spoken sentences to select reliable document-summary pairs derived by the probabilistic generative summarizers is explored for training the classification-based summarizers. Encouraging initial results on Mandarin Chinese broadcast news data are demonstrated.
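The extractive pipeline the abstract describes for the unsupervised generative case can be sketched in a few lines. Each sentence is scored, the top-ranked sentences are kept up to the summarization ratio, and they are put back in their original order. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes input that is already tokenized. It ranks each sentence S by the document likelihood log P(D|S), using a unigram sentence model interpolated with a background model (Jelinek-Mercer smoothing, weight `lam`). It also measures the ratio in tokens. The names `unigram_model`, `score_sentence`, and `summarize` are hypothetical, and so are the `1e-6` floor and the default parameter values.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Relative-frequency unigram model, used here as a stand-in background model."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def score_sentence(sentence, document, background, lam=0.5):
    """log P(D | S): log-likelihood that the sentence's smoothed unigram model
    generates every token of the document. Requires 0 <= lam < 1 so that the
    background term keeps each interpolated probability above zero."""
    counts = Counter(sentence)
    length = len(sentence)
    log_prob = 0.0
    for word in document:
        p_sent = counts[word] / length if length else 0.0
        p_bg = background.get(word, 1e-6)  # assumed floor for words unseen in the background
        log_prob += math.log(lam * p_sent + (1 - lam) * p_bg)
    return log_prob


def summarize(sentences, background, ratio=0.3, lam=0.5):
    """Return indices of selected sentences in original document order.

    Sentences are taken greedily in descending score order. A sentence is skipped
    if it would push the summary past ratio * (document length in tokens). The
    top-ranked sentence is always kept, so the summary is never empty."""
    document = [w for s in sentences for w in s]
    budget = ratio * len(document)
    scores = [score_sentence(s, document, background, lam) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        if chosen and used + len(sentences[i]) > budget:
            continue
        chosen.append(i)
        used += len(sentences[i])
    return sorted(chosen)  # restore original sentence order


if __name__ == "__main__":
    # Toy, already-tokenized story; real input would be speech recognition transcripts.
    sentences = [
        ["typhoon", "hits", "taiwan"],
        ["typhoon", "rain", "taiwan", "typhoon"],
        ["stock", "market", "closes"],
    ]
    document = [w for s in sentences for w in s]
    background = unigram_model(document)  # assumption: a real system would use a large corpus
    print(summarize(sentences, background, ratio=0.75))  # [0, 1]
```

A classification-based summarizer would replace the `scores` step with the output of a trained classifier. The abstract proposes using generative-summarizer output to supply training pairs when manual labels are unavailable. That idea would plug in at this point, but the paper's actual relevance-based selection criteria are not reproduced in this sketch.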

Original language: English
Article number: 3
Journal: ACM Transactions on Asian Language Information Processing
Volume: 8
Issue number: 1
Publication status: Published - 2009 Mar 1

Keywords

  • Extractive summarization
  • Probabilistic ranking models
  • Relevance information
  • Spoken document summarization

ASJC Scopus subject areas

  • General Computer Science
