A comparative study of probabilistic ranking models for Chinese spoken document summarization

Shih Hsiang Lin, Berlin Chen, Hsin Min Wang

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

Extractive document summarization automatically selects a number of indicative sentences, passages, or paragraphs from an original document according to a target summarization ratio, and sequences them to form a concise summary. In this article, we present a comparative study of various probabilistic ranking models for spoken document summarization, including supervised classification-based summarizers and unsupervised probabilistic generative summarizers. We also investigate the use of unsupervised summarizers to improve the performance of supervised summarizers when manual labels are not available for training the latter. A novel training data selection approach that leverages the relevance information of spoken sentences to select reliable document-summary pairs derived by the probabilistic generative summarizers is explored for training the classification-based summarizers. Encouraging initial results on Mandarin Chinese broadcast news data are demonstrated.
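The unsupervised, generative style of ranking mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each sentence S is scored by the log-likelihood of the whole document D under a unigram model of S, smoothed with a background model. Here the background model is estimated from the document itself, and the summarization ratio is applied to the sentence count; both are simplifying assumptions. The names `rank_sentences`, `summarize`, and `lam` are hypothetical.

```python
import math
from collections import Counter


def rank_sentences(sentences, lam=0.5):
    """Return log P(D | S) for each tokenized sentence S.

    Assumption: P(w | S) is a unigram estimate interpolated with a
    background unigram model built from the document itself.
    lam must lie strictly between 0 and 1 to avoid log(0).
    """
    doc_tokens = [tok for sent in sentences for tok in sent]
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    scores = []
    for sent in sentences:
        sent_counts = Counter(sent)
        sent_len = len(sent)
        log_prob = 0.0
        for word, count in doc_counts.items():
            p_sent = sent_counts[word] / sent_len if sent_len else 0.0
            p_background = count / doc_len
            log_prob += count * math.log(lam * p_sent + (1 - lam) * p_background)
        scores.append(log_prob)
    return scores


def summarize(sentences, ratio=0.3, lam=0.5):
    """Keep the top-scoring sentences up to the ratio, in original order."""
    scores = rank_sentences(sentences, lam)
    budget = max(1, round(ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:budget]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(tokenized_sentences, ratio=0.3)` on a list of token lists returns roughly 30% of the sentences, in their original order. A supervised classification-based summarizer would replace `rank_sentences` with a trained classifier's score for each sentence.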

Original language: English
Article number: 3
Journal: ACM Transactions on Asian Language Information Processing
Volume: 8
Issue number: 1
Publication status: Published - 2009 Mar 1

Keywords

  • Extractive summarization
  • Probabilistic ranking models
  • Relevance information
  • Spoken document summarization

ASJC Scopus subject areas

  • Computer Science (all)
