A comparative study of probabilistic ranking models for Chinese spoken document summarization

Shih Hsiang Lin, Berlin Chen, Hsin Min Wang

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

Extractive document summarization automatically selects a number of indicative sentences, passages, or paragraphs from an original document according to a target summarization ratio, and sequences them to form a concise summary. In this article, we present a comparative study of various probabilistic ranking models for spoken document summarization, including supervised classification-based summarizers and unsupervised probabilistic generative summarizers. We also investigate the use of unsupervised summarizers to improve the performance of supervised summarizers when manual labels are not available for training the latter. A novel training data selection approach that leverages the relevance information of spoken sentences to select reliable document-summary pairs derived by the probabilistic generative summarizers is explored for training the classification-based summarizers. Encouraging initial results on Mandarin Chinese broadcast news data are demonstrated.
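The rank-select-sequence procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the article's formulation: the scoring function (a smoothed unigram likelihood of the document given each sentence), the interpolation weight `lam`, the word-count budget, and the names `rank_score` and `extractive_summary` are all assumptions made for the example. Sentences are assumed to be already tokenized into lists of words or characters.

```python
import math
from collections import Counter


def rank_score(sentence, doc_counts, doc_len, lam=0.5):
    """Log-likelihood of the document under a smoothed unigram model of one sentence.

    Assumed scoring rule for illustration only. `lam` must be below 1 so that
    words missing from the sentence still receive non-zero probability.
    """
    sent_counts = Counter(sentence)
    sent_len = len(sentence)
    score = 0.0
    for word, count in doc_counts.items():
        p_sent = sent_counts[word] / sent_len if sent_len else 0.0
        p_bg = count / doc_len  # background model estimated from the document itself
        score += count * math.log(lam * p_sent + (1 - lam) * p_bg)
    return score


def extractive_summary(sentences, ratio=0.3, lam=0.5):
    """Pick top-ranked sentences up to `ratio` of the document length.

    The selected sentences are returned in their original order.
    """
    doc_counts = Counter(w for s in sentences for w in s)
    doc_len = sum(doc_counts.values())
    if doc_len == 0:
        return []
    budget = ratio * doc_len  # target summarization ratio, measured in tokens
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: rank_score(sentences[i], doc_counts, doc_len, lam),
        reverse=True,
    )
    chosen, used = [], 0
    for i in ranked:
        # Always keep at least one sentence; stop once the budget would be exceeded.
        if chosen and used + len(sentences[i]) > budget:
            break
        chosen.append(i)
        used += len(sentences[i])
    # Sequence the selected sentences by their position in the document.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `extractive_summary(doc, ratio=0.3)` on a list of tokenized sentences returns a sub-list of those sentences. A supervised classification-based summarizer would replace `rank_score` with a trained classifier's posterior, while the selection and sequencing steps would stay the same.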

Original language: English
Article number: 3
Journal: ACM Transactions on Asian Language Information Processing
Volume: 8
Issue number: 1
DOI: 10.1145/1482343.1482346
Publication status: Published - 2009 Mar 1

Keywords

  • Extractive summarization
  • Probabilistic ranking models
  • Relevance information
  • Spoken document summarization

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

A comparative study of probabilistic ranking models for Chinese spoken document summarization. / Lin, Shih Hsiang; Chen, Berlin; Wang, Hsin Min.

In: ACM Transactions on Asian Language Information Processing, Vol. 8, No. 1, 3, 01.03.2009.

@article{513bed270b4d4892963756f70fe5016e,
title = "A comparative study of probabilistic ranking models for Chinese spoken document summarization",
abstract = "Extractive document summarization automatically selects a number of indicative sentences, passages, or paragraphs from an original document according to a target summarization ratio, and sequences them to form a concise summary. In this article, we present a comparative study of various probabilistic ranking models for spoken document summarization, including supervised classification-based summarizers and unsupervised probabilistic generative summarizers. We also investigate the use of unsupervised summarizers to improve the performance of supervised summarizers when manual labels are not available for training the latter. A novel training data selection approach that leverages the relevance information of spoken sentences to select reliable document-summary pairs derived by the probabilistic generative summarizers is explored for training the classification-based summarizers. Encouraging initial results on Mandarin Chinese broadcast news data are demonstrated.",
keywords = "Extractive summarization, Probabilistic ranking models, Relevance information, Spoken document summarization",
author = "Lin, {Shih Hsiang} and Chen, Berlin and Wang, {Hsin Min}",
year = "2009",
month = "3",
day = "1",
doi = "10.1145/1482343.1482346",
language = "English",
volume = "8",
journal = "ACM Transactions on Asian Language Information Processing",
issn = "1530-0226",
publisher = "Association for Computing Machinery (ACM)",
number = "1",
}

TY - JOUR
T1 - A comparative study of probabilistic ranking models for Chinese spoken document summarization
AU - Lin, Shih Hsiang
AU - Chen, Berlin
AU - Wang, Hsin Min
PY - 2009/3/1
Y1 - 2009/3/1
N2 - Extractive document summarization automatically selects a number of indicative sentences, passages, or paragraphs from an original document according to a target summarization ratio, and sequences them to form a concise summary. In this article, we present a comparative study of various probabilistic ranking models for spoken document summarization, including supervised classification-based summarizers and unsupervised probabilistic generative summarizers. We also investigate the use of unsupervised summarizers to improve the performance of supervised summarizers when manual labels are not available for training the latter. A novel training data selection approach that leverages the relevance information of spoken sentences to select reliable document-summary pairs derived by the probabilistic generative summarizers is explored for training the classification-based summarizers. Encouraging initial results on Mandarin Chinese broadcast news data are demonstrated.
KW - Extractive summarization
KW - Probabilistic ranking models
KW - Relevance information
KW - Spoken document summarization
UR - http://www.scopus.com/inward/record.url?scp=67149093166&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67149093166&partnerID=8YFLogxK
U2 - 10.1145/1482343.1482346
DO - 10.1145/1482343.1482346
M3 - Article
AN - SCOPUS:67149093166
VL - 8
JO - ACM Transactions on Asian Language Information Processing
JF - ACM Transactions on Asian Language Information Processing
SN - 1530-0226
IS - 1
M1 - 3
ER -