Extractive spoken document summarization for information retrieval

Berlin Chen, Yi Ting Chen

Research output: Contribution to journal, Article

10 Citations (Scopus)

Abstract

The purpose of extractive summarization is to automatically select a number of indicative sentences, passages, or paragraphs from the original document according to a target summarization ratio and then sequence them to form a concise summary. In this paper, we proposed the use of probabilistic latent topical information for extractive summarization of spoken documents. Various kinds of modeling structures and learning approaches were extensively investigated. In addition, the summarization capabilities were verified by comparison with several conventional spoken document summarization models. The experiments were performed on the Chinese broadcast news collected in Taiwan. Noticeable performance gains were obtained. The proposed summarization technique has also been properly integrated into our prototype system for voice retrieval of Mandarin broadcast news via mobile devices.
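The selection step described in the abstract (score sentences, keep enough of them to meet a target summarization ratio, then restore their original order) can be sketched roughly as follows. This is only an illustration: the function name `summarize`, the `ratio` parameter, and the word-frequency score are assumptions made for this sketch, and the score is a simple stand-in, not the probabilistic latent topical model the paper proposes.

```python
import math
import re
from collections import Counter


def summarize(sentences, ratio=0.3):
    """Return the top-scoring sentences, up to `ratio` of the document, in original order."""
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    doc_counts = Counter(tok for sent in tokenized for tok in sent)

    def score(tokens):
        # Placeholder relevance score (an assumption, not the paper's model):
        # the average document-level frequency of the sentence's words.
        return sum(doc_counts[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Number of sentences allowed by the target summarization ratio (at least one).
    k = max(1, math.ceil(ratio * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: score(tokenized[i]), reverse=True)
    # Re-sequence the selected sentences in their original document order.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]
```

Swapping in a different relevance model only requires replacing `score`; the ratio-based selection and re-sequencing stay the same.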

Original language: English
Pages (from-to): 426-437
Number of pages: 12
Journal: Pattern Recognition Letters
Volume: 29
Issue number: 4
DOI: 10.1016/j.patrec.2007.10.022
Publication status: Published - 2008 Mar 1

Keywords

  • Extractive summarization
  • Information retrieval
  • Speech recognition
  • Spoken documents
  • Topical mixture model

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

Extractive spoken document summarization for information retrieval. / Chen, Berlin; Chen, Yi Ting.

In: Pattern Recognition Letters, Vol. 29, No. 4, 01.03.2008, p. 426-437.

@article{93153736a1be4827a101f2349aacc2d2,
title = "Extractive spoken document summarization for information retrieval",
abstract = "The purpose of extractive summarization is to automatically select a number of indicative sentences, passages, or paragraphs from the original document according to a target summarization ratio and then sequence them to form a concise summary. In this paper, we proposed the use of probabilistic latent topical information for extractive summarization of spoken documents. Various kinds of modeling structures and learning approaches were extensively investigated. In addition, the summarization capabilities were verified by comparison with several conventional spoken document summarization models. The experiments were performed on the Chinese broadcast news collected in Taiwan. Noticeable performance gains were obtained. The proposed summarization technique has also been properly integrated into our prototype system for voice retrieval of Mandarin broadcast news via mobile devices.",
keywords = "Extractive summarization, Information retrieval, Speech recognition, Spoken documents, Topical mixture model",
author = "Chen, Berlin and Chen, Yi Ting",
year = "2008",
month = "3",
day = "1",
doi = "10.1016/j.patrec.2007.10.022",
language = "English",
volume = "29",
pages = "426--437",
journal = "Pattern Recognition Letters",
issn = "0167-8655",
publisher = "Elsevier",
number = "4",
}

TY  - JOUR
T1  - Extractive spoken document summarization for information retrieval
AU  - Chen, Berlin
AU  - Chen, Yi Ting
PY  - 2008/3/1
Y1  - 2008/3/1
AB  - The purpose of extractive summarization is to automatically select a number of indicative sentences, passages, or paragraphs from the original document according to a target summarization ratio and then sequence them to form a concise summary. In this paper, we proposed the use of probabilistic latent topical information for extractive summarization of spoken documents. Various kinds of modeling structures and learning approaches were extensively investigated. In addition, the summarization capabilities were verified by comparison with several conventional spoken document summarization models. The experiments were performed on the Chinese broadcast news collected in Taiwan. Noticeable performance gains were obtained. The proposed summarization technique has also been properly integrated into our prototype system for voice retrieval of Mandarin broadcast news via mobile devices.
KW  - Extractive summarization
KW  - Information retrieval
KW  - Speech recognition
KW  - Spoken documents
KW  - Topical mixture model
UR  - http://www.scopus.com/inward/record.url?scp=38349073318&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=38349073318&partnerID=8YFLogxK
U2  - 10.1016/j.patrec.2007.10.022
DO  - 10.1016/j.patrec.2007.10.022
M3  - Article
AN  - SCOPUS:38349073318
VL  - 29
SP  - 426
EP  - 437
JO  - Pattern Recognition Letters
JF  - Pattern Recognition Letters
SN  - 0167-8655
IS  - 4
ER  - 