A unified probabilistic generative framework for extractive spoken document summarization

Yi Ting Chen*, Hsuan Sheng Chiu, Hsin Min Wang, Berlin Chen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

This paper considers extractive summarization of Chinese broadcast news speech. We propose a unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking. Each sentence of a spoken document to be summarized is treated as a probabilistic generative model for predicting the document. Two matching strategies, literal term matching and concept matching, are extensively investigated: the hidden Markov model (HMM) and the relevance model (RM) are explored for literal term matching, and the word topical mixture model (WTMM) for concept matching. To estimate the sentence prior probability, confidence scores, structural features, and a set of prosodic features are combined using the whole sentence maximum entropy model (WSME). Experiments were performed on Chinese broadcast news collected in Taiwan, and the initial results were promising.
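The ranking idea in the abstract, scoring each sentence S by how well it generates the document D and weighting that by a sentence prior, can be sketched as follows. This is a minimal illustration only: the abstract does not give the model equations, so the interpolated unigram sentence model, the add-one-smoothed background model, the `lam` weight, and the function name `rank_sentences` are all assumptions made here, not the authors' HMM, RM, WTMM, or WSME formulations.

```python
import math
from collections import Counter


def rank_sentences(sentences, background, prior=None, lam=0.5):
    """Rank sentences by log P(D|S) + log P(S), best first.

    sentences:  list of token lists that together form the document D.
    background: Counter of token counts from a general collection
                (hypothetical stand-in for a background language model).
    prior:      optional list of P(S) values; uniform if omitted.
    lam:        interpolation weight (assumed value, not from the paper).
    """
    doc_counts = Counter(tok for sent in sentences for tok in sent)
    bg_total = sum(background.values())
    vocab_size = len(set(background) | set(doc_counts))
    if prior is None:
        prior = [1.0 / len(sentences)] * len(sentences)

    scores = []
    for sent, p_s in zip(sentences, prior):
        sent_counts = Counter(sent)
        log_likelihood = 0.0
        for word, count in doc_counts.items():
            # Unigram probability of the word under the sentence model.
            p_sent = sent_counts[word] / len(sent) if sent else 0.0
            # Add-one smoothing keeps the background probability non-zero.
            p_bg = (background[word] + 1) / (bg_total + vocab_size)
            log_likelihood += count * math.log(lam * p_sent + (1 - lam) * p_bg)
        scores.append(log_likelihood + math.log(p_s))

    # Indices of sentences, highest combined score first.
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```

With a uniform prior the ordering depends only on the generative term; passing a non-uniform `prior` shows how sentence-level evidence (in the paper, confidence scores, structural features, and prosodic features) could shift the ranking.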

Original language: English
Title of host publication: International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Pages: 2528-2531
Number of pages: 4
Publication status: Published - 2007
Event: 8th Annual Conference of the International Speech Communication Association, Interspeech 2007 - Antwerp, Belgium
Duration: 2007 Aug 27 → 2007 Aug 31

Publication series

Name: International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Volume: 4

Other

Other: 8th Annual Conference of the International Speech Communication Association, Interspeech 2007
Country/Territory: Belgium
City: Antwerp
Period: 2007/08/27 → 2007/08/31

Keywords

  • Hidden Markov model
  • Relevance model
  • Spoken document summarization
  • Whole sentence maximum entropy model
  • Word topical mixture model

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Modelling and Simulation
  • Linguistics and Language
  • Communication
