Word topical mixture models for extractive spoken document summarization

Berlin Chen*, Yi Ting Chen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

5 Citations (Scopus)

Abstract

This paper considers extractive summarization of Chinese spoken documents. In contrast to conventional approaches, we attempt to deal with the extractive summarization problem under a probabilistic generative framework. A word topical mixture model (w-TMM) was proposed to explore the cooccurrence relationship between words of the language. Each sentence of the spoken document to be summarized was treated as a composite word TMM model for generating the document, and sentences were ranked and selected according to their likelihoods. Various kinds of modeling structures and learning approaches were extensively investigated. In addition, the summarization capabilities were verified by comparison with the other conventional summarization approaches. The experiments were performed on the Chinese broadcast news collected in Taiwan. Noticeable performance gains were obtained. The proposed summarization technique has also been properly integrated into our prototype system for voice retrieval of broadcast news via mobile devices.

Original languageEnglish
Title of host publicationProceedings of the 2007 IEEE International Conference on Multimedia and Expo, ICME 2007
PublisherIEEE Computer Society
Pages52-55
Number of pages4
ISBN (Print)1424410177, 9781424410170
DOIs
Publication statusPublished - 2007
EventIEEE International Conference onMultimedia and Expo, ICME 2007 - Beijing, China
Duration: 2007 Jul 22007 Jul 5

Publication series

NameProceedings of the 2007 IEEE International Conference on Multimedia and Expo, ICME 2007

Other

OtherIEEE International Conference onMultimedia and Expo, ICME 2007
Country/TerritoryChina
CityBeijing
Period2007/07/022007/07/05

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Software

Fingerprint

Dive into the research topics of 'Word topical mixture models for extractive spoken document summarization'. Together they form a unique fingerprint.

Cite this