A probabilistic generative framework for extractive broadcast news speech summarization

Yi Ting Chen, Berlin Chen, Hsin Min Wang

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

In this paper, we consider extractive summarization of broadcast news speech and propose a unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking. Each sentence of a spoken document to be summarized is treated as a probabilistic generative model for predicting the document. Two matching strategies, namely literal term matching and concept matching, are thoroughly investigated. We explore the use of the language model (LM) and the relevance model (RM) for literal term matching, while the sentence topical mixture model (STMM) and the word topical mixture model (WTMM) are used for concept matching. In addition, the lexical and prosodic features, as well as the relevance information of spoken sentences, are properly incorporated for the estimation of the sentence prior probability. An elegant feature of our proposed framework is that both the sentence generative probability and the sentence prior probability can be estimated in an unsupervised manner, without the need for handcrafted document-summary pairs. The experiments were performed on Chinese broadcast news collected in Taiwan, and very encouraging results were obtained.
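The ranking rule described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: each candidate sentence S is scored by the product of the sentence generative probability P(D|S) and the sentence prior P(S). Here P(D|S) uses a smoothed unigram language model of the sentence, corresponding to the LM literal-term-matching strategy, and P(S) is taken as uniform for simplicity (so it drops out of the ranking). All function and variable names, and the smoothing weight `lam`, are ours.

```python
import math
from collections import Counter

def sentence_score(sentence_words, doc_words, background_counts, lam=0.6):
    """Log of P(D|S) * P(S) under a smoothed unigram LM for sentence S.

    P(w|S) is linearly interpolated with a background model P_bg(w),
    mirroring the LM (literal term matching) strategy; P(S) is uniform
    here, so it is omitted from the score.
    """
    s_counts = Counter(sentence_words)
    s_len = max(len(sentence_words), 1)
    bg_total = sum(background_counts.values())
    log_p = 0.0
    for w, c in Counter(doc_words).items():
        p_s = s_counts[w] / s_len
        p_bg = background_counts.get(w, 0) / bg_total if bg_total else 0.0
        p = lam * p_s + (1.0 - lam) * p_bg
        if p <= 0.0:
            return float("-inf")  # sentence cannot generate this word at all
        log_p += c * math.log(p)
    return log_p

def rank_sentences(sentences, background_counts):
    """Rank a spoken document's sentences as extractive-summary candidates."""
    doc_words = [w for s in sentences for w in s]
    scored = [(sentence_score(s, doc_words, background_counts), i)
              for i, s in enumerate(sentences)]
    return [i for _, i in sorted(scored, reverse=True)]
```

In the paper's framework, the uniform prior above would instead be estimated from lexical and prosodic features and relevance information, and P(D|S) could alternatively come from the RM, STMM, or WTMM models.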

Original language: English
Article number: 4717223
Pages (from-to): 95-106
Number of pages: 12
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 17
Issue number: 1
DOI: 10.1109/TASL.2008.2005031
Publication status: Published - 2009 Jan 1


Keywords

  • Extractive spoken document summarization
  • Language model (LM)
  • Probabilistic generative framework
  • Relevance model (RM)
  • Topical mixture model

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering

Cite this

A probabilistic generative framework for extractive broadcast news speech summarization. / Chen, Yi Ting; Chen, Berlin; Wang, Hsin Min.

In: IEEE Transactions on Audio, Speech and Language Processing, Vol. 17, No. 1, 4717223, 01.01.2009, p. 95-106.


@article{61cae46d8da5437c96f102bf34c832b2,
title = "A probabilistic generative framework for extractive broadcast news speech summarization",
abstract = "In this paper, we consider extractive summarization of broadcast news speech and propose a unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking. Each sentence of a spoken document to be summarized is treated as a probabilistic generative model for predicting the document. Two matching strategies, namely literal term matching and concept matching, are thoroughly investigated. We explore the use of the language model (LM) and the relevance model (RM) for literal term matching, while the sentence topical mixture model (STMM) and the word topical mixture model (WTMM) are used for concept matching. In addition, the lexical and prosodic features, as well as the relevance information of spoken sentences, are properly incorporated for the estimation of the sentence prior probability. An elegant feature of our proposed framework is that both the sentence generative probability and the sentence prior probability can be estimated in an unsupervised manner, without the need for handcrafted document-summary pairs. The experiments were performed on Chinese broadcast news collected in Taiwan, and very encouraging results were obtained.",
keywords = "Extractive spoken document summarization, Language model (LM), Probabilistic generative framework, Relevance model (RM), Topical mixture model",
author = "Chen, {Yi Ting} and Berlin Chen and Wang, {Hsin Min}",
year = "2009",
month = "1",
day = "1",
doi = "10.1109/TASL.2008.2005031",
language = "English",
volume = "17",
pages = "95--106",
journal = "IEEE Transactions on Audio, Speech and Language Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",

}

TY - JOUR

T1 - A probabilistic generative framework for extractive broadcast news speech summarization

AU - Chen, Yi Ting

AU - Chen, Berlin

AU - Wang, Hsin Min

PY - 2009/1/1

Y1 - 2009/1/1

N2 - In this paper, we consider extractive summarization of broadcast news speech and propose a unified probabilistic generative framework that combines the sentence generative probability and the sentence prior probability for sentence ranking. Each sentence of a spoken document to be summarized is treated as a probabilistic generative model for predicting the document. Two matching strategies, namely literal term matching and concept matching, are thoroughly investigated. We explore the use of the language model (LM) and the relevance model (RM) for literal term matching, while the sentence topical mixture model (STMM) and the word topical mixture model (WTMM) are used for concept matching. In addition, the lexical and prosodic features, as well as the relevance information of spoken sentences, are properly incorporated for the estimation of the sentence prior probability. An elegant feature of our proposed framework is that both the sentence generative probability and the sentence prior probability can be estimated in an unsupervised manner, without the need for handcrafted document-summary pairs. The experiments were performed on Chinese broadcast news collected in Taiwan, and very encouraging results were obtained.

KW - Extractive spoken document summarization

KW - Language model (LM)

KW - Probabilistic generative framework

KW - Relevance model (RM)

KW - Topical mixture model

UR - http://www.scopus.com/inward/record.url?scp=67149133555&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67149133555&partnerID=8YFLogxK

U2 - 10.1109/TASL.2008.2005031

DO - 10.1109/TASL.2008.2005031

M3 - Article

AN - SCOPUS:67149133555

VL - 17

SP - 95

EP - 106

JO - IEEE Transactions on Audio, Speech and Language Processing

JF - IEEE Transactions on Audio, Speech and Language Processing

SN - 1558-7916

IS - 1

M1 - 4717223

ER -