Enhanced language modeling for extractive speech summarization with sentence relatedness information

Shih Hung Liu, Kuan Yu Chen, Yu Lun Hsieh, Berlin Chen, Hsin Min Wang, Hsu Chun Yen, Wen Lian Hsu

Research output: Contribution to journal (conference article)

Abstract

Extractive summarization is intended to automatically select a set of representative sentences from a text or spoken document that can concisely express the most important topics of the document. Language modeling (LM) has been proven to be a promising framework for performing extractive summarization in an unsupervised manner. However, there remain two fundamental challenges facing existing LM-based methods. One is how to construct sentence models involved in the LM framework more accurately without resorting to external information sources. The other is how to additionally take into account the sentence-level structural relationships embedded in a document for important sentence selection. To address these two challenges, in this paper we explore a novel approach that generates overlapped clusters to extract sentence relatedness information from the document to be summarized, which can be used not only to enhance the estimation of various sentence models but also to account for the sentence-level structural relationships for better summarization performance. Further, the utilities of our proposed methods and several state-of-the-art unsupervised methods are analyzed and compared extensively. A series of experiments conducted on a Mandarin broadcast news summarization task demonstrates the effectiveness and viability of our method.
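
As a rough illustration of the LM framework the abstract refers to, the sketch below ranks sentences by the likelihood of the whole document under each sentence's unigram model. It is a minimal baseline under stated assumptions, not the cluster-based method the paper proposes: the function name `rank_sentences`, the interpolation weight `lam`, and the choice of the document itself as the smoothing background are illustrative and do not come from this record.

```python
import math
from collections import Counter


def rank_sentences(sentences, lam=0.5):
    """Rank sentences by log P(D | S) under a smoothed unigram sentence model.

    `sentences` is a list of token lists. `lam` (a hypothetical default, must be
    strictly below 1) weights the sentence model against a background model
    estimated from the document itself, so no external corpus is needed.
    Returns sentence indices, best first.
    """
    doc_tokens = [tok for sent in sentences for tok in sent]
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)

    scores = []
    for idx, sent in enumerate(sentences):
        sent_counts = Counter(sent)
        sent_len = len(sent)
        log_like = 0.0
        for word, count in doc_counts.items():
            # Maximum-likelihood word probability in this sentence.
            p_sent = sent_counts[word] / sent_len if sent_len else 0.0
            # Background probability from the whole document (assumed smoothing).
            p_bg = count / doc_len
            log_like += count * math.log(lam * p_sent + (1.0 - lam) * p_bg)
        scores.append((log_like, idx))

    # Highest document likelihood first; ties broken by original position.
    return [idx for _, idx in sorted(scores, key=lambda pair: (-pair[0], pair[1]))]


if __name__ == "__main__":
    toy_document = [["a", "b", "c"], ["a", "a"], ["d"]]
    print(rank_sentences(toy_document))
```

A summary would then be formed by taking sentences from the top of the ranking until a length budget is reached. The sentence relatedness information described in the abstract would enter at the sentence-model estimation step, which this sketch leaves as a plain maximum-likelihood estimate.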

Original language: English
Pages (from-to): 1865-1869
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2014 Jan 1
Event: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014 - Singapore, Singapore
Duration: 2014 Sep 14 to 2014 Sep 18

Keywords

  • Clustering
  • Language modeling
  • Relevance
  • Sentence relatedness
  • Speech summarization

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

Enhanced language modeling for extractive speech summarization with sentence relatedness information. / Liu, Shih Hung; Chen, Kuan Yu; Hsieh, Yu Lun; Chen, Berlin; Wang, Hsin Min; Yen, Hsu Chun; Hsu, Wen Lian.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 01.01.2014, p. 1865-1869.

@article{4344d62fe4de4b748d2c9e5709839ea9,
title = "Enhanced language modeling for extractive speech summarization with sentence relatedness information",
abstract = "Extractive summarization is intended to automatically select a set of representative sentences from a text or spoken document that can concisely express the most important topics of the document. Language modeling (LM) has been proven to be a promising framework for performing extractive summarization in an unsupervised manner. However, there remain two fundamental challenges facing existing LM-based methods. One is how to construct sentence models involved in the LM framework more accurately without resorting to external information sources. The other is how to additionally take into account the sentence-level structural relationships embedded in a document for important sentence selection. To address these two challenges, in this paper we explore a novel approach that generates overlapped clusters to extract sentence relatedness information from the document to be summarized, which can be used not only to enhance the estimation of various sentence models but also to allow for the sentence-level structural relationships for better summarization performance. Further, the utilities of our proposed methods and several state-of-the-art unsupervised methods are analyzed and compared extensively. A series of experiments conducted on a Mandarin broadcast news summarization task demonstrate the effectiveness and viability of our method.",
keywords = "Clustering, Language modeling, Relevance, Sentence relatedness, Speech summarization",
author = "Liu, {Shih Hung} and Chen, {Kuan Yu} and Hsieh, {Yu Lun} and Chen, Berlin and Wang, {Hsin Min} and Yen, {Hsu Chun} and Hsu, {Wen Lian}",
year = "2014",
month = "1",
day = "1",
language = "English",
pages = "1865--1869",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",
}

TY  - JOUR
T1  - Enhanced language modeling for extractive speech summarization with sentence relatedness information
AU  - Liu, Shih Hung
AU  - Chen, Kuan Yu
AU  - Hsieh, Yu Lun
AU  - Chen, Berlin
AU  - Wang, Hsin Min
AU  - Yen, Hsu Chun
AU  - Hsu, Wen Lian
PY  - 2014/1/1
Y1  - 2014/1/1
N2  - Extractive summarization is intended to automatically select a set of representative sentences from a text or spoken document that can concisely express the most important topics of the document. Language modeling (LM) has been proven to be a promising framework for performing extractive summarization in an unsupervised manner. However, there remain two fundamental challenges facing existing LM-based methods. One is how to construct sentence models involved in the LM framework more accurately without resorting to external information sources. The other is how to additionally take into account the sentence-level structural relationships embedded in a document for important sentence selection. To address these two challenges, in this paper we explore a novel approach that generates overlapped clusters to extract sentence relatedness information from the document to be summarized, which can be used not only to enhance the estimation of various sentence models but also to allow for the sentence-level structural relationships for better summarization performance. Further, the utilities of our proposed methods and several state-of-the-art unsupervised methods are analyzed and compared extensively. A series of experiments conducted on a Mandarin broadcast news summarization task demonstrate the effectiveness and viability of our method.
KW  - Clustering
KW  - Language modeling
KW  - Relevance
KW  - Sentence relatedness
KW  - Speech summarization
UR  - http://www.scopus.com/inward/record.url?scp=84910065946&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84910065946&partnerID=8YFLogxK
M3  - Conference article
SP  - 1865
EP  - 1869
JO  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN  - 2308-457X
ER  - 