TY - JOUR
T1 - Enhanced language modeling for extractive speech summarization with sentence relatedness information
AU - Liu, Shih Hung
AU - Chen, Kuan Yu
AU - Hsieh, Yu Lun
AU - Chen, Berlin
AU - Wang, Hsin Min
AU - Yen, Hsu Chun
AU - Hsu, Wen Lian
N1 - Publisher Copyright:
Copyright © 2014 ISCA.
PY - 2014
Y1 - 2014
N2 - Extractive summarization is intended to automatically select a set of representative sentences from a text or spoken document that can concisely express the most important topics of the document. Language modeling (LM) has been proven to be a promising framework for performing extractive summarization in an unsupervised manner. However, there remain two fundamental challenges facing existing LM-based methods. One is how to construct sentence models involved in the LM framework more accurately without resorting to external information sources. The other is how to additionally take into account the sentence-level structural relationships embedded in a document for important sentence selection. To address these two challenges, in this paper we explore a novel approach that generates overlapped clusters to extract sentence relatedness information from the document to be summarized, which can be used not only to enhance the estimation of various sentence models but also to allow for the sentence-level structural relationships for better summarization performance. Further, the utilities of our proposed methods and several state-of-the-art unsupervised methods are analyzed and compared extensively. A series of experiments conducted on a Mandarin broadcast news summarization task demonstrate the effectiveness and viability of our method.
AB - Extractive summarization is intended to automatically select a set of representative sentences from a text or spoken document that can concisely express the most important topics of the document. Language modeling (LM) has been proven to be a promising framework for performing extractive summarization in an unsupervised manner. However, there remain two fundamental challenges facing existing LM-based methods. One is how to construct sentence models involved in the LM framework more accurately without resorting to external information sources. The other is how to additionally take into account the sentence-level structural relationships embedded in a document for important sentence selection. To address these two challenges, in this paper we explore a novel approach that generates overlapped clusters to extract sentence relatedness information from the document to be summarized, which can be used not only to enhance the estimation of various sentence models but also to allow for the sentence-level structural relationships for better summarization performance. Further, the utilities of our proposed methods and several state-of-the-art unsupervised methods are analyzed and compared extensively. A series of experiments conducted on a Mandarin broadcast news summarization task demonstrate the effectiveness and viability of our method.
KW - Clustering
KW - Language modeling
KW - Relevance
KW - Sentence relatedness
KW - Speech summarization
UR - http://www.scopus.com/inward/record.url?scp=84910065946&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84910065946&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84910065946
SN - 2308-457X
SP - 1865
EP - 1869
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014
Y2 - 14 September 2014 through 18 September 2014
ER -