TY - JOUR
T1 - Extractive broadcast news summarization leveraging recurrent neural network language modeling techniques
AU - Chen, Kuan-Yu
AU - Liu, Shih-Hung
AU - Chen, Berlin
AU - Wang, Hsin-Min
AU - Jan, Ea-Ee
AU - Hsu, Wen-Lian
AU - Chen, Hsin-Hsi
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/8/1
Y1 - 2015/8/1
AB - Extractive text or speech summarization selects a set of salient sentences from an original document and concatenates them to form a summary, enabling users to browse and digest the content of the document more efficiently. A recent line of research on extractive summarization employs the language modeling (LM) approach for important sentence selection, which has proven effective for performing speech summarization in an unsupervised fashion. However, a major challenge facing the LM approach is how to formulate the sentence models and accurately estimate their parameters for each sentence in the document to be summarized. In view of this, this paper explores a novel use of the recurrent neural network language modeling (RNNLM) framework for extractive broadcast news summarization. Within this framework, the derived sentence models capture not only word usage cues but also long-span structural information about word co-occurrence relationships in broadcast news documents, sidestepping the strict bag-of-words assumption. Furthermore, different model complexities and combinations are extensively analyzed and compared. Experimental results demonstrate the performance merits of our summarization methods compared with several well-studied, state-of-the-art unsupervised methods.
KW - Language modeling
KW - long-span structural information
KW - recurrent neural network
KW - speech summarization
UR - http://www.scopus.com/inward/record.url?scp=84930943980&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84930943980&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2015.2432578
DO - 10.1109/TASLP.2015.2432578
M3 - Article
AN - SCOPUS:84930943980
SN - 2329-9290
VL - 23
SP - 1322
EP - 1334
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
IS - 8
M1 - 7111264
ER -