TY - JOUR
T1 - A recurrent neural network language modeling framework for extractive speech summarization
AU - Chen, Kuan Yu
AU - Liu, Shih Hung
AU - Chen, Berlin
AU - Wang, Hsin Min
AU - Hsu, Wen Lian
AU - Chen, Hsin Hsi
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2014/9/3
Y1 - 2014/9/3
AB - Extractive speech summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document so as to concisely express the most important theme of the document, has been an active area of research and development. A recent school of thought is to employ the language modeling (LM) approach for important sentence selection, which has proven to be effective for performing speech summarization in an unsupervised fashion. However, one of the major challenges facing the LM approach is how to formulate the sentence models and accurately estimate their parameters for each spoken document to be summarized. This paper presents a continuation of this general line of research, and its contribution is two-fold. First, we propose a novel and effective recurrent neural network language modeling (RNNLM) framework for speech summarization, on top of which the deduced sentence models are able to render not only word usage cues but also long-span structural information of word co-occurrence relationships within spoken documents, getting around the need for the strict bag-of-words assumption. Second, the utilities of the method originating from our proposed framework and of several widely-used unsupervised methods are analyzed and compared extensively. A series of experiments conducted on a broadcast news summarization task seems to demonstrate the performance merits of our summarization method when compared to several existing state-of-the-art unsupervised methods.
KW - language modeling
KW - long-span structural information
KW - recurrent neural network
KW - speech summarization
UR - http://www.scopus.com/inward/record.url?scp=84930933016&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84930933016&partnerID=8YFLogxK
U2 - 10.1109/ICME.2014.6890220
DO - 10.1109/ICME.2014.6890220
M3 - Conference article
AN - SCOPUS:84930933016
SN - 1945-7871
VL - 2014-September
JO - Proceedings - IEEE International Conference on Multimedia and Expo
JF - Proceedings - IEEE International Conference on Multimedia and Expo
IS - September
M1 - 6890220
T2 - 2014 IEEE International Conference on Multimedia and Expo, ICME 2014
Y2 - 14 July 2014 through 18 July 2014
ER -