TY - GEN
T1 - Incorporating paragraph embeddings and density peaks clustering for spoken document summarization
AU - Chen, Kuan-Yu
AU - Shih, Kai-Wun
AU - Liu, Shih-Hung
AU - Chen, Berlin
AU - Wang, Hsin-Min
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2016/2/10
Y1 - 2016/2/10
N2 - Representation learning has recently emerged as an active research subject in many machine learning applications because of its excellent performance. As an instantiation, word embeddings have been widely used in natural language processing. However, as far as we are aware, there are relatively few studies investigating paragraph embedding methods in extractive text or speech summarization. Extractive summarization aims at selecting a set of indicative sentences from a source document to express the most important theme of the document. There is a general consensus that relevance and redundancy are both critical issues for users in a realistic summarization scenario. However, most existing methods focus on determining only the degree of relevance between each sentence and a given document, while the degree of redundancy is handled in a post-processing step. Based on these observations, this paper makes three contributions. First, we comprehensively compare word and paragraph embedding methods for spoken document summarization. Second, we propose a novel summarization framework that takes both relevance and redundancy information into account simultaneously, so that a set of representative sentences can be selected automatically in a one-pass process. Third, we further plug paragraph embedding methods into the proposed framework to enhance summarization performance. Experimental results demonstrate the effectiveness of the proposed methods compared to existing state-of-the-art methods.
AB - Representation learning has recently emerged as an active research subject in many machine learning applications because of its excellent performance. As an instantiation, word embeddings have been widely used in natural language processing. However, as far as we are aware, there are relatively few studies investigating paragraph embedding methods in extractive text or speech summarization. Extractive summarization aims at selecting a set of indicative sentences from a source document to express the most important theme of the document. There is a general consensus that relevance and redundancy are both critical issues for users in a realistic summarization scenario. However, most existing methods focus on determining only the degree of relevance between each sentence and a given document, while the degree of redundancy is handled in a post-processing step. Based on these observations, this paper makes three contributions. First, we comprehensively compare word and paragraph embedding methods for spoken document summarization. Second, we propose a novel summarization framework that takes both relevance and redundancy information into account simultaneously, so that a set of representative sentences can be selected automatically in a one-pass process. Third, we further plug paragraph embedding methods into the proposed framework to enhance summarization performance. Experimental results demonstrate the effectiveness of the proposed methods compared to existing state-of-the-art methods.
KW - spoken document
KW - embedding
KW - redundancy
KW - relevance
KW - summarization
UR - http://www.scopus.com/inward/record.url?scp=84964483217&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84964483217&partnerID=8YFLogxK
U2 - 10.1109/ASRU.2015.7404796
DO - 10.1109/ASRU.2015.7404796
M3 - Conference contribution
AN - SCOPUS:84964483217
T3 - 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
SP - 207
EP - 214
BT - 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015
Y2 - 13 December 2015 through 17 December 2015
ER -