TY - JOUR
T1 - Leveraging word embeddings for spoken document summarization
AU - Chen, Kuan-Yu
AU - Liu, Shih-Hung
AU - Wang, Hsin-Min
AU - Chen, Berlin
AU - Chen, Hsin-Hsi
N1 - Publisher Copyright:
Copyright © 2015 ISCA.
PY - 2015
Y1 - 2015
N2 - Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, which aims to automatically select a set of representative sentences from a spoken document to concisely convey its most important theme, has been an active area of research and experimentation. Meanwhile, word embedding has emerged as a popular research subject because of its excellent performance on many natural language processing (NLP) tasks. However, to the best of our knowledge, relatively few studies have investigated its use in extractive text or speech summarization. A common way of leveraging word embeddings in the summarization process is to represent a document (or sentence) by averaging the word embeddings of the words it contains; the cosine similarity measure can then be employed to determine the degree of relevance between a pair of such representations. Beyond the continued efforts to improve word representations, this paper focuses on building novel and efficient ranking models based on general word embedding methods for extractive speech summarization. Experimental results demonstrate the effectiveness of the proposed methods compared to existing state-of-the-art methods.
KW - Ranking model
KW - Spoken document
KW - Summarization
KW - Word embedding
UR - http://www.scopus.com/inward/record.url?scp=84959161508&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959161508&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84959161508
SN - 2308-457X
VL - 2015-January
SP - 1383
EP - 1387
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
Y2 - 6 September 2015 through 10 September 2015
ER -