Incorporating paragraph embeddings and density peaks clustering for spoken document summarization

Kuan Yu Chen, Kai Wun Shih, Shih Hung Liu, Berlin Chen, Hsin Min Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Representation learning has emerged as an active research subject in many machine learning applications because of its excellent performance. As an instantiation, word embedding has been widely used in the natural language processing area. However, as far as we are aware, relatively few studies have investigated paragraph embedding methods in extractive text or speech summarization. Extractive summarization aims at selecting a set of indicative sentences from a source document to express the most important theme of the document. There is a general consensus that relevance and redundancy are both critical issues for users in a realistic summarization scenario. However, most existing methods focus on determining only the relevance degree between sentences and a given document, while the redundancy degree is handled by a post-processing step. Based on these observations, this paper makes three contributions. First, we comprehensively compare word and paragraph embedding methods for spoken document summarization. Second, we propose a novel summarization framework that takes both relevance and redundancy information into account simultaneously, so that a set of representative sentences can be selected automatically in a single pass. Third, we further plug paragraph embedding methods into the proposed framework to enhance summarization performance. Experimental results demonstrate the effectiveness of the proposed methods compared to existing state-of-the-art methods.
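The title pairs paragraph embeddings with density peaks clustering for this one-pass selection. As a rough illustration of the idea (not the authors' implementation), the sketch below scores each sentence by its local density among sentence embeddings (a relevance proxy) and by its distance to denser sentences (a redundancy proxy), then keeps the highest-scoring sentences in a single pass. The embedding source, cosine distance, Gaussian kernel, and cutoff percentile are all assumptions made for this sketch.

# Illustrative sketch only, not the paper's exact formulation.
# Assumption: `sentence_vecs` are fixed-length embeddings (e.g., averaged
# word vectors or paragraph vectors) for the sentences of one document.
import numpy as np

def density_peaks_summarize(sentence_vecs, summary_len=3, dc_percentile=20):
    """Score sentences by local density (relevance proxy) and by distance
    to denser sentences (redundancy proxy), then pick the top products."""
    X = np.asarray(sentence_vecs, dtype=float)
    n = len(X)
    # Pairwise cosine distances between sentence embeddings.
    unit = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    dist = 1.0 - unit @ unit.T
    # Cutoff distance d_c chosen as a percentile of all pairwise distances.
    d_c = np.percentile(dist[np.triu_indices(n, k=1)], dc_percentile)
    # Local density rho_i: Gaussian-kernel count of nearby sentences
    # (subtract 1 to exclude the sentence itself).
    rho = np.exp(-(dist / (d_c + 1e-12)) ** 2).sum(axis=1) - 1.0
    # delta_i: distance to the nearest sentence with higher density;
    # for the densest sentence, the maximum distance to any other.
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = dist[i, higher].min() if len(higher) else dist[i].max()
    # Sentences that are both dense (relevant) and far from denser ones
    # (non-redundant) get high scores; select them in one pass.
    scores = rho * delta
    return np.argsort(-scores)[:summary_len]

# Toy usage with random vectors standing in for real sentence embeddings.
rng = np.random.default_rng(0)
fake_vecs = rng.normal(size=(10, 50))
print(density_peaks_summarize(fake_vecs, summary_len=3))

Selecting by the product of the two scores mirrors the density-peaks intuition that good summary sentences should be both centrally located among the document's sentences (relevant) and far from any denser sentence (non-redundant), so no separate redundancy-removal post-processing step is needed.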

Original language: English
Title of host publication: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 207-214
Number of pages: 8
ISBN (Electronic): 9781479972913
DOIs: 10.1109/ASRU.2015.7404796
Publication status: Published - 2016 Feb 10
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Scottsdale, United States
Duration: 2015 Dec 13 - 2015 Dec 17

Publication series

Name: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings

Other

Other: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015
Country: United States
City: Scottsdale
Period: 15/12/13 - 15/12/17

Keywords

  • Spoken document
  • embedding
  • redundancy
  • relevance
  • summarization

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition

Cite this

Chen, K. Y., Shih, K. W., Liu, S. H., Chen, B., & Wang, H. M. (2016). Incorporating paragraph embeddings and density peaks clustering for spoken document summarization. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings (pp. 207-214). [7404796] (2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2015.7404796
