A novel paragraph embedding method for spoken document summarization

Kuan Yu Chen, Shih Hung Liu, Berlin Chen, Hsin Min Wang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Representation learning has emerged as a newly active research subject in many machine learning applications because of its excellent performance. In the context of natural language processing, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as information retrieval and document summarization. However, as far as we are aware, there is only a dearth of research focusing on launching paragraph embedding methods. Extractive spoken document summarization, which can help us browse and digest multimedia data efficiently, aims at selecting a set of indicative sentences from a source document to express the most important theme of the document. A general consensus is that relevance and redundancy are both critical issues in a realistic summarization scenario. However, most of the existing methods focus on determining only the relevance degree between a pair of sentence and document. Motivated by these observations, three major contributions are proposed in this paper. First, we propose a novel unsupervised paragraph embedding method, named the essence vector model, which aims at not only distilling the most representative information from a paragraph but also getting rid of the general background information to produce a more informative low-dimensional vector representation. Second, we incorporate the deduced essence vectors with a density peaks clustering summarization method, which can take both relevance and redundancy information into account simultaneously, to enhance the spoken document summarization performance. Third, the effectiveness of our proposed methods over several well-practiced and state-of-the-art methods is confirmed by extensive spoken document summarization experiments.

Original languageEnglish
Title of host publication2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9789881476821
DOIs
Publication statusPublished - 2017 Jan 17
Event2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 - Jeju, Korea, Republic of
Duration: 2016 Dec 132016 Dec 16

Publication series

Name2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

Other

Other2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
CountryKorea, Republic of
CityJeju
Period16/12/1316/12/16

Fingerprint

Redundancy
Launching
Information retrieval
Learning systems
Processing
Experiments

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Information Systems
  • Signal Processing

Cite this

Chen, K. Y., Liu, S. H., Chen, B., & Wang, H. M. (2017). A novel paragraph embedding method for spoken document summarization. In 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 [7820882] (2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/APSIPA.2016.7820882

A novel paragraph embedding method for spoken document summarization. / Chen, Kuan Yu; Liu, Shih Hung; Chen, Berlin; Wang, Hsin Min.

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016. Institute of Electrical and Electronics Engineers Inc., 2017. 7820882 (2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Chen, KY, Liu, SH, Chen, B & Wang, HM 2017, A novel paragraph embedding method for spoken document summarization. in 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016., 7820882, 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016, Institute of Electrical and Electronics Engineers Inc., 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016, Jeju, Korea, Republic of, 16/12/13. https://doi.org/10.1109/APSIPA.2016.7820882
Chen KY, Liu SH, Chen B, Wang HM. A novel paragraph embedding method for spoken document summarization. In 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016. Institute of Electrical and Electronics Engineers Inc. 2017. 7820882. (2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016). https://doi.org/10.1109/APSIPA.2016.7820882
Chen, Kuan Yu ; Liu, Shih Hung ; Chen, Berlin ; Wang, Hsin Min. / A novel paragraph embedding method for spoken document summarization. 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016. Institute of Electrical and Electronics Engineers Inc., 2017. (2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016).
@inproceedings{2ba608be621f4d7a8f1c331725d2b8e4,
title = "A novel paragraph embedding method for spoken document summarization",
abstract = "Representation learning has emerged as a newly active research subject in many machine learning applications because of its excellent performance. In the context of natural language processing, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as information retrieval and document summarization. However, as far as we are aware, there is only a dearth of research focusing on launching paragraph embedding methods. Extractive spoken document summarization, which can help us browse and digest multimedia data efficiently, aims at selecting a set of indicative sentences from a source document to express the most important theme of the document. A general consensus is that relevance and redundancy are both critical issues in a realistic summarization scenario. However, most of the existing methods focus on determining only the relevance degree between a pair of sentence and document. Motivated by these observations, three major contributions are proposed in this paper. First, we propose a novel unsupervised paragraph embedding method, named the essence vector model, which aims at not only distilling the most representative information from a paragraph but also getting rid of the general background information to produce a more informative low-dimensional vector representation. Second, we incorporate the deduced essence vectors with a density peaks clustering summarization method, which can take both relevance and redundancy information into account simultaneously, to enhance the spoken document summarization performance. Third, the effectiveness of our proposed methods over several well-practiced and state-of-the-art methods is confirmed by extensive spoken document summarization experiments.",
author = "Chen, {Kuan Yu} and Liu, {Shih Hung} and Berlin Chen and Wang, {Hsin Min}",
year = "2017",
month = "1",
day = "17",
doi = "10.1109/APSIPA.2016.7820882",
language = "English",
series = "2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
booktitle = "2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016",

}

TY - GEN

T1 - A novel paragraph embedding method for spoken document summarization

AU - Chen, Kuan Yu

AU - Liu, Shih Hung

AU - Chen, Berlin

AU - Wang, Hsin Min

PY - 2017/1/17

Y1 - 2017/1/17

N2 - Representation learning has emerged as a newly active research subject in many machine learning applications because of its excellent performance. In the context of natural language processing, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as information retrieval and document summarization. However, as far as we are aware, there is only a dearth of research focusing on launching paragraph embedding methods. Extractive spoken document summarization, which can help us browse and digest multimedia data efficiently, aims at selecting a set of indicative sentences from a source document to express the most important theme of the document. A general consensus is that relevance and redundancy are both critical issues in a realistic summarization scenario. However, most of the existing methods focus on determining only the relevance degree between a pair of sentence and document. Motivated by these observations, three major contributions are proposed in this paper. First, we propose a novel unsupervised paragraph embedding method, named the essence vector model, which aims at not only distilling the most representative information from a paragraph but also getting rid of the general background information to produce a more informative low-dimensional vector representation. Second, we incorporate the deduced essence vectors with a density peaks clustering summarization method, which can take both relevance and redundancy information into account simultaneously, to enhance the spoken document summarization performance. Third, the effectiveness of our proposed methods over several well-practiced and state-of-the-art methods is confirmed by extensive spoken document summarization experiments.

AB - Representation learning has emerged as a newly active research subject in many machine learning applications because of its excellent performance. In the context of natural language processing, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as information retrieval and document summarization. However, as far as we are aware, there is only a dearth of research focusing on launching paragraph embedding methods. Extractive spoken document summarization, which can help us browse and digest multimedia data efficiently, aims at selecting a set of indicative sentences from a source document to express the most important theme of the document. A general consensus is that relevance and redundancy are both critical issues in a realistic summarization scenario. However, most of the existing methods focus on determining only the relevance degree between a pair of sentence and document. Motivated by these observations, three major contributions are proposed in this paper. First, we propose a novel unsupervised paragraph embedding method, named the essence vector model, which aims at not only distilling the most representative information from a paragraph but also getting rid of the general background information to produce a more informative low-dimensional vector representation. Second, we incorporate the deduced essence vectors with a density peaks clustering summarization method, which can take both relevance and redundancy information into account simultaneously, to enhance the spoken document summarization performance. Third, the effectiveness of our proposed methods over several well-practiced and state-of-the-art methods is confirmed by extensive spoken document summarization experiments.

UR - http://www.scopus.com/inward/record.url?scp=85013862334&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85013862334&partnerID=8YFLogxK

U2 - 10.1109/APSIPA.2016.7820882

DO - 10.1109/APSIPA.2016.7820882

M3 - Conference contribution

T3 - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

BT - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -