Exploring word mover's distance and semantic-aware embedding techniques for extractive broadcast news summarization

Shih Hung Liu, Kuan Yu Chen, Yu Lun Hsieh, Berlin Chen, Hsin Min Wang, Hsu Chun Yen, Wen Lian Hsu

Research output: Contribution to journal › Conference article

2 Citations (Scopus)

Abstract

Extractive summarization selects the most salient sentences from a document (or a set of documents) and assembles them into an informative summary, helping users browse and assimilate the main theme of the document efficiently. This paper continues this general line of research, and its main contributions are twofold. First, we explore leveraging the recently proposed word mover's distance (WMD) metric, in conjunction with semantic-aware continuous space representations of words, to faithfully capture finer-grained sentence-to-document and/or sentence-to-sentence semantic relatedness for effective use in the summarization process. Second, we investigate combining our proposed approach with several state-of-the-art summarization methods, which originally adopted the conventional term-overlap or bag-of-words (BOW) approaches for similarity calculation. A series of experiments conducted on a typical broadcast news summarization task seems to suggest the performance merits of our proposed approach in comparison to the mainstream methods.
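
The idea of scoring sentence-to-document relatedness with a word-moving distance can be illustrated with a small sketch. This is not the authors' implementation: it uses the relaxed WMD lower bound (each word ships all of its mass to its nearest counterpart) rather than the exact optimal-transport WMD, and the 2-D word vectors, the vocabulary, and the function names (nbow, one_sided, relaxed_wmd, rank_sentences) are all assumptions made up for illustration.

```python
import math
from collections import Counter

# Hypothetical 2-D word vectors, invented for illustration only.
# A real system would load trained embeddings instead.
EMB = {
    "storm": (1.0, 0.0),
    "typhoon": (0.9, 0.1),
    "rain": (0.8, 0.3),
    "market": (0.0, 1.0),
    "stocks": (0.1, 0.9),
}


def nbow(tokens):
    """Normalized bag-of-words weights over in-vocabulary tokens."""
    counts = Counter(t for t in tokens if t in EMB)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def one_sided(src, dst):
    """Cost when each source word moves all its mass to its nearest target word."""
    return sum(
        weight * min(math.dist(EMB[w], EMB[v]) for v in dst)
        for w, weight in src.items()
    )


def relaxed_wmd(tokens_a, tokens_b):
    """Relaxed WMD: the larger of the two one-sided costs (a lower bound on WMD)."""
    a, b = nbow(tokens_a), nbow(tokens_b)
    if not a or not b:
        # No in-vocabulary words on one side: treat as maximally distant.
        return math.inf
    return max(one_sided(a, b), one_sided(b, a))


def rank_sentences(sentences, document):
    """Order sentences from closest to farthest from the whole document."""
    return sorted(sentences, key=lambda s: relaxed_wmd(s, document))


document = ["typhoon", "rain", "storm"]
sentences = [["market", "stocks"], ["storm", "rain"]]
for sentence in rank_sentences(sentences, document):
    print(sentence, round(relaxed_wmd(sentence, document), 3))
```

In an extractive summarizer along these lines, the top-ranked sentences would be candidates for the summary. Swapping the relaxed bound for an exact transport solver changes only relaxed_wmd; the ranking step stays the same.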

Original language: English
Pages (from-to): 670-674
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 08-12-September-2016
DOIs: 10.21437/Interspeech.2016-710
Publication status: Published - 2016 Jan 1
Event: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: 2016 Sep 8 → 2016 Sep 16

Keywords

  • Extractive summarization
  • Markov random walk
  • Word mover's distance
  • Word representation

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

Exploring word mover's distance and semantic-aware embedding techniques for extractive broadcast news summarization. / Liu, Shih Hung; Chen, Kuan Yu; Hsieh, Yu Lun; Chen, Berlin; Wang, Hsin Min; Yen, Hsu Chun; Hsu, Wen Lian.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 08-12-September-2016, 01.01.2016, p. 670-674.

@article{77f667c8e4544e83ad3c1bb0a0cb672d,
title = "Exploring word mover's distance and semantic-aware embedding techniques for extractive broadcast news summarization",
abstract = "Extractive summarization selects the most salient sentences from a document (or a set of documents) and assembles them into an informative summary, helping users browse and assimilate the main theme of the document efficiently. This paper continues this general line of research, and its main contributions are twofold. First, we explore leveraging the recently proposed word mover's distance (WMD) metric, in conjunction with semantic-aware continuous space representations of words, to faithfully capture finer-grained sentence-to-document and/or sentence-to-sentence semantic relatedness for effective use in the summarization process. Second, we investigate combining our proposed approach with several state-of-the-art summarization methods, which originally adopted the conventional term-overlap or bag-of-words (BOW) approaches for similarity calculation. A series of experiments conducted on a typical broadcast news summarization task seems to suggest the performance merits of our proposed approach in comparison to the mainstream methods.",
keywords = "Extractive summarization, Markov random walk, Word mover's distance, Word representation",
author = "Liu, {Shih Hung} and Chen, {Kuan Yu} and Hsieh, {Yu Lun} and Chen, Berlin and Wang, {Hsin Min} and Yen, {Hsu Chun} and Hsu, {Wen Lian}",
year = "2016",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2016-710",
language = "English",
volume = "08-12-September-2016",
pages = "670--674",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",
}

TY - JOUR

T1 - Exploring word mover's distance and semantic-aware embedding techniques for extractive broadcast news summarization

AU - Liu, Shih Hung

AU - Chen, Kuan Yu

AU - Hsieh, Yu Lun

AU - Chen, Berlin

AU - Wang, Hsin Min

AU - Yen, Hsu Chun

AU - Hsu, Wen Lian

PY - 2016/1/1

Y1 - 2016/1/1

N2 - Extractive summarization selects the most salient sentences from a document (or a set of documents) and assembles them into an informative summary, helping users browse and assimilate the main theme of the document efficiently. This paper continues this general line of research, and its main contributions are twofold. First, we explore leveraging the recently proposed word mover's distance (WMD) metric, in conjunction with semantic-aware continuous space representations of words, to faithfully capture finer-grained sentence-to-document and/or sentence-to-sentence semantic relatedness for effective use in the summarization process. Second, we investigate combining our proposed approach with several state-of-the-art summarization methods, which originally adopted the conventional term-overlap or bag-of-words (BOW) approaches for similarity calculation. A series of experiments conducted on a typical broadcast news summarization task seems to suggest the performance merits of our proposed approach in comparison to the mainstream methods.

AB - Extractive summarization selects the most salient sentences from a document (or a set of documents) and assembles them into an informative summary, helping users browse and assimilate the main theme of the document efficiently. This paper continues this general line of research, and its main contributions are twofold. First, we explore leveraging the recently proposed word mover's distance (WMD) metric, in conjunction with semantic-aware continuous space representations of words, to faithfully capture finer-grained sentence-to-document and/or sentence-to-sentence semantic relatedness for effective use in the summarization process. Second, we investigate combining our proposed approach with several state-of-the-art summarization methods, which originally adopted the conventional term-overlap or bag-of-words (BOW) approaches for similarity calculation. A series of experiments conducted on a typical broadcast news summarization task seems to suggest the performance merits of our proposed approach in comparison to the mainstream methods.

KW - Extractive summarization

KW - Markov random walk

KW - Word mover's distance

KW - Word representation

UR - http://www.scopus.com/inward/record.url?scp=84994242295&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84994242295&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2016-710

DO - 10.21437/Interspeech.2016-710

M3 - Conference article

VL - 08-12-September-2016

SP - 670

EP - 674

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -