A novel paragraph embedding method for spoken document summarization

Kuan Yu Chen, Shih Hung Liu, Berlin Chen, Hsin Min Wang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Representation learning has become an active research topic in many machine learning applications because of its strong performance. In natural language processing, paragraph (or sentence and document) embedding learning is better suited to some tasks, such as information retrieval and document summarization. However, as far as we are aware, little research has focused on developing paragraph embedding methods. Extractive spoken document summarization, which can help us browse and digest multimedia data efficiently, aims at selecting a set of indicative sentences from a source document to express the most important theme of the document. A general consensus is that relevance and redundancy are both critical issues in a realistic summarization scenario. However, most existing methods focus only on determining the degree of relevance between a sentence and the document. Motivated by these observations, this paper makes three major contributions. First, we propose a novel unsupervised paragraph embedding method, named the essence vector model, which aims not only at distilling the most representative information from a paragraph but also at removing the general background information, so as to produce a more informative low-dimensional vector representation. Second, we incorporate the deduced essence vectors into a density peaks clustering summarization method, which takes both relevance and redundancy information into account simultaneously, to enhance spoken document summarization performance. Third, extensive spoken document summarization experiments confirm the effectiveness of the proposed methods over several widely used and state-of-the-art methods.
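
To make concrete how a density-peaks criterion can balance relevance and redundancy, here is a minimal Python sketch. This record does not give the paper's formulation, so the sketch follows the generic density peaks clustering idea rather than the authors' method. The cosine distance, the hard `cutoff`, the product score, the tie-breaking by sentence order, and the names `cosine_distance` and `select_summary` are all assumptions made for illustration. The toy 2-D vectors merely stand in for essence vectors.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def select_summary(vectors, cutoff, k):
    """Rank sentences by (local density) x (distance to a denser sentence).

    Hypothetical helper: a high density suggests a sentence is close to
    many others (a relevance proxy); a large distance to every denser
    sentence suggests it is not a near-duplicate (a redundancy proxy).
    """
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

    # Local density: number of other sentences closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]

    # Visit sentences from densest to sparsest; ties keep document order.
    order = sorted(range(n), key=lambda i: -rho[i])

    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest sentence has no denser neighbour: use its
            # largest distance to any sentence instead.
            delta[i] = max(dist[i])
        else:
            # Distance to the nearest sentence that ranks as denser.
            delta[i] = min(dist[i][j] for j in order[:rank])

    scores = [rho[i] * delta[i] for i in range(n)]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy data: sentences 0-2 form one tight group, sentences 3-4 another.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
print(select_summary(vectors, cutoff=0.1, k=2))
```

On this toy input the two selected sentences come from different groups: once sentence 0 is treated as the densest, its near-duplicates receive a very small distance term and drop down the ranking.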

Original language: English
Title of host publication: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9789881476821
Publication status: Published - 17 Jan 2017
Event: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 - Jeju, Republic of Korea
Duration: 13 Dec 2016 to 16 Dec 2016

Publication series

Name: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

Other

Other: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Country/Territory: Republic of Korea
City: Jeju
Period: 2016/12/13 to 2016/12/16

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Information Systems
  • Signal Processing
