Learning to distill: The essence vector modeling framework

Kuan Yu Chen, Shih Hung Liu, Berlin Chen, Hsin Min Wang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

6 Citations (Scopus)

Abstract

In the context of natural language processing, representation learning has emerged as a newly active research subject because of its excellent performance in many applications. Learning representations of words is a pioneering study in this school of research. However, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as sentiment classification and document summarization. Nevertheless, as far as we are aware, there is relatively less work focusing on the development of unsupervised paragraph embedding methods. Classic paragraph embedding methods infer the representation of a given paragraph by considering all of the words occurring in the paragraph. Consequently, those stop or function words that occur frequently may mislead the embedding learning process to produce a misty paragraph representation. Motivated by these observations, our major contributions in this paper are twofold. First, we propose a novel unsupervised paragraph embedding method, named the essence vector (EV) model, which aims at not only distilling the most representative information from a paragraph but also excluding the general background information to produce a more informative low-dimensional vector representation for the paragraph. We evaluate the proposed EV model on benchmark sentiment classification and multi-document summarization tasks. The experimental results demonstrate the effectiveness and applicability of the proposed embedding method. Second, in view of the increasing importance of spoken content processing, an extension of the EV model, named the denoising essence vector (D-EV) model, is proposed. The D-EV model not only inherits the advantages of the EV model but also can infer a more robust representation for a given spoken paragraph against imperfect speech recognition. The utility of the D-EV model is evaluated on a spoken document summarization task, confirming the practical merits of the proposed embedding method in relation to several well-practiced and state-of-the-art summarization methods.

Original language: English
Title of host publication: COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016
Subtitle of host publication: Technical Papers
Publisher: Association for Computational Linguistics, ACL Anthology
Pages: 358-368
Number of pages: 11
ISBN (Print): 9784879747020
Publication status: Published - 2016
Event: 26th International Conference on Computational Linguistics, COLING 2016 - Osaka, Japan
Duration: 11 Dec 2016 to 16 Dec 2016

Publication series

Name: COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers

Conference

Conference: 26th International Conference on Computational Linguistics, COLING 2016
Country/Territory: Japan
City: Osaka
Period: 2016/12/11 to 2016/12/16

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Language and Linguistics
  • Linguistics and Language
