Text summarization using a trainable summarizer and latent semantic analysis

Jen Yuan Yeh*, Hao Ren Ke, Wei Pang Yang, I. Heng Meng

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

210 Citations (Scopus)

Abstract

This paper proposes two approaches to text summarization: a modified corpus-based approach (MCBA) and an LSA-based T.R.M. approach (LSA+T.R.M.). The first is a trainable summarizer that generates summaries from several features: position, positive keyword, negative keyword, centrality, and resemblance to the title. It exploits two new ideas: (1) sentence positions are ranked to emphasize the significance of different sentence positions, and (2) the score function is trained by a genetic algorithm (GA) to obtain a suitable combination of feature weights. The second uses latent semantic analysis (LSA) to derive the semantic matrix of a document or a corpus and uses semantic sentence representations to construct a semantic text relationship map. We evaluate LSA+T.R.M. both on single documents and at the corpus level to investigate the competence of LSA in text summarization. The two novel approaches were evaluated at several compression rates on a corpus of 100 political articles. At a compression rate of 30%, the average f-measures were 49% for MCBA, 52% for MCBA+GA, and 44% and 40% for LSA+T.R.M. at the single-document and corpus levels, respectively.
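As a rough illustration of the trainable summarizer described in the abstract, the sketch below scores each sentence as a weighted sum of its feature values, keeps the top-scoring sentences at a given compression rate, and computes an f-measure against a reference selection. The linear form of the score, the feature value ranges, the function names, and the weights are assumptions made for illustration; the abstract does not give the exact score function, and the GA training step is not shown.

```python
from typing import Dict, List, Sequence

# Feature names follow the abstract; treating them as numeric values
# combined linearly is an assumption of this sketch.
FEATURES = [
    "position",
    "positive_keyword",
    "negative_keyword",
    "centrality",
    "title_resemblance",
]


def score_sentence(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the feature values for one sentence."""
    return sum(weights[name] * features[name] for name in FEATURES)


def summarize(
    sentence_features: Sequence[Dict[str, float]],
    weights: Dict[str, float],
    compression_rate: float,
) -> List[int]:
    """Return the indices of the selected sentences, in document order."""
    n_keep = max(1, round(len(sentence_features) * compression_rate))
    ranked = sorted(
        range(len(sentence_features)),
        key=lambda i: score_sentence(sentence_features[i], weights),
        reverse=True,
    )
    return sorted(ranked[:n_keep])


def f_measure(selected: Sequence[int], reference: Sequence[int]) -> float:
    """Harmonic mean of precision and recall over selected sentence indices."""
    selected_set, reference_set = set(selected), set(reference)
    overlap = len(selected_set & reference_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(selected_set)
    recall = overlap / len(reference_set)
    return 2 * precision * recall / (precision + recall)
```

In this sketch a negative weight on `negative_keyword` lowers the score of sentences containing such keywords. A search procedure such as a genetic algorithm could then tune the `weights` dictionary to maximize the average `f_measure` on training documents.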

Original language: English
Pages (from-to): 75-95
Number of pages: 21
Journal: Information Processing and Management
Volume: 41
Issue number: 1
Publication status: Published - Jan 2005
Externally published

ASJC Scopus subject areas

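  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences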
