Chinese text summarization using a trainable summarizer and latent semantic analysis

Jen Yuan Yeh, Hao Ren Ke, Wei Pang Yang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

24 Citations (Scopus)

Abstract

This paper proposes two novel approaches for extracting important sentences from a document to create its summary. The first is a corpus-based approach using feature analysis. It introduces three new ideas: 1) employing ranked position to emphasize the significance of sentence position, 2) reshaping the word unit to estimate keyword importance more accurately, and 3) training a score function with a genetic algorithm to obtain a suitable combination of feature weights. The second approach combines latent semantic analysis with text relationship maps to interpret the conceptual structure of a document. Both approaches are applied to Chinese text summarization. They were evaluated on a corpus of 100 articles about politics from New Taiwan Weekly; at a compression ratio of 30%, the first and second approaches achieved average recalls of 52.0% and 45.6%, respectively.
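As a rough illustration of the first approach and of the recall measure quoted above, the following minimal Python sketch scores each sentence with a weighted sum of two features, keeps the top-scoring sentences up to a compression ratio, and compares the selection with a reference extract. It is not the authors' implementation: the two features (`position_score`, `keyword_score`), the function names, and the hand-picked weights are assumptions made for illustration. In the paper the weights are trained with a genetic algorithm, which is not reproduced here, and the sentences are assumed to be already segmented into word tokens.

```python
def position_score(index, total):
    # Hypothetical position feature: earlier sentences score higher.
    return (total - index) / total


def keyword_score(tokens, keywords):
    # Hypothetical keyword feature: fraction of tokens that are keywords.
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in keywords) / len(tokens)


def summarize(sentences, keywords, weights, ratio=0.3):
    """Return the indices of the selected sentences, in document order.

    sentences: list of token lists (word segmentation is assumed done).
    weights:   (w_position, w_keyword); fixed by hand in this sketch.
    ratio:     compression ratio, i.e. fraction of sentences to keep.
    """
    n = len(sentences)
    if n == 0:
        return []
    k = max(1, round(n * ratio))
    scored = []
    for i, tokens in enumerate(sentences):
        score = (weights[0] * position_score(i, n)
                 + weights[1] * keyword_score(tokens, keywords))
        scored.append((score, i))
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return sorted(i for _, i in top)


def recall(selected, reference):
    # Share of the reference sentences that the summarizer also selected.
    if not reference:
        return 0.0
    return len(set(selected) & set(reference)) / len(reference)


if __name__ == "__main__":
    doc = [["a", "b"], ["c", "d"], ["key", "x"], ["e", "f"], ["g", "h"],
           ["key", "key"], ["i", "j"], ["k", "l"], ["m", "n"], ["o", "p"]]
    chosen = summarize(doc, {"key"}, weights=(0.5, 0.5), ratio=0.3)
    print(chosen, recall(chosen, [2, 5, 7]))
```

A linear score keeps the role of each feature weight easy to inspect, which is also what makes the weights a natural target for an optimizer such as a genetic algorithm.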

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Ee-Peng Lim, Schubert Foo, Chris Khoo, Hsinchun Chen, Edward Fox, Shalini Urs, Thanos Costantino
Publisher: Springer Verlag
Pages: 76-87
Number of pages: 12
ISBN (Print): 3540002618, 9783540002611
Publication status: Published - 2002
Externally published: Yes
Event: 5th International Conference on Asian Digital Libraries, ICADL 2002 - Singapore, Singapore
Duration: 2002 Dec 11 to 2002 Dec 14

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2555
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 5th International Conference on Asian Digital Libraries, ICADL 2002
Country/Territory: Singapore
City: Singapore
Period: 2002/12/11 to 2002/12/14

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
