Chinese text summarization using a trainable summarizer and latent semantic analysis

Jen Yuan Yeh, Hao Ren Ke, Wei Pang Yang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

19 Citations (Scopus)

Abstract

This paper proposes two novel approaches for extracting important sentences from a document to create its summary. The first is a corpus-based approach using feature analysis. It introduces three new ideas: 1) using ranked position to emphasize the significance of sentence position, 2) reshaping word units to estimate keyword importance more accurately, and 3) training a score function with a genetic algorithm to obtain a suitable combination of feature weights. The second approach combines latent semantic analysis and text relationship maps to interpret the conceptual structure of a document. Both approaches are applied to Chinese text summarization. They were evaluated on a corpus of 100 articles about politics from New Taiwan Weekly; at a compression ratio of 30%, they achieved average recalls of 52.0% and 45.6%, respectively.
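
As a rough illustration of the evaluation setting described in the abstract, the sketch below scores sentences with a weighted sum of features, keeps the top 30% as the summary, and measures recall against a reference summary. It is a minimal sketch under stated assumptions, not the authors' implementation: the feature values, the weights, the reference summary, and the function names are all hypothetical, the compression ratio is assumed to mean summary sentences divided by document sentences, and the weights are fixed by hand rather than trained by a genetic algorithm.

```python
def score_sentences(features, weights):
    """Score each sentence as a weighted sum of its feature values."""
    return [sum(w * f for w, f in zip(weights, row)) for row in features]


def select_summary(scores, compression_ratio=0.3):
    """Return indices of the top-scoring sentences, in document order."""
    # Assumption: compression ratio = summary sentences / document sentences.
    k = max(1, round(len(scores) * compression_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def recall(selected, reference):
    """Fraction of reference-summary sentences that were also selected."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(reference & set(selected)) / len(reference)


# Hypothetical 10-sentence document; each row holds two made-up feature
# values per sentence (a position feature and a keyword feature).
features = [
    (1.0, 0.2), (0.9, 0.8), (0.5, 0.1), (0.4, 0.9), (0.3, 0.3),
    (0.3, 0.2), (0.2, 0.7), (0.2, 0.1), (0.1, 0.4), (0.6, 0.5),
]
weights = (0.6, 0.4)   # hand-picked for illustration, not trained
reference = [1, 3, 6]  # hypothetical human-selected summary sentences

scores = score_sentences(features, weights)
summary = select_summary(scores, compression_ratio=0.3)
summary_recall = recall(summary, reference)
print(summary, summary_recall)
```

In a trainable setting, a search procedure such as a genetic algorithm would adjust `weights` to maximize a measure like `summary_recall` averaged over a training corpus; that step is omitted here.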

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 76-87
Number of pages: 12
ISBN (Print): 3540002618, 9783540002611
Publication status: Published - 2002 Jan 1
Event: 5th International Conference on Asian Digital Libraries, ICADL 2002 - Singapore, Singapore
Duration: 2002 Dec 11 to 2002 Dec 14

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2555
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 5th International Conference on Asian Digital Libraries, ICADL 2002
Country: Singapore
City: Singapore
Period: 02/12/11 to 02/12/14

Fingerprint

Latent Semantic Analysis
Summarization
Genetic algorithms
Semantics
Score Function
Taiwan
High Accuracy
Compression
Genetic Algorithm
Unit
Text
Corpus
Relationships

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Yeh, J. Y., Ke, H. R., & Yang, W. P. (2002). Chinese text summarization using a trainable summarizer and latent semantic analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 76-87). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2555). Springer Verlag.

@inproceedings{2225b353d9a441b1b822819dd70b16b1,
title = "Chinese text summarization using a trainable summarizer and latent semantic analysis",
abstract = "In this paper, two novel approaches are proposed to extract important sentences from a document to create its summary. The first is a corpus-based approach using feature analysis. It brings up three new ideas: 1) to employ ranked position to emphasize the significance of sentence position, 2) to reshape word unit to achieve higher accuracy of keyword importance, and 3) to train a score function by the genetic algorithm for obtaining a suitable combination of feature weights. The second approach combines the ideas of latent semantic analysis and text relationship maps to interpret conceptual structures of a document. Both approaches are applied to Chinese text summarization. The two approaches were evaluated by using a data corpus composed of 100 articles about politics from New Taiwan Weekly, and when the compression ratio was 30{\%}, average recalls of 52.0{\%} and 45.6{\%} were achieved respectively.",
author = "Yeh, {Jen Yuan} and Ke, {Hao Ren} and Yang, {Wei Pang}",
year = "2002",
month = "1",
day = "1",
language = "English",
isbn = "3540002618",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "76--87",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Chinese text summarization using a trainable summarizer and latent semantic analysis

AU - Yeh, Jen Yuan

AU - Ke, Hao Ren

AU - Yang, Wei Pang

PY - 2002/1/1

Y1 - 2002/1/1

AB - In this paper, two novel approaches are proposed to extract important sentences from a document to create its summary. The first is a corpus-based approach using feature analysis. It brings up three new ideas: 1) to employ ranked position to emphasize the significance of sentence position, 2) to reshape word unit to achieve higher accuracy of keyword importance, and 3) to train a score function by the genetic algorithm for obtaining a suitable combination of feature weights. The second approach combines the ideas of latent semantic analysis and text relationship maps to interpret conceptual structures of a document. Both approaches are applied to Chinese text summarization. The two approaches were evaluated by using a data corpus composed of 100 articles about politics from New Taiwan Weekly, and when the compression ratio was 30%, average recalls of 52.0% and 45.6% were achieved respectively.

UR - http://www.scopus.com/inward/record.url?scp=84949183406&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949183406&partnerID=8YFLogxK

M3 - Conference contribution

SN - 3540002618

SN - 9783540002611

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 76

EP - 87

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -