Generic title labeling for clustered documents

Research output: Contribution to journal, Article

25 Citations (Scopus)

Abstract

Document clustering is a powerful technique to detect topics and their relations for information browsing, analysis, and organization. However, clustered documents require post-assignment of descriptive titles to help users interpret the results. Existing techniques often assign labels to clusters based only on the terms that the clustered documents contain, which may not be sufficient for some applications. To solve this problem, a cluster labeling algorithm for creating generic titles, based on external resources such as WordNet, is proposed. Our method first extracts category-specific terms as cluster descriptors. These descriptors are then mapped to generic terms based on a hypernym search algorithm. The proposed method has been evaluated on a patent document collection and a subset of the Reuters-21578 collection. Experimental results revealed that our method performs as anticipated. Real-case applications of these generic terms show promise in assisting humans in interpreting the clustered topics. Our method is general enough that it can be easily extended to use other hierarchical resources for adaptable label generation.
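
As a rough illustration of the descriptor-to-generic-term mapping the abstract describes, the Python sketch below walks a small hypernym hierarchy and picks a shared ancestor for a set of cluster descriptors. The toy hierarchy, the function names, and the scoring rule (widest coverage first, then closeness to the descriptors) are all assumptions made for illustration; they are not taken from the paper, whose method draws on WordNet.

```python
from collections import Counter

# Hypothetical toy hierarchy (child -> parent), a stand-in for WordNet.
TOY_HYPERNYMS = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
    "animal": "entity",
    "car": "vehicle",
    "vehicle": "entity",
}


def hypernym_path(term, hypernyms):
    """Return the ancestors of `term`, nearest ancestor first."""
    path = []
    seen = {term}
    while term in hypernyms:
        term = hypernyms[term]
        if term in seen:  # guard against accidental cycles
            break
        seen.add(term)
        path.append(term)
    return path


def generic_label(descriptors, hypernyms):
    """Pick one ancestor as a generic label for the descriptors.

    Assumed scoring rule: prefer the ancestor that covers the most
    descriptors; among equals, prefer the one closest to them.
    Returns None when no descriptor has any ancestor.
    """
    coverage = Counter()        # ancestor -> number of descriptors beneath it
    total_distance = Counter()  # ancestor -> summed distance to those descriptors
    for descriptor in descriptors:
        path = hypernym_path(descriptor, hypernyms)
        for distance, ancestor in enumerate(path, start=1):
            coverage[ancestor] += 1
            total_distance[ancestor] += distance
    if not coverage:
        return None
    return min(coverage, key=lambda a: (-coverage[a], total_distance[a]))


if __name__ == "__main__":
    print(generic_label(["dog", "wolf", "cat"], TOY_HYPERNYMS))
    print(generic_label(["dog", "car"], TOY_HYPERNYMS))
```

Under this assumed rule, a very general ancestor is chosen only when nothing more specific covers as many descriptors. Any other hierarchical resource could replace the toy dictionary, provided it can be read as a child-to-parent mapping.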

Original language: English
Pages (from-to): 2247-2254
Number of pages: 8
Journal: Expert Systems with Applications
Volume: 37
Issue number: 3
DOI: 10.1016/j.eswa.2009.07.048
Publication status: Published - 2010 Mar 15

Fingerprint

Labeling
Labels
Information analysis

Keywords

  • Clustering labeling
  • Correlation coefficient
  • Hypernym search
  • Topic identification
  • WordNet

ASJC Scopus subject areas

  • Engineering(all)
  • Computer Science Applications
  • Artificial Intelligence

Cite this

Generic title labeling for clustered documents. / Tseng, Yuen-Hsien.

In: Expert Systems with Applications, Vol. 37, No. 3, 15.03.2010, p. 2247-2254.

@article{ff655a979ba145caaa8b858b7fd8b861,
title = "Generic title labeling for clustered documents",
abstract = "Document clustering is a powerful technique to detect topics and their relations for information browsing, analysis, and organization. However, clustered documents require post-assignment of descriptive titles to help users interpret the results. Existing techniques often assign labels to clusters based only on the terms that the clustered documents contain, which may not be sufficient for some applications. To solve this problem, a cluster labeling algorithm for creating generic titles, based on external resources such as WordNet, is proposed. Our method first extracts category-specific terms as cluster descriptors. These descriptors are then mapped to generic terms based on a hypernym search algorithm. The proposed method has been evaluated on a patent document collection and a subset of the Reuters-21578 collection. Experimental results revealed that our method performs as anticipated. Real-case applications of these generic terms show promise in assisting humans in interpreting the clustered topics. Our method is general enough that it can be easily extended to use other hierarchical resources for adaptable label generation.",
keywords = "Clustering labeling, Correlation coefficient, Hypernym search, Topic identification, WordNet",
author = "Yuen-Hsien Tseng",
year = "2010",
month = "3",
day = "15",
doi = "10.1016/j.eswa.2009.07.048",
language = "English",
volume = "37",
pages = "2247--2254",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Limited",
number = "3",

}

TY - JOUR

T1 - Generic title labeling for clustered documents

AU - Tseng, Yuen-Hsien

PY - 2010/3/15

Y1 - 2010/3/15

N2 - Document clustering is a powerful technique to detect topics and their relations for information browsing, analysis, and organization. However, clustered documents require post-assignment of descriptive titles to help users interpret the results. Existing techniques often assign labels to clusters based only on the terms that the clustered documents contain, which may not be sufficient for some applications. To solve this problem, a cluster labeling algorithm for creating generic titles, based on external resources such as WordNet, is proposed. Our method first extracts category-specific terms as cluster descriptors. These descriptors are then mapped to generic terms based on a hypernym search algorithm. The proposed method has been evaluated on a patent document collection and a subset of the Reuters-21578 collection. Experimental results revealed that our method performs as anticipated. Real-case applications of these generic terms show promise in assisting humans in interpreting the clustered topics. Our method is general enough that it can be easily extended to use other hierarchical resources for adaptable label generation.

AB - Document clustering is a powerful technique to detect topics and their relations for information browsing, analysis, and organization. However, clustered documents require post-assignment of descriptive titles to help users interpret the results. Existing techniques often assign labels to clusters based only on the terms that the clustered documents contain, which may not be sufficient for some applications. To solve this problem, a cluster labeling algorithm for creating generic titles, based on external resources such as WordNet, is proposed. Our method first extracts category-specific terms as cluster descriptors. These descriptors are then mapped to generic terms based on a hypernym search algorithm. The proposed method has been evaluated on a patent document collection and a subset of the Reuters-21578 collection. Experimental results revealed that our method performs as anticipated. Real-case applications of these generic terms show promise in assisting humans in interpreting the clustered topics. Our method is general enough that it can be easily extended to use other hierarchical resources for adaptable label generation.

KW - Clustering labeling

KW - Correlation coefficient

KW - Hypernym search

KW - Topic identification

KW - WordNet

UR - http://www.scopus.com/inward/record.url?scp=70449529515&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70449529515&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2009.07.048

DO - 10.1016/j.eswa.2009.07.048

M3 - Article

AN - SCOPUS:70449529515

VL - 37

SP - 2247

EP - 2254

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

IS - 3

ER -