TY - JOUR
T1 - Generic title labeling for clustered documents
AU - Tseng, Yuen Hsien
N1 - Funding Information:
This work is supported in part by the National Science Council under grants NSC 95-2221-E-003-016- and NSC 97-2631-S-003-003-.
PY - 2010/3/15
Y1 - 2010/3/15
N2 - Document clustering is a powerful technique to detect topics and their relations for information browsing, analysis, and organization. However, clustered documents require post-assignment of descriptive titles to help users interpret the results. Existing techniques often assign labels to clusters based only on the terms that the clustered documents contain, which may not be sufficient for some applications. To solve this problem, a cluster labeling algorithm for creating generic titles, based on external resources such as WordNet, is proposed. Our method first extracts category-specific terms as cluster descriptors. These descriptors are then mapped to generic terms based on a hypernym search algorithm. The proposed method has been evaluated on a patent document collection and a subset of the Reuters-21578 collection. Experimental results revealed that our method performs as anticipated. Real-case applications of these generic terms show promise in assisting humans in interpreting the clustered topics. Our method is general enough that it can be easily extended to use other hierarchical resources for adaptable label generation.
AB - Document clustering is a powerful technique to detect topics and their relations for information browsing, analysis, and organization. However, clustered documents require post-assignment of descriptive titles to help users interpret the results. Existing techniques often assign labels to clusters based only on the terms that the clustered documents contain, which may not be sufficient for some applications. To solve this problem, a cluster labeling algorithm for creating generic titles, based on external resources such as WordNet, is proposed. Our method first extracts category-specific terms as cluster descriptors. These descriptors are then mapped to generic terms based on a hypernym search algorithm. The proposed method has been evaluated on a patent document collection and a subset of the Reuters-21578 collection. Experimental results revealed that our method performs as anticipated. Real-case applications of these generic terms show promise in assisting humans in interpreting the clustered topics. Our method is general enough that it can be easily extended to use other hierarchical resources for adaptable label generation.
KW - Clustering labeling
KW - Correlation coefficient
KW - Hypernym search
KW - Topic identification
KW - WordNet
UR - http://www.scopus.com/inward/record.url?scp=70449529515&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70449529515&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2009.07.048
DO - 10.1016/j.eswa.2009.07.048
M3 - Article
AN - SCOPUS:70449529515
SN - 0957-4174
VL - 37
SP - 2247
EP - 2254
JO - Expert Systems with Applications
JF - Expert Systems with Applications
IS - 3
ER -