Fast co-occurrence thesaurus construction for Chinese news

Yuen Hsien Tseng*

*Corresponding author for this work

Research output: Contribution to journal, Conference article, peer-reviewed

3 Citations (Scopus)

Abstract

This paper reports an approach to automatic thesaurus construction for Chinese news articles. An effective Chinese word segmentation and keyword extraction algorithm is first presented. For each document, an average of 33% keywords unknown to a lexicon of 123,226 terms can be identified. The extraction error rate is 3.6%. Keywords extracted from each document are then further filtered for term association analysis by a modified Dice coefficient formula. Association weights larger than a threshold are then accumulated over all the documents to yield the final term pair similarities. Compared to previous studies, this method not only speeds up the thesaurus generation process drastically, but also achieves a similar percentage level of term relatedness.
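
The association step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the modified Dice formula, so the standard Dice coefficient 2*n(a,b) / (n(a) + n(b)) is used in its place. The sentence-level co-occurrence window, the input format (each document as a list of keyword sets), the function names, and the threshold value are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def dice_weights(sentences):
    """Standard Dice coefficient for keyword pairs within one document.

    `sentences` is a list of keyword sets, one per sentence (an assumed
    input format). The paper's modified formula is not reproduced here.
    """
    term_count = Counter()   # number of sentences containing each keyword
    pair_count = Counter()   # number of sentences containing both keywords
    for keywords in sentences:
        unique = sorted(set(keywords))  # sorted so each pair has one key
        term_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        (a, b): 2.0 * n / (term_count[a] + term_count[b])
        for (a, b), n in pair_count.items()
    }


def build_thesaurus(documents, threshold=0.5):
    """Accumulate per-document weights above `threshold` over all documents."""
    similarity = defaultdict(float)
    for sentences in documents:
        for pair, weight in dice_weights(sentences).items():
            if weight > threshold:  # weak associations are discarded
                similarity[pair] += weight
    return dict(similarity)


# Toy data (hypothetical): two documents, keywords already extracted.
docs = [
    [{"stock", "market"}, {"stock", "market", "bank"}, {"bank"}],
    [{"stock", "market"}, {"bank", "loan"}],
]
thesaurus = build_thesaurus(docs, threshold=0.5)
```

Because each document is scored independently and only the surviving weights are summed, the work per document depends on that document's keyword count rather than on the size of the whole collection, which is consistent with the speed-up the abstract reports.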

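Original language: English
Pages (from-to): 853-858
Number of pages: 6
Journal: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics
Volume: 2
Publication status: Published - 2001
Externally published
Event: 2001 IEEE International Conference on Systems, Man and Cybernetics - Tucson, AZ, United States
Duration: 7 Oct 2001 to 10 Oct 2001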

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Hardware and Architecture
