Automatic thesaurus generation for Chinese documents

Yuen Hsien Tseng*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

48 citations (Scopus)

Abstract

This article reports an approach to automatic thesaurus construction for Chinese documents. An effective Chinese keyword extraction algorithm is first presented. Experiments showed that for each document an average of 33% keywords unknown to a lexicon of 123,226 terms could be identified by this algorithm. Of these unregistered words, only 8.3% are illegal. Keywords extracted from each document are further filtered for term association analysis. Association weights larger than a threshold are then accumulated over all the documents to yield the final term pair similarities. Compared to previous studies, this method speeds up the thesaurus generation process drastically. It also achieves a similar level of term relatedness.
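
The accumulation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the Dice coefficient over sentence-level co-occurrence used here is an assumption, as are the function names `document_associations` and `accumulate_similarities`, the default threshold, and the toy English keywords. Keyword extraction and filtering are assumed to have already happened.

```python
from collections import Counter, defaultdict
from itertools import combinations


def document_associations(sentences, threshold):
    """Return {(term_a, term_b): weight} for a single document.

    `sentences` is a list of keyword lists, one list per sentence.
    The weight is a Dice coefficient over sentence counts; this formula
    is an assumption of the sketch, not taken from the abstract.
    """
    term_freq = Counter()   # number of sentences containing each term
    pair_freq = Counter()   # number of sentences containing both terms
    for keywords in sentences:
        unique = sorted(set(keywords))
        term_freq.update(unique)
        pair_freq.update(combinations(unique, 2))

    weights = {}
    for (a, b), co in pair_freq.items():
        weight = 2.0 * co / (term_freq[a] + term_freq[b])
        if weight > threshold:  # keep only weights above the threshold
            weights[(a, b)] = weight
    return weights


def accumulate_similarities(documents, threshold=0.5):
    """Sum the above-threshold weights of each term pair over all documents."""
    totals = defaultdict(float)
    for sentences in documents:
        for pair, weight in document_associations(sentences, threshold).items():
            totals[pair] += weight
    return dict(totals)


# Hypothetical input: two documents, each a list of per-sentence keyword lists.
docs = [
    [["thesaurus", "keyword"], ["thesaurus", "keyword", "lexicon"]],
    [["thesaurus", "keyword"], ["lexicon"]],
]
similarities = accumulate_similarities(docs, threshold=0.5)
```

Because each document is processed independently and only pairs that co-occur within it are scored, the cost in this sketch grows with the number of documents rather than with the square of the vocabulary size.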

Original language: English
Pages (from-to): 1130-1138
Number of pages: 9
Journal: Journal of the American Society for Information Science and Technology
Volume: 53
Issue number: 13
Publication status: Published - 1 Nov 2002

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Computer Networks and Communications
  • Artificial Intelligence
