Verifying a Chinese collection for text categorization

Yuen Hsien Tseng*, William John Teahan

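*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract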

This article describes the development of a free test collection for Chinese text categorization. A novel retrieval-based approach was developed to detect duplicates and label inconsistency in this corpus and in Reuters-21578 for comparison. The method was able to detect certain types of similar and/or duplicated documents that were overlooked by an alternative repetition-based method. Experiments showed that effectiveness was not affected by the confusing documents.
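The abstract does not spell out how the retrieval-based check works. As a rough illustration only, and not the authors' procedure, the general idea of flagging near-duplicates and conflicting labels by document similarity might be sketched as follows. The character-bigram features, the cosine measure, the 0.9 threshold, and the names `bigram_vector`, `cosine`, and `find_suspects` are all assumptions introduced for this sketch.

```python
import math
from collections import Counter


def bigram_vector(text):
    """Character-bigram counts; an assumed feature choice that avoids word segmentation."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_suspects(docs, threshold=0.9):
    """docs: list of (doc_id, label, text) tuples.

    Returns (duplicates, inconsistent): pairs of ids whose similarity meets
    the threshold, and the subset of those pairs whose labels differ.
    The threshold value is hypothetical.
    """
    vectors = [(doc_id, label, bigram_vector(text)) for doc_id, label, text in docs]
    duplicates, inconsistent = [], []
    for i, (id_a, label_a, vec_a) in enumerate(vectors):
        for id_b, label_b, vec_b in vectors[i + 1:]:
            if cosine(vec_a, vec_b) >= threshold:
                duplicates.append((id_a, id_b))
                if label_a != label_b:
                    inconsistent.append((id_a, id_b))
    return duplicates, inconsistent


if __name__ == "__main__":
    sample = [
        ("d1", "finance", "股市今日大幅上漲"),
        ("d2", "sports", "股市今日大幅上漲"),
        ("d3", "sports", "球隊昨晚贏得冠軍"),
    ]
    print(find_suspects(sample))
```

This sketch compares every pair of documents, which is quadratic in the collection size; a retrieval engine with an inverted index would typically be used to fetch only the top-ranked candidates for each document.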

Original language: English
Title of host publication: Proceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery (ACM)
Pages: 556-557
Number of pages: 2
ISBN (Print): 1581138814, 9781581138818
Publication status: Published - 2004
Externally published
Event: Proceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - Sheffield, United Kingdom
Duration: 25 Jul 2004 to 29 Jul 2004

ASJC Scopus subject areas

  • General Engineering
