Verifying a Chinese collection for text categorization

Yuen Hsien Tseng, William John Teahan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This article describes the development of a free test collection for Chinese text categorization. A novel retrieval-based approach was developed to detect duplicates and label inconsistencies in this corpus and, for comparison, in Reuters-21578. The method detected certain types of similar or duplicated documents that an alternative repetition-based method overlooked. Experiments showed that categorization effectiveness was not affected by the confusing documents.
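
The abstract does not spell out the retrieval-based procedure, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes character-bigram TF-IDF vectors, cosine similarity, and a similarity threshold of 0.8; the function names (`char_bigrams`, `tfidf_vectors`, `find_suspects`) and the sample documents are hypothetical. Each document is compared against every other one, and highly similar pairs are reported as duplicates when their labels agree or as label-inconsistent when they differ.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Split text into overlapping character bigrams (no word segmentation needed)."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def tfidf_vectors(docs):
    """Build L2-normalized TF-IDF vectors (as dicts) over character bigrams."""
    token_lists = [char_bigrams(d) for d in docs]
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # document frequency: count each bigram once per doc
    n = len(docs)
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
        vec = {t: c * (math.log((n + 1) / (df[t] + 1)) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def find_suspects(docs, labels, threshold=0.8):
    """Return (i, j, kind) for every document pair whose similarity reaches the threshold."""
    vectors = tfidf_vectors(docs)
    suspects = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                kind = "duplicate" if labels[i] == labels[j] else "label-inconsistent"
                suspects.append((i, j, kind))
    return suspects


# Hypothetical toy data: three copies of one news sentence and one unrelated sentence.
sample_docs = ["股市今日大幅上涨", "股市今日大幅上涨", "股市今日大幅上涨", "台风明天登陆沿海地区"]
sample_labels = ["finance", "finance", "politics", "weather"]
suspects = find_suspects(sample_docs, sample_labels)
```

The all-pairs loop is quadratic in the number of documents; a real collection would more plausibly index the documents and retrieve only the top-ranked candidates for each query document.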

Original language: English
Title of host publication: Proceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery (ACM)
Pages: 556-557
Number of pages: 2
ISBN (Print): 1581138814, 9781581138818
DOIs: https://doi.org/10.1145/1008992.1009118
Publication status: Published - 2004 Jan 1
Event: Proceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval - Sheffield, United Kingdom
Duration: 2004 Jul 25 to 2004 Jul 29

Keywords

  • Chinese collection
  • Consistency verification
  • Duplicate detection

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Tseng, Y. H., & Teahan, W. J. (2004). Verifying a Chinese collection for text categorization. In Proceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 556-557). (Proceedings of Sheffield SIGIR - Twenty-Seventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval). Association for Computing Machinery (ACM). https://doi.org/10.1145/1008992.1009118