Quality assurance of automatic annotation of very large corpora: A study based on heterogeneous tagging systems

Chu Ren Huang, Lung Hao Lee, Wei Guang Qu, Jia Fei Hong, Shiwen Yu

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

9 Citations (Scopus)

Abstract

We propose a set of heuristics for efficiently improving the annotation quality of very large corpora. The Xinhua News portion of the Chinese Gigaword Corpus was tagged independently with both the Peking University ICL tagset and the Academia Sinica CKIP tagset. The resulting corpus-based POS tag mapping will serve as a basis for contrasting the grammatical systems of the PRC and Taiwan. It can also serve as a basic model for mapping between the CKIP and ICL tagging systems for any data.

Original language: English
Title of host publication: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
Publisher: European Language Resources Association (ELRA)
Pages: 2725-2729
Number of pages: 5
ISBN (Electronic): 2951740840, 9782951740846
Publication status: Published - 1 Jan 2008
Externally published
Event: 6th International Conference on Language Resources and Evaluation, LREC 2008 - Marrakech, Morocco
Duration: 28 May 2008 to 30 May 2008

Publication series

Name: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008

Conference

Conference: 6th International Conference on Language Resources and Evaluation, LREC 2008
Country/Territory: Morocco
City: Marrakech
Period: 2008/05/28 to 2008/05/30

ASJC Scopus subject areas

  • Library and Information Sciences
  • Language and Linguistics
  • Linguistics and Language
  • Education
