Design and prototype of a large-scale and fully sense-tagged corpus

Sue Jin Ker*, Chu Ren Huang, Jia Fei Hong, Shi Yin Liu, Hui Ling Jian, I. Li Su, Shu Kai Hsieh

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

2 Citations (Scopus)

Abstract

Sense-tagged corpora play a crucial role in Natural Language Processing, especially in research on word sense disambiguation and natural language understanding. A large-scale Chinese sense-tagged corpus is essential to this research, yet such a corpus is critically lacking at present. This paper aims to design a large-scale, full-text, sense-tagged Chinese corpus containing over 110,000 words. The Academia Sinica Balanced Corpus of Modern Chinese (also known as the Sinica Corpus) serves as the tagging source, and 56 full texts are extracted from it. The preparatory work for automatic sense tagging draws on N-gram statistics and collocation information, combining machine learning techniques with a probability model. To achieve high precision, the output of automatic sense tagging is then revised manually.

Original language: English
Title of host publication: Large-Scale Knowledge Resources
Subtitle of host publication: Construction and Application - Third International Conference on Large-Scale Knowledge Resources, LKR 2008, Proceedings
Pages: 186-193
Number of pages: 8
Publication status: Published - 2008
Externally published
Event: 3rd International Conference on Large-Scale Knowledge Resources, LKR 2008 - Tokyo, Japan
Duration: 3 March 2008 to 5 March 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4938 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd International Conference on Large-Scale Knowledge Resources, LKR 2008
Country/Territory: Japan
City: Tokyo
Period: 2008/03/03 to 2008/03/05

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
