FROM CORPUS TO GRAMMAR: AUTOMATIC EXTRACTION OF GRAMMATICAL RELATIONS FROM ANNOTATED CORPUS

黃 居仁, 洪 嘉馡(Jia-Fei Hong), 馬 偉雲(Wei-Yun Ma), 石 穆(Petr Simon)

Research output: Contribution to journal, journal article, peer-reviewed

Abstract

Automatic extraction of grammatical knowledge from corpora has been one of the ultimate goals and challenges of corpus linguistics. In this paper we present one approach to this challenge in Chinese corpus linguistics by introducing our recent work using the Sketch Engine (SkE, also known as Word Sketch Engine) platform to automatically extract grammatical relations from PoS-annotated Chinese corpora. The SkE approach requires both giga-word size corpora and comprehensive lexico-grammatical information about the language in question. On the one hand, corpus size is crucial because the automatic extraction of grammatical relations requires enough instances of the relation pairs, which in turn requires an exponential jump in size beyond the million-word corpora that suffice for observing single lexical items. On the other hand, lexico-grammatical information is crucial to the identification of potential relational pairs based on local context, and the quality of such extraction depends on the quality of the available lexico-grammatical knowledge. We show that a comprehensive lexical grammar, based on Information-based Case Grammar (Chen & Huang 1990) and covering over 40 thousand verbs, greatly improves the accuracy and recall of grammatical relation detection. The paper concludes by underlining the importance of integrating existing grammatical information to meet the challenge of automatic extraction of grammatical knowledge from large corpora.
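To make concrete the idea of identifying relational pairs from local context, here is a minimal Python sketch. It is not the Sketch Engine grammar formalism, nor the authors' implementation. The coarse tag names (`V`, `N`, `A`, `DE`, `DET`, `PRON`), the function names, the toy sentences, and the `allowed_verbs` filter (a hypothetical stand-in for information from a lexical grammar) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical coarse tagset; real PoS-annotated corpora use richer tags.
VERB, NOUN = "V", "N"
MODIFIERS = {"A", "DE", "DET"}


def extract_verb_object_pairs(tagged_sentence, allowed_verbs=None):
    """Return (verb, noun) candidates found by a simple local-context pattern:
    a verb, then zero or more modifier tokens, then a noun.

    allowed_verbs is an optional set standing in for lexical-grammar knowledge
    (for example, verbs known to take an object); None disables the filter.
    """
    pairs = []
    for i, (word, tag) in enumerate(tagged_sentence):
        if tag != VERB:
            continue
        if allowed_verbs is not None and word not in allowed_verbs:
            continue
        # Skip over modifiers to the right of the verb.
        j = i + 1
        while j < len(tagged_sentence) and tagged_sentence[j][1] in MODIFIERS:
            j += 1
        # Accept the pair only if the first non-modifier token is a noun.
        if j < len(tagged_sentence) and tagged_sentence[j][1] == NOUN:
            pairs.append((word, tagged_sentence[j][0]))
    return pairs


def count_relations(corpus, allowed_verbs=None):
    """Count candidate pairs over a corpus of tagged sentences."""
    counts = Counter()
    for sentence in corpus:
        counts.update(extract_verb_object_pairs(sentence, allowed_verbs))
    return counts


# Toy data, invented for illustration only.
corpus = [
    [("我", "PRON"), ("吃", "V"), ("飯", "N")],
    [("他", "PRON"), ("吃", "V"), ("熱", "A"), ("飯", "N")],
    [("她", "PRON"), ("看", "V"), ("書", "N")],
]
relation_counts = count_relations(corpus)
filtered_counts = count_relations(corpus, allowed_verbs={"吃"})
```

Each pair is observed far less often than either word alone, which is why counting pairs reliably calls for a much larger corpus than counting single lexical items. The optional filter illustrates, in a very reduced form, how prior lexical knowledge can prune candidate pairs.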
Original language: English
Pages (from-to): 192-221
Number of pages: 30
Journal: Journal of Chinese Linguistics Monograph Series
Issue number: 25
Publication status: Published - 2015
