Automatic extraction of grammatical knowledge from corpora has been one of the ultimate goals and challenges of corpus linguistics. In this paper we present one approach to this challenge in Chinese corpus linguistics by introducing our recent work using the Sketch Engine (SkE, also known as Word Sketch Engine) platform to automatically extract grammatical relations from PoS-annotated Chinese corpora. The SkE approach requires both gigaword-size corpora and comprehensive lexico-grammatical information about the language in question. On the one hand, corpus size is crucial because the automatic extraction of grammatical relations requires enough instances of the relation pairs, which in turn requires an exponential jump from the million-word corpora that suffice for observing single lexical items. On the other hand, lexico-grammatical information is crucial to the identification of potential relational pairs based on local context, so the quality of such extraction depends on the quality of the available lexico-grammatical knowledge. We show that a comprehensive lexical grammar, based on Information-based Case Grammar (Chen & Huang 1990) and covering over 40 thousand verbs, greatly improves the accuracy and recall of grammatical relation detection. The paper concludes by underlining the importance of integrating existing grammatical information to meet the challenge of automatic extraction of grammatical knowledge from large corpora.
|Number of pages||30|
|Journal||Journal of Chinese Linguistics Monograph Series|
|Publication status||Published - 2015|
- Mandarin Chinese
- Grammatical knowledge
- Automatic extraction
- Lexical grammar
- Sketch Engine
黃居仁, 洪嘉馡 (Jia-Fei H), 馬偉雲 (Wei-Yun M), & 石穆 (Petr S) (2015). FROM CORPUS TO GRAMMAR: AUTOMATIC EXTRACTION OF GRAMMATICAL RELATIONS FROM ANNOTATED CORPUS. Journal of Chinese Linguistics Monograph Series, (25), 192-221.