A corpus-based approach to the discovery of cross-strait lexical contrasts

Jia Fei Hong, Chu Ren Huang

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

Studies of cross-strait lexical contrasts in the use of Mandarin Chinese reveal an increasingly evident divergence. This divergence is apparent in phonological, semantic, and pragmatic analyses and has become an obstacle to knowledge sharing and information exchange. Given the wide range of divergences, Chinese character forms appear to offer the most reliable regular mapping between cross-strait usage contrasts. In this paper we propose a new approach to discovering cross-strait contrasts, anchored on the regular correspondences of characters. Our approach is corpus-based and collocation-driven. We use known contrast pairs as seeds in a corpus containing data from both the PRC and Taiwan. The collocation patterns of these contrast pairs, in terms of both lexical lists and grammatical functions, are studied to discover additional contrast pairs semi-automatically. This approach achieves both NLP applicability and linguistic felicity, since it yields the contrast pairs together with their usage contexts.
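
The seed-and-collocation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy sentences, the window-based collocate sets, the Jaccard score, and the names `collocates`, `jaccard`, and `propose_pairs` are all assumptions made for illustration. It also assumes both sub-corpora are already tokenized and mapped to a common character form, and it covers only lexical collocates, not grammatical functions.

```python
from collections import defaultdict

def collocates(sentences, window=2):
    """Map each token to the set of tokens seen within `window` positions of it."""
    table = defaultdict(set)
    for sent in sentences:
        for i, tok in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            table[tok].update(sent[lo:i] + sent[i + 1:hi])
    return table

def jaccard(a, b):
    """Overlap of two collocate sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def propose_pairs(prc_sents, tw_sents, seeds, window=2):
    """Propose (PRC word, Taiwan word, score) candidates for manual review."""
    prc, tw = collocates(prc_sents, window), collocates(tw_sents, window)
    # Calibrate on the seeds: a candidate must match at least as well as
    # the weakest known contrast pair (illustrative heuristic).
    threshold = min((jaccard(prc[p], tw[t]) for p, t in seeds), default=1.0)
    # Only words attested in one sub-corpus and not already seeds are candidates.
    prc_only = set(prc) - set(tw) - {p for p, _ in seeds}
    tw_only = set(tw) - set(prc) - {t for _, t in seeds}
    found = []
    for p in sorted(prc_only):
        for t in sorted(tw_only):
            score = jaccard(prc[p], tw[t])
            if score > 0 and score >= threshold:
                found.append((p, t, score))
    return sorted(found, key=lambda item: -item[2])

# Toy data (hypothetical, not drawn from the corpus used in the paper).
PRC = [["我", "坐", "出租车", "去", "机场"], ["他", "安装", "软件", "更新"]]
TW = [["我", "坐", "计程车", "去", "机场"], ["他", "安装", "软体", "更新"],
      ["她", "喜欢", "凤梨"]]
SEEDS = [("出租车", "计程车")]  # known contrast pair for "taxi"

print(propose_pairs(PRC, TW, SEEDS))
```

On this toy input the sketch proposes the "software" pair and leaves the unmatched Taiwan-only words unpaired. The output is a candidate list for human review, in keeping with the semi-automatic character of the approach.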

Original language: English
Pages (from-to): 221-238
Number of pages: 18
Journal: Language and Linguistics
Volume: 9
Issue number: 2
Publication status: Published - 2008 Dec 1
Externally published: Yes

Keywords

  • Chinese word sketch
  • Collocation
  • Cross-strait lexical contrasts
  • Gigaword corpus

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
