A corpus-based approach to the discovery of cross-strait lexical contrasts

Jia Fei Hong*, Chu Ren Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)


Studies of cross-strait lexical contrasts in the use of Mandarin Chinese reveal an increasingly evident divergence. This divergence is apparent in phonological, semantic, and pragmatic analyses and has become an obstacle to knowledge-sharing and information exchange. Given the wide range of divergences, Chinese character forms appear to offer the most reliable regular mapping between cross-strait usage contrasts. In this paper we propose a new approach to the discovery of cross-strait contrasts, anchored on the regular correspondences of characters. Our approach is corpus-based and collocation-driven. We use known contrast pairs as seeds in a corpus containing data from both the PRC and Taiwan. Collocation patterns of these contrast pairs, in terms of both lexical lists and grammatical functions, are studied to semi-automatically discover additional contrast pairs. This approach achieves both NLP applicability and linguistic felicity, since it yields both the contrast pairs and their usage contexts.
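The collocation-driven idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper works with word-sketch-style collocations over a Gigaword corpus, whereas this sketch assumes plain window co-occurrence counts and cosine similarity as a stand-in. The function names (`collocates`, `cosine`, `rank_counterparts`) and the romanized toy sentences are hypothetical and serve only to show how a word from one region might be matched to candidate counterparts in the other region by comparing collocation profiles.

```python
from collections import Counter
from math import sqrt


def collocates(sentences, target, window=2):
    """Count tokens occurring within `window` positions of `target`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_counterparts(word, source_sents, target_sents, window=2):
    """Rank target-region words by collocate similarity to `word`.

    Assumption: a word whose collocation profile in the target region
    resembles that of `word` in the source region is a candidate
    contrast partner, to be checked by a human (semi-automatic).
    """
    profile = collocates(source_sents, word, window)
    vocab = {t for s in target_sents for t in s} - {word}
    scored = (
        (cosine(profile, collocates(target_sents, c, window)), c)
        for c in vocab
    )
    return sorted(((s, c) for s, c in scored if s > 0), reverse=True)


# Toy, romanized data (hypothetical): "chuzuche" and "jichengche" both
# mean "taxi"; "gongche" means "bus".
prc_sents = [["take", "chuzuche", "home"], ["call", "chuzuche", "driver"]]
tw_sents = [
    ["take", "jichengche", "home"],
    ["call", "jichengche", "driver"],
    ["take", "gongche", "school"],
]

ranked = rank_counterparts("chuzuche", prc_sents, tw_sents, window=1)
best_score, best_word = ranked[0]
print(best_word, round(best_score, 3))
```

In this toy setting the top-ranked candidate shares all of its collocates with the query word, while a merely related word scores lower. A real pipeline would need word segmentation, a mapping between simplified and traditional characters, grammatical-relation information, and manual validation of the ranked candidates.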

Original language: English
Pages (from-to): 221-238
Number of pages: 18
Journal: Language and Linguistics
Issue number: 2
Publication status: Published - 2008
Externally published: Yes


Keywords

  • Chinese word sketch
  • Collocation
  • Cross-strait lexical contrasts
  • Gigaword corpus

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

