Abstract
Studies of cross-strait lexical contrasts in Mandarin Chinese show an increasingly evident divergence between usage in the PRC and in Taiwan. This divergence appears in phonological, semantic, and pragmatic analyses and has become an obstacle to knowledge sharing and information exchange. Given the wide range of divergences, Chinese character forms appear to offer the most reliable regular mapping between cross-strait usage contrasts. In this paper, we propose a new approach to discovering cross-strait contrasts, anchored on the regular correspondences of characters. Our approach is corpus-based and collocation-driven. We use known contrast pairs as seeds in a corpus containing data from both the PRC and Taiwan. The collocation patterns of these contrast pairs, in terms of both lexical lists and grammatical functions, are studied to semi-automatically discover additional contrast pairs. The approach achieves both NLP applicability and linguistic felicity, since it yields both the contrast pairs and their usage contexts.
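The idea of comparing collocation profiles across the two regional subcorpora can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Jaccard overlap score, the 0.5 threshold, and the shared-character filter are all assumptions made for illustration, and the toy data uses placeholder tokens (each letter stands in for one character) rather than real corpus output. It also assumes collocates have already been normalized so that they are comparable across the two subcorpora.

```python
from collections import defaultdict


def collocate_profile(pairs):
    """Map each word to the set of collocates observed with it."""
    profile = defaultdict(set)
    for word, collocate in pairs:
        profile[word].add(collocate)
    return profile


def jaccard(a, b):
    """Overlap of two collocate sets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_contrasts(prc_pairs, tw_pairs, known=frozenset(), threshold=0.5):
    """Rank (PRC word, Taiwan word) candidates by collocate overlap.

    Hypothetical filter: a candidate pair must share at least one
    character, loosely mirroring the character-anchored idea.
    Pairs listed in `known` (the seeds) are skipped.
    """
    prc = collocate_profile(prc_pairs)
    tw = collocate_profile(tw_pairs)
    candidates = []
    for w_prc, c_prc in prc.items():
        for w_tw, c_tw in tw.items():
            if w_prc == w_tw or (w_prc, w_tw) in known:
                continue
            if not set(w_prc) & set(w_tw):
                continue  # no shared character
            score = jaccard(c_prc, c_tw)
            if score >= threshold:
                candidates.append((w_prc, w_tw, score))
    # Highest overlap first
    return sorted(candidates, key=lambda t: -t[2])


# Toy (word, collocate) observations with placeholder tokens
prc_pairs = [("XA", "p"), ("XA", "q"), ("XA", "r"), ("YZ", "s")]
tw_pairs = [("XB", "p"), ("XB", "q"), ("XB", "r"), ("YW", "t")]

for w_prc, w_tw, score in candidate_contrasts(prc_pairs, tw_pairs):
    print(w_prc, w_tw, round(score, 2))
```

In this sketch the candidates are only ranked, not accepted automatically, which matches the semi-automatic character of the approach: a human would still review the ranked list together with the shared collocates as usage context.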
Original language | English |
---|---|
Pages (from-to) | 221-238 |
Number of pages | 18 |
Journal | Language and Linguistics |
Volume | 9 |
Issue number | 2 |
Publication status | Published - 2008 |
Externally published | Yes |
Keywords
- Chinese word sketch
- Collocation
- Cross-strait lexical contrasts
- Gigaword corpus
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language