New word extraction utilizing Google news corpuses for supporting lexicon-based Chinese word segmentation systems

Chin-Ming Hong, Chih Ming Chen, Chao Yang Chiu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

This study proposes a novel statistics-based scheme for new word extraction based on Google news to improve the word identification ability of lexicon-based Chinese word segmentation systems. Extracting news words from the news corpuses and incrementally adding them to the lexicon of a lexicon-based Chinese word segmentation system provides benefits in terms of automatically constructing a professional news lexicon and enhancing word identification ability. Experimental results indicate that, compared with another proposed method, the proposed new word extraction scheme not only retrieves news words from the categorized corpuses of Google news more accurately, but also obtains a larger number of new words.

Original language: English
Title of host publication: International Joint Conference on Neural Networks 2006, IJCNN '06
Pages: 3040-3046
Number of pages: 7
Publication status: Published - 2006 Dec 1
Event: International Joint Conference on Neural Networks 2006, IJCNN '06 - Vancouver, BC, Canada
Duration: 2006 Jul 16 - 2006 Jul 21

Publication series

Name: IEEE International Conference on Neural Networks - Conference Proceedings
ISSN (Print): 1098-7576

Other

Other: International Joint Conference on Neural Networks 2006, IJCNN '06
Country: Canada
City: Vancouver, BC
Period: 06/7/16 - 06/7/21

Keywords

  • Chinese word segmentation
  • Information retrieval
  • Natural language processing
  • New word extraction

ASJC Scopus subject areas

  • Software

Cite this

Hong, C-M., Chen, C. M., & Chiu, C. Y. (2006). New word extraction utilizing Google news corpuses for supporting lexicon-based Chinese word segmentation systems. In International Joint Conference on Neural Networks 2006, IJCNN '06 (pp. 3040-3046). [1716512] (IEEE International Conference on Neural Networks - Conference Proceedings).