TY - GEN
T1 - New word extraction utilizing Google news corpuses for supporting lexicon-based Chinese word segmentation systems
AU - Hong, Chin Ming
AU - Chen, Chih Ming
AU - Chiu, Chao Yang
PY - 2006
Y1 - 2006
N2 - This study proposes a novel statistics-based scheme for new word extraction from Google news to improve the word identification ability of lexicon-based Chinese word segmentation systems. Extracting news words from news corpuses and incrementally adding them to the lexicon of such systems provides two benefits: automatic construction of a professional news lexicon and enhanced word identification ability. Compared with another proposed method, the experimental results indicate that the proposed new word extraction scheme not only retrieves news words more correctly from the categorized corpuses of Google news, but also obtains a larger number of new words.
AB - This study proposes a novel statistics-based scheme for new word extraction from Google news to improve the word identification ability of lexicon-based Chinese word segmentation systems. Extracting news words from news corpuses and incrementally adding them to the lexicon of such systems provides two benefits: automatic construction of a professional news lexicon and enhanced word identification ability. Compared with another proposed method, the experimental results indicate that the proposed new word extraction scheme not only retrieves news words more correctly from the categorized corpuses of Google news, but also obtains a larger number of new words.
KW - Chinese word segmentation
KW - Information retrieval
KW - Natural language processing
KW - New word extraction
UR - http://www.scopus.com/inward/record.url?scp=40649127959&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=40649127959&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:40649127959
SN - 0780394909
SN - 9780780394902
T3 - IEEE International Conference on Neural Networks - Conference Proceedings
SP - 3040
EP - 3046
BT - International Joint Conference on Neural Networks 2006, IJCNN '06
T2 - International Joint Conference on Neural Networks 2006, IJCNN '06
Y2 - 16 July 2006 through 21 July 2006
ER -