Identification of code-switched sentences and words using language modeling approaches

Liang Chih Yu*, Wei Cheng He, Wei Nan Chien, Yuen Hsien Tseng

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Globalization and multilingualism contribute to code-switching, the phenomenon in which speakers produce utterances containing words or expressions from a second language. Processing code-switched sentences is a significant challenge for multilingual intelligent systems. This study proposes a language modeling approach to code-switching language processing, dividing the problem into two subtasks: the detection of code-switched sentences and the identification of code-switched words in sentences. A sentence is detected as code-switched on the basis of whether it contains words or phrases from another language. Once the code-switched sentences are detected, the positions of the code-switched words within them are identified. Experimental results show that the language modeling approach achieved an F-measure of 80.43% and an accuracy of 79.01% for detecting Mandarin-Taiwanese code-switched sentences. For the identification of code-switched words, the word-based and POS-based models achieved F-measures of 41.09% and 53.08%, respectively.
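The two subtasks can be illustrated with a minimal sketch. The abstract does not specify the model details, so everything below is an assumption made for illustration: two add-one smoothed unigram models stand in for the word-based language models, the names `UnigramLM`, `switched_positions`, `is_code_switched`, and `f_measure` are hypothetical, and the toy corpora use placeholder tokens rather than Mandarin or Taiwanese data. It is not the authors' implementation.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one smoothed unigram language model (illustrative assumption)."""

    def __init__(self, sentences):
        # Count every token in the tokenized training sentences.
        self.counts = Counter(w for s in sentences for w in s)
        self.total = sum(self.counts.values())
        # Reserve one extra vocabulary slot for unseen words.
        self.vocab_size = len(self.counts) + 1

    def logprob(self, word):
        return math.log((self.counts[word] + 1) / (self.total + self.vocab_size))


def switched_positions(sentence, main_lm, other_lm):
    """Subtask 2: indices of words the other-language model scores higher."""
    return [i for i, w in enumerate(sentence)
            if other_lm.logprob(w) > main_lm.logprob(w)]


def is_code_switched(sentence, main_lm, other_lm):
    """Subtask 1: a sentence is flagged if it contains any such word."""
    return bool(switched_positions(sentence, main_lm, other_lm))


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy corpora with placeholder tokens (hypothetical data).
main_lm = UnigramLM([["a", "b", "c"], ["a", "c", "d"]])
other_lm = UnigramLM([["x", "y"], ["y", "z"]])

mixed = ["a", "y", "c"]
print(is_code_switched(mixed, main_lm, other_lm))    # True
print(switched_positions(mixed, main_lm, other_lm))  # [1]
```

A word-level comparison like this treats each token independently. The POS-based variant reported in the abstract would instead model part-of-speech sequences, which this sketch does not attempt.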

Original language: English
Article number: 898714
Journal: Mathematical Problems in Engineering
2013
Publication status: Published - 2013

ASJC Scopus subject areas

  • General Mathematics
  • General Engineering
