Identification of code-switched sentences and words using language modeling approaches

Liang Chih Yu, Wei Cheng He, Wei Nan Chien, Yuen Hsien Tseng

Research output: Contribution to journal (Article)

Abstract

Globalization and multilingualism contribute to code-switching - the phenomenon in which speakers produce utterances containing words or expressions from a second language. Processing code-switched sentences is a significant challenge for multilingual intelligent systems. This study proposes a language modeling approach to the problem of code-switching language processing, dividing the problem into two subtasks: the detection of code-switched sentences and the identification of code-switched words in sentences. A code-switched sentence is detected on the basis of whether it contains words or phrases from another language. Once the code-switched sentences are identified, the positions of the code-switched words in the sentences are then identified. Experimental results show that the language modeling approach achieved an F-measure of 80.43% and an accuracy of 79.01% for detecting Mandarin-Taiwanese code-switched sentences. For the identification of code-switched words, the word-based and POS-based models, respectively, achieved F-measures of 41.09% and 53.08%.
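The two subtasks in the abstract can be illustrated with a toy sketch. The abstract does not describe the authors' actual models, so everything below is an assumption made for illustration: two add-one-smoothed unigram language models (one per language), placeholder tokens such as `m1` and `t1` instead of real Mandarin or Taiwanese words, and the hypothetical helper names `train_unigram`, `log_prob`, `code_switched_positions`, and `is_code_switched`. A word is flagged when the second-language model scores it higher than the main-language model, and a sentence counts as code-switched when at least one word is flagged.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Return (token counts, total token count) for a toy unigram model."""
    counts = Counter(tokens)
    return counts, sum(counts.values())


def log_prob(word, model, vocab_size):
    """Add-one smoothed log-probability of a word under a unigram model."""
    counts, total = model
    return math.log((counts[word] + 1) / (total + vocab_size))


def code_switched_positions(sentence, main_model, other_model, vocab_size):
    """Subtask 2: indices of words the second-language model scores higher."""
    return [
        i
        for i, word in enumerate(sentence)
        if log_prob(word, other_model, vocab_size)
        > log_prob(word, main_model, vocab_size)
    ]


def is_code_switched(sentence, main_model, other_model, vocab_size):
    """Subtask 1: a sentence is code-switched if any word is flagged."""
    return bool(code_switched_positions(sentence, main_model, other_model, vocab_size))


# Placeholder corpora (hypothetical tokens, not real language data).
main_tokens = ["m1", "m2", "m3", "m1", "m2", "m4"]
other_tokens = ["t1", "t2", "t1", "t3"]
vocab_size = len(set(main_tokens) | set(other_tokens))

main_model = train_unigram(main_tokens)
other_model = train_unigram(other_tokens)

mixed = ["m1", "t1", "m2"]
print(is_code_switched(mixed, main_model, other_model, vocab_size))
print(code_switched_positions(mixed, main_model, other_model, vocab_size))
```

A unigram comparison ignores context entirely, so it is only a stand-in for the word-based model named in the abstract; a POS-based variant would score part-of-speech sequences instead of surface words.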

Original language: English
Article number: 898714
Journal: Mathematical Problems in Engineering
Volume: 2013
DOI: 10.1155/2013/898714
Publication status: Published - 2013 Oct 21

Fingerprint

Language Modeling
Intelligent systems
Processing
Globalization

ASJC Scopus subject areas

  • Mathematics(all)
  • Engineering(all)

Cite this

Identification of code-switched sentences and words using language modeling approaches. / Yu, Liang Chih; He, Wei Cheng; Chien, Wei Nan; Tseng, Yuen Hsien.

In: Mathematical Problems in Engineering, Vol. 2013, 898714, 21.10.2013.

@article{f6990d2230fb495eb1f58b36e8a20d89,
title = "Identification of code-switched sentences and words using language modeling approaches",
abstract = "Globalization and multilingualism contribute to code-switching - the phenomenon in which speakers produce utterances containing words or expressions from a second language. Processing code-switched sentences is a significant challenge for multilingual intelligent systems. This study proposes a language modeling approach to the problem of code-switching language processing, dividing the problem into two subtasks: the detection of code-switched sentences and the identification of code-switched words in sentences. A code-switched sentence is detected on the basis of whether it contains words or phrases from another language. Once the code-switched sentences are identified, the positions of the code-switched words in the sentences are then identified. Experimental results show that the language modeling approach achieved an F-measure of 80.43{\%} and an accuracy of 79.01{\%} for detecting Mandarin-Taiwanese code-switched sentences. For the identification of code-switched words, the word-based and POS-based models, respectively, achieved F-measures of 41.09{\%} and 53.08{\%}.",
author = "Yu, {Liang Chih} and He, {Wei Cheng} and Chien, {Wei Nan} and Tseng, {Yuen Hsien}",
year = "2013",
month = "10",
day = "21",
doi = "10.1155/2013/898714",
language = "English",
volume = "2013",
journal = "Mathematical Problems in Engineering",
issn = "1024-123X",
publisher = "Hindawi Publishing Corporation",
}

TY - JOUR

T1 - Identification of code-switched sentences and words using language modeling approaches

AU - Yu, Liang Chih

AU - He, Wei Cheng

AU - Chien, Wei Nan

AU - Tseng, Yuen Hsien

PY - 2013/10/21

Y1 - 2013/10/21

AB - Globalization and multilingualism contribute to code-switching - the phenomenon in which speakers produce utterances containing words or expressions from a second language. Processing code-switched sentences is a significant challenge for multilingual intelligent systems. This study proposes a language modeling approach to the problem of code-switching language processing, dividing the problem into two subtasks: the detection of code-switched sentences and the identification of code-switched words in sentences. A code-switched sentence is detected on the basis of whether it contains words or phrases from another language. Once the code-switched sentences are identified, the positions of the code-switched words in the sentences are then identified. Experimental results show that the language modeling approach achieved an F-measure of 80.43% and an accuracy of 79.01% for detecting Mandarin-Taiwanese code-switched sentences. For the identification of code-switched words, the word-based and POS-based models, respectively, achieved F-measures of 41.09% and 53.08%.

UR - http://www.scopus.com/inward/record.url?scp=84885592674&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84885592674&partnerID=8YFLogxK

U2 - 10.1155/2013/898714

DO - 10.1155/2013/898714

M3 - Article

AN - SCOPUS:84885592674

VL - 2013

JO - Mathematical Problems in Engineering

JF - Mathematical Problems in Engineering

SN - 1024-123X

M1 - 898714

ER -