Transliteration retrieval model for cross lingual information retrieval

Ea Ee Jan, Shih Hsiang Lin, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Transliteration quality from a source language to a target language lays the groundwork for proper-name Cross Lingual Information Retrieval (CLIR). Traditionally, this task is accomplished by two separate modules: transliteration and retrieval. Queries are first transliterated into the target language using one or more hypotheses, and retrieval is then carried out on the transliterated queries. The top-1 transliteration hypothesis often has an error rate of 30-50%, which leads to significant performance degradation in CLIR. We therefore propose a unified transliteration retrieval model that incorporates the transliteration similarity measurement into the relevance scoring function. In addition, we present an efficient and robust method for measuring the similarity of a given proper name pair using Hidden Markov Model (HMM) based alignment and a Statistical Machine Translation (SMT) framework. Experiments on the NTCIR7 IR4QA task showed significant results with the proposed integrated method, which demonstrated greater flexibility and acceptance in transliteration.
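
To make the contrast between the two-module pipeline and a unified scoring function concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' model: the abstract does not give the scoring formula, so the soft-match form (similarity times term frequency), the `threshold` parameter, and all function names are hypothetical, and `difflib.SequenceMatcher` on romanized strings stands in for the HMM alignment and SMT-based similarity.

```python
from collections import Counter
from difflib import SequenceMatcher


def translit_similarity(source_name: str, target_term: str) -> float:
    """Stand-in similarity in [0, 1] on romanized strings.

    Assumption: this replaces the HMM/SMT-based measure from the abstract,
    which is not specified here.
    """
    return SequenceMatcher(None, source_name.lower(), target_term.lower()).ratio()


def pipeline_score(top1_hypothesis: str, doc_terms: list[str]) -> float:
    """Two-module baseline: exact match on a single transliteration hypothesis."""
    if not doc_terms:
        return 0.0
    return doc_terms.count(top1_hypothesis) / len(doc_terms)


def unified_score(query_name: str, doc_terms: list[str], threshold: float = 0.6) -> float:
    """Hypothetical unified score: every document term contributes its
    relative frequency weighted by its similarity to the query name."""
    if not doc_terms:
        return 0.0
    total = len(doc_terms)
    score = 0.0
    for term, count in Counter(doc_terms).items():
        sim = translit_similarity(query_name, term)
        if sim >= threshold:  # ignore terms that are clearly unrelated
            score += sim * count / total
    return score


if __name__ == "__main__":
    relevant = ["gorbachev", "visited", "moscow"]
    unrelated = ["weather", "report", "taipei"]
    # The top-1 hypothesis "gorbachov" never appears verbatim in the document.
    print(pipeline_score("gorbachov", relevant))
    print(unified_score("gorbachov", relevant))
    print(unified_score("gorbachov", unrelated))
```

In this toy setup the exact-match baseline scores the relevant document at zero because the single hypothesis is spelled differently from the document term, while the soft-match score still ranks it above the unrelated document.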

Original language: English
Title of host publication: Information Retrieval Technology - 6th Asia Information Retrieval Societies Conference, AIRS 2010, Proceedings
Pages: 183-192
Number of pages: 10
DOIs: https://doi.org/10.1007/978-3-642-17187-1_17
Publication status: Published - 2010 Dec 1
Event: 6th Asia Information Retrieval Societies Conference, AIRS 2010 - Taipei, Taiwan
Duration: 2010 Dec 1 → 2010 Dec 3

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6458 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th Asia Information Retrieval Societies Conference, AIRS 2010
Country: Taiwan
City: Taipei
Period: 10/12/1 → 10/12/3

Keywords

  • NTCIR
  • cross lingual information retrieval (CLIR)
  • retrieval model
  • statistical machine translation (SMT)
  • transliteration

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Jan, E. E., Lin, S. H., & Chen, B. (2010). Transliteration retrieval model for cross lingual information retrieval. In Information Retrieval Technology - 6th Asia Information Retrieval Societies Conference, AIRS 2010, Proceedings (pp. 183-192). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6458 LNCS). https://doi.org/10.1007/978-3-642-17187-1_17

Transliteration retrieval model for cross lingual information retrieval. / Jan, Ea Ee; Lin, Shih Hsiang; Chen, Berlin.

Information Retrieval Technology - 6th Asia Information Retrieval Societies Conference, AIRS 2010, Proceedings. 2010. p. 183-192 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6458 LNCS).

Jan, EE, Lin, SH & Chen, B 2010, Transliteration retrieval model for cross lingual information retrieval. in Information Retrieval Technology - 6th Asia Information Retrieval Societies Conference, AIRS 2010, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6458 LNCS, pp. 183-192, 6th Asia Information Retrieval Societies Conference, AIRS 2010, Taipei, Taiwan, 10/12/1. https://doi.org/10.1007/978-3-642-17187-1_17
Jan EE, Lin SH, Chen B. Transliteration retrieval model for cross lingual information retrieval. In Information Retrieval Technology - 6th Asia Information Retrieval Societies Conference, AIRS 2010, Proceedings. 2010. p. 183-192. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-17187-1_17
Jan, Ea Ee ; Lin, Shih Hsiang ; Chen, Berlin. / Transliteration retrieval model for cross lingual information retrieval. Information Retrieval Technology - 6th Asia Information Retrieval Societies Conference, AIRS 2010, Proceedings. 2010. pp. 183-192 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{dafc253fcdff48b18309dacf757ac414,
title = "Transliteration retrieval model for cross lingual information retrieval",
abstract = "The performance of transliteration from a source language to a target language builds the ground work in support of proper name Cross Lingual Information Retrieval (CLIR). Traditionally, this task is accomplished by two separate modules: transliteration and retrieval. Queries are first transliterated to target language using one or multiple hypotheses. The retrieval is then carried out based on translated queries. The transliteration often results in 30-50{\%} errors with top 1 hypothesis, thus leading to significant performance degradation in CLIR. Therefore, we proposed a unified transliteration retrieval model that incorporates the transliteration similarity measurement into the relevance scoring function. In addition, we presented an efficient and robust method in similarity measurement for a given proper name pair using the Hidden Markov Model (HMM) based alignment and a Statistical Machine Translation (SMT) framework. Experimental data showed significant results with the proposed integrated method on the NTCIR7 IR4QA task, which demonstrated a greater flexibility and acceptance in transliteration.",
keywords = "NTCIR, cross lingual information retrieval (CLIR), retrieval model, statistical machine translation (SMT), transliteration",
author = "Jan, {Ea Ee} and Lin, {Shih Hsiang} and Berlin Chen",
year = "2010",
month = "12",
day = "1",
doi = "10.1007/978-3-642-17187-1_17",
language = "English",
isbn = "3642171869",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "183--192",
booktitle = "Information Retrieval Technology - 6th Asia Information Retrieval Societies Conference, AIRS 2010, Proceedings",

}

TY - GEN
T1 - Transliteration retrieval model for cross lingual information retrieval
AU - Jan, Ea Ee
AU - Lin, Shih Hsiang
AU - Chen, Berlin
PY - 2010/12/1
Y1 - 2010/12/1
N2 - The performance of transliteration from a source language to a target language builds the ground work in support of proper name Cross Lingual Information Retrieval (CLIR). Traditionally, this task is accomplished by two separate modules: transliteration and retrieval. Queries are first transliterated to target language using one or multiple hypotheses. The retrieval is then carried out based on translated queries. The transliteration often results in 30-50% errors with top 1 hypothesis, thus leading to significant performance degradation in CLIR. Therefore, we proposed a unified transliteration retrieval model that incorporates the transliteration similarity measurement into the relevance scoring function. In addition, we presented an efficient and robust method in similarity measurement for a given proper name pair using the Hidden Markov Model (HMM) based alignment and a Statistical Machine Translation (SMT) framework. Experimental data showed significant results with the proposed integrated method on the NTCIR7 IR4QA task, which demonstrated a greater flexibility and acceptance in transliteration.
AB - The performance of transliteration from a source language to a target language builds the ground work in support of proper name Cross Lingual Information Retrieval (CLIR). Traditionally, this task is accomplished by two separate modules: transliteration and retrieval. Queries are first transliterated to target language using one or multiple hypotheses. The retrieval is then carried out based on translated queries. The transliteration often results in 30-50% errors with top 1 hypothesis, thus leading to significant performance degradation in CLIR. Therefore, we proposed a unified transliteration retrieval model that incorporates the transliteration similarity measurement into the relevance scoring function. In addition, we presented an efficient and robust method in similarity measurement for a given proper name pair using the Hidden Markov Model (HMM) based alignment and a Statistical Machine Translation (SMT) framework. Experimental data showed significant results with the proposed integrated method on the NTCIR7 IR4QA task, which demonstrated a greater flexibility and acceptance in transliteration.
KW - NTCIR
KW - cross lingual information retrieval (CLIR)
KW - retrieval model
KW - statistical machine translation (SMT)
KW - transliteration
UR - http://www.scopus.com/inward/record.url?scp=78650878733&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78650878733&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-17187-1_17
DO - 10.1007/978-3-642-17187-1_17
M3 - Conference contribution
AN - SCOPUS:78650878733
SN - 3642171869
SN - 9783642171864
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 183
EP - 192
BT - Information Retrieval Technology - 6th Asia Information Retrieval Societies Conference, AIRS 2010, Proceedings
ER -