TY - GEN
T1 - An Effective and Robust Framework for Transliteration Exploration
AU - Jan, Ea Ee
AU - Ge, Niyu
AU - Lin, Shih Hsiang
AU - Chen, Berlin
N1 - Publisher Copyright: © 2011 AFNLP
PY - 2011
Y1 - 2011
N2 - Transliteration is the process of translating proper names based on pronunciation. It is an important process in many multilingual natural language tasks. A common and essential component of transliteration approaches is a verification mechanism that tests whether two names in different languages are translations of each other. Although many transliteration systems include verification as a component, verification as a stand-alone problem is relatively new. In this paper, we propose a simple, effective, and robust training framework for the verification task and show the many applications of the verification techniques. Our proposed method can operate on both phonemic and orthographic inputs. Our best results show that a simple, straightforward orthographic representation is sufficient and that no complex training method is needed. The framework is effective because it achieves high accuracy, and robust because it is language-independent. Using the 2009 and 2010 NEWS transliteration generation shared task datasets, our technique achieves an equal error rate well below 1% on Chinese and Korean and around 1% on Japanese. Our results also show that the orthographic system outperforms the phonemic system. This is especially encouraging because, first, orthographic inputs are easier to generate and, second, one does not need to resort to a more complex training algorithm to achieve excellent results. The approach has been integrated into proper-name-based cross-lingual information retrieval without translation.
AB - Transliteration is the process of translating proper names based on pronunciation. It is an important process in many multilingual natural language tasks. A common and essential component of transliteration approaches is a verification mechanism that tests whether two names in different languages are translations of each other. Although many transliteration systems include verification as a component, verification as a stand-alone problem is relatively new. In this paper, we propose a simple, effective, and robust training framework for the verification task and show the many applications of the verification techniques. Our proposed method can operate on both phonemic and orthographic inputs. Our best results show that a simple, straightforward orthographic representation is sufficient and that no complex training method is needed. The framework is effective because it achieves high accuracy, and robust because it is language-independent. Using the 2009 and 2010 NEWS transliteration generation shared task datasets, our technique achieves an equal error rate well below 1% on Chinese and Korean and around 1% on Japanese. Our results also show that the orthographic system outperforms the phonemic system. This is especially encouraging because, first, orthographic inputs are easier to generate and, second, one does not need to resort to a more complex training algorithm to achieve excellent results. The approach has been integrated into proper-name-based cross-lingual information retrieval without translation.
UR - http://www.scopus.com/inward/record.url?scp=85123605034&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123605034&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85123605034
T3 - IJCNLP 2011 - Proceedings of the 5th International Joint Conference on Natural Language Processing
SP - 1332
EP - 1340
BT - IJCNLP 2011 - Proceedings of the 5th International Joint Conference on Natural Language Processing
A2 - Wang, Haifeng
A2 - Yarowsky, David
PB - Association for Computational Linguistics (ACL)
T2 - 5th International Joint Conference on Natural Language Processing, IJCNLP 2011
Y2 - 8 November 2011 through 13 November 2011
ER -