TY - JOUR
T1 - Mandarin-English Information (MEI)
T2 - Investigating translingual speech retrieval
AU - Meng, Helen M.
AU - Chen, Berlin
AU - Khudanpur, Sanjeev
AU - Levow, Gina Anne
AU - Lo, Wai Kit
AU - Oard, Douglas
AU - Schone, Patrick
AU - Tang, Karen
AU - Wang, Hsin Min
AU - Wang, Jianqiang
PY - 2004/4
Y1 - 2004/4
N2 - This paper describes the Mandarin-English Information (MEI) project, where we investigated the problem of cross-language spoken document retrieval (CL-SDR), and developed one of the first English-Chinese CL-SDR systems. Our system accepts an entire English news story (text) as query, and retrieves relevant Chinese broadcast news stories (audio) from the document collection. Hence, this is a cross-language and cross-media retrieval task. We applied a multi-scale approach to our problem, which unifies the use of phrases, words and subwords in retrieval. The English queries are translated into Chinese by means of a dictionary-based approach, where we have integrated phrase-based translation with word-by-word translation. Untranslatable named entities are transliterated by a novel subword translation technique. The multi-scale approach can be divided into three subtasks - multi-scale query formulation, multi-scale audio indexing (by speech recognition) and multi-scale retrieval. Experimental results demonstrate that the use of phrase-based translation and subword translation gave performance gains, and multi-scale retrieval outperforms word-based retrieval.
AB - This paper describes the Mandarin-English Information (MEI) project, where we investigated the problem of cross-language spoken document retrieval (CL-SDR), and developed one of the first English-Chinese CL-SDR systems. Our system accepts an entire English news story (text) as query, and retrieves relevant Chinese broadcast news stories (audio) from the document collection. Hence, this is a cross-language and cross-media retrieval task. We applied a multi-scale approach to our problem, which unifies the use of phrases, words and subwords in retrieval. The English queries are translated into Chinese by means of a dictionary-based approach, where we have integrated phrase-based translation with word-by-word translation. Untranslatable named entities are transliterated by a novel subword translation technique. The multi-scale approach can be divided into three subtasks - multi-scale query formulation, multi-scale audio indexing (by speech recognition) and multi-scale retrieval. Experimental results demonstrate that the use of phrase-based translation and subword translation gave performance gains, and multi-scale retrieval outperforms word-based retrieval.
KW - English-Chinese cross-language spoken document retrieval
KW - Multi-scale spoken document retrieval
UR - http://www.scopus.com/inward/record.url?scp=12144286470&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=12144286470&partnerID=8YFLogxK
U2 - 10.1016/j.csl.2003.09.003
DO - 10.1016/j.csl.2003.09.003
M3 - Article
AN - SCOPUS:12144286470
VL - 18
SP - 163
EP - 179
JO - Computer Speech and Language
JF - Computer Speech and Language
SN - 0885-2308
IS - 2
ER -