Mandarin-English Information (MEI): Investigating translingual speech retrieval

Helen M. Meng*, Berlin Chen, Sanjeev Khudanpur, Gina Anne Levow, Wai Kit Lo, Douglas Oard, Patrick Schone, Karen Tang, Hsin Min Wang, Jianqiang Wang


Research output: Contribution to journal › Journal article › Peer-reviewed

25 Citations (Scopus)


This paper describes the Mandarin-English Information (MEI) project, in which we investigated cross-language spoken document retrieval (CL-SDR) and developed one of the first English-Chinese CL-SDR systems. Our system accepts an entire English news story (text) as the query and retrieves relevant Chinese broadcast news stories (audio) from the document collection, making this both a cross-language and a cross-media retrieval task. We applied a multi-scale approach that unifies the use of phrases, words and subwords in retrieval. The English queries are translated into Chinese with a dictionary-based approach that integrates phrase-based translation with word-by-word translation. Untranslatable named entities are transliterated by a novel subword translation technique. The multi-scale approach divides into three subtasks: multi-scale query formulation, multi-scale audio indexing (by speech recognition) and multi-scale retrieval. Experimental results show that phrase-based translation and subword translation both improved retrieval performance, and that multi-scale retrieval outperformed word-based retrieval.
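The abstract names two mechanisms: dictionary-based query translation that tries phrases before falling back to word-by-word lookup, and multi-scale retrieval that combines evidence from words and subwords. The sketch below illustrates both ideas under loudly stated assumptions. The dictionaries `PHRASE_DICT` and `WORD_DICT` are hypothetical toy entries, greedy longest-match is one plausible way to prefer phrases, character bigrams stand in for subword units, and the equal-weight linear interpolation in `multi_scale_score` is an illustrative choice. None of this is the MEI system's actual lexicon, indexing units or retrieval model, which the abstract does not specify; untranslatable words are merely collected, not transliterated.

```python
from collections import Counter

# Hypothetical toy bilingual dictionaries (illustration only, not the MEI lexicon).
PHRASE_DICT = {
    ("prime", "minister"): ["总理"],
    ("hong", "kong"): ["香港"],
}
WORD_DICT = {
    "prime": ["首要"],
    "minister": ["部长"],
    "visits": ["访问"],
}
MAX_PHRASE_LEN = max(len(k) for k in PHRASE_DICT)


def translate_query(tokens):
    """Greedy longest-match phrase lookup, falling back to word-by-word lookup.

    Returns (translated Chinese terms, untranslated English words). In this
    sketch untranslated words are only collected; a real system would hand
    named entities to a transliteration step.
    """
    out, untranslated = [], []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first, down to two-word phrases.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 1, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in PHRASE_DICT:
                out.extend(PHRASE_DICT[phrase])
                i += n
                break
        else:
            # No phrase matched: fall back to single-word lookup.
            word = tokens[i]
            if word in WORD_DICT:
                out.extend(WORD_DICT[word])
            else:
                untranslated.append(word)
            i += 1
    return out, untranslated


def char_bigrams(words):
    """Overlapping character bigrams inside each word (stand-in subword units)."""
    grams = []
    for w in words:
        grams.extend(w[j:j + 2] for j in range(len(w) - 1))
    return grams


def overlap(query_units, doc_units):
    """Clipped count of query units that also occur in the document."""
    q, d = Counter(query_units), Counter(doc_units)
    return sum(min(c, d[u]) for u, c in q.items())


def multi_scale_score(query_words, doc_words, w_word=0.5, w_sub=0.5):
    """Linear interpolation of word-level and subword-level match scores."""
    word_score = overlap(query_words, doc_words)
    sub_score = overlap(char_bigrams(query_words), char_bigrams(doc_words))
    return w_word * word_score + w_sub * sub_score
```

For example, `translate_query("prime minister blair visits hong kong".split())` yields `["总理", "访问", "香港"]` with `["blair"]` left untranslated, because the two phrase entries take precedence over the individual words "prime" and "minister". Scoring the query `["总理", "香港"]` against a document containing `["香港", "总理府"]` gives 1.5: only 香港 matches at the word level, but the subword bigram 总理 also matches inside 总理府, which is the kind of partial match a word-only index would miss.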

Pages (from-to): 163-179
Journal: Computer Speech and Language
Publication status: Published - April 2004

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction

