Mandarin-English Information (MEI): Investigating translingual speech retrieval

  • Helen M. Meng*
  • Berlin Chen
  • Sanjeev Khudanpur
  • Gina Anne Levow
  • Wai Kit Lo
  • Douglas Oard
  • Patrick Schone
  • Karen Tang
  • Hsin Min Wang
  • Jianqiang Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

25 Citations (Scopus)

Abstract

This paper describes the Mandarin-English Information (MEI) project, in which we investigated the problem of cross-language spoken document retrieval (CL-SDR) and developed one of the first English-Chinese CL-SDR systems. Our system accepts an entire English news story (text) as the query and retrieves relevant Chinese broadcast news stories (audio) from the document collection. Hence, this is a cross-language and cross-media retrieval task. We applied a multi-scale approach to the problem, which unifies the use of phrases, words and subwords in retrieval. The English queries are translated into Chinese by a dictionary-based approach that integrates phrase-based translation with word-by-word translation. Untranslatable named entities are transliterated by a novel subword translation technique. The multi-scale approach can be divided into three subtasks: multi-scale query formulation, multi-scale audio indexing (by speech recognition) and multi-scale retrieval. Experimental results demonstrate that phrase-based translation and subword translation gave performance gains, and that multi-scale retrieval outperformed word-based retrieval.
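
As a rough illustration of the query-formulation idea in the abstract, the following is a minimal sketch and not the MEI system itself. It assumes a greedy longest-match phrase lookup that falls back to word-by-word lookup, and it assumes overlapping character bigrams as the subword unit. The lexicon entries and the names `PHRASE_DICT`, `WORD_DICT`, `translate_query`, and `multiscale_terms` are hypothetical. The transliteration of untranslatable names is not implemented; such words are only collected.

```python
from collections import Counter

# Hypothetical toy lexicon; entries are illustrative only, not from the MEI system.
PHRASE_DICT = {("hong", "kong"): ["香港"], ("prime", "minister"): ["总理"]}
WORD_DICT = {"visit": ["访问"], "prime": ["主要"], "minister": ["部长"]}


def translate_query(tokens, max_phrase_len=3):
    """Translate English tokens, preferring the longest dictionary phrase.

    Returns (translations, untranslated). Untranslated words are the
    candidates that a transliteration step would handle (not shown here).
    """
    translations, untranslated = [], []
    i = 0
    while i < len(tokens):
        # Try phrases of decreasing length (at least two words) starting at i.
        for n in range(min(max_phrase_len, len(tokens) - i), 1, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in PHRASE_DICT:
                translations.extend(PHRASE_DICT[phrase])
                i += n
                break
        else:
            # No phrase matched: fall back to word-by-word lookup.
            word = tokens[i]
            if word in WORD_DICT:
                translations.extend(WORD_DICT[word])
            else:
                untranslated.append(word)
            i += 1
    return translations, untranslated


def char_bigrams(text):
    """Overlapping character bigrams, an assumed subword unit."""
    return [text[j:j + 2] for j in range(len(text) - 1)]


def multiscale_terms(translations):
    """Count query terms at two scales: whole words and character bigrams."""
    terms = Counter()
    for t in translations:
        terms[("word", t)] += 1
        for bg in char_bigrams(t):
            terms[("bigram", bg)] += 1
    return terms


if __name__ == "__main__":
    query = ["prime", "minister", "visit", "hong", "kong", "kosovo"]
    translated, leftover = translate_query(query)
    print(translated, leftover)
    print(multiscale_terms(translated))
```

In this sketch the phrase lookup takes priority, so "prime minister" is translated as a unit rather than as two separate words, which is the kind of gain the abstract attributes to phrase-based translation. Tagging each term with its scale lets a retrieval step weight word and subword matches separately.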

Original language: English
Pages (from-to): 163-179
Number of pages: 17
Journal: Computer Speech and Language
Volume: 18
Issue number: 2
Publication status: Published - Apr 2004
Externally published: Yes

Keywords

  • English-Chinese cross-language spoken document retrieval
  • Multi-scale spoken document retrieval

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction
