Mandarin-English Information (MEI): Investigating translingual speech retrieval

Helen M. Meng, Berlin Chen, Sanjeev Khudanpur, Gina Anne Levow, Wai Kit Lo, Douglas Oard, Patrick Schone, Karen Tang, Hsin Min Wang, Jianqiang Wang

Research output: Contribution to journal (Article)

21 Citations (Scopus)

Abstract

This paper describes the Mandarin-English Information (MEI) project, in which we investigated the problem of cross-language spoken document retrieval (CL-SDR) and developed one of the first English-Chinese CL-SDR systems. Our system accepts an entire English news story (text) as the query and retrieves relevant Chinese broadcast news stories (audio) from the document collection. Hence, this is a cross-language and cross-media retrieval task. We applied a multi-scale approach to our problem, which unifies the use of phrases, words and subwords in retrieval. The English queries are translated into Chinese by means of a dictionary-based approach, in which we have integrated phrase-based translation with word-by-word translation. Untranslatable named entities are transliterated by a novel subword translation technique. The multi-scale approach can be divided into three subtasks: multi-scale query formulation, multi-scale audio indexing (by speech recognition) and multi-scale retrieval. Experimental results demonstrate that phrase-based translation and subword translation yield performance gains, and that multi-scale retrieval outperforms word-based retrieval.
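
As a rough illustration of the ideas in the abstract, and not the MEI implementation, the following Python sketch shows dictionary-based query translation that tries phrase entries before falling back to word-by-word lookup, followed by a score that mixes word-level and subword-level matches. Everything specific here is an assumption made for the example: the toy lexicons `PHRASE_DICT` and `WORD_DICT`, the greedy longest-match strategy, the use of overlapping character bigrams as the subword unit, and the linear weighting `word_weight`. Transliteration of untranslatable named entities is not modeled; unknown words are simply dropped.

```python
from collections import Counter

# Hypothetical toy lexicons for illustration only (not from the MEI project).
PHRASE_DICT = {("hong", "kong"): ["香港"], ("stock", "market"): ["股市"]}
WORD_DICT = {"stock": ["股票"], "market": ["市场"], "falls": ["下跌"]}


def translate_query(tokens, max_phrase_len=3):
    """Greedy longest-match phrase lookup with word-by-word fallback."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, down to two words.
        for n in range(min(max_phrase_len, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASE_DICT:
                out.extend(PHRASE_DICT[key])
                i += n
                break
        else:
            # No phrase matched: translate the single word if possible.
            # Assumption: words missing from the lexicon are skipped.
            out.extend(WORD_DICT.get(tokens[i], []))
            i += 1
    return out


def char_bigrams(terms):
    """Subword scale (assumed): overlapping character bigrams per term."""
    return [t[j:j + 2] for t in terms for j in range(len(t) - 1)]


def overlap(query_units, doc_units):
    """Number of shared units, counted as a multiset intersection."""
    return sum((Counter(query_units) & Counter(doc_units)).values())


def multiscale_score(query_terms, doc_terms, word_weight=0.5):
    """Linear mix of word-level and subword-level overlap (assumed form)."""
    word = overlap(query_terms, doc_terms)
    sub = overlap(char_bigrams(query_terms), char_bigrams(doc_terms))
    return word_weight * word + (1 - word_weight) * sub


query = translate_query("hong kong stock market falls".split())
# The document side is segmented differently, as recognizer output might be.
doc = ["香港股市", "下跌"]
print(query, multiscale_score(query, doc))
```

In this toy case only one query word matches the document at the word level, because the document joins two of the query terms into a single word, while all three query bigrams are found at the subword level. This is one plausible reason a combined score can be more forgiving of segmentation mismatches than word matching alone.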

Original language: English
Pages (from-to): 163-179
Number of pages: 17
Journal: Computer Speech and Language
Volume: 18
Issue number: 2
DOI: https://doi.org/10.1016/j.csl.2003.09.003
Publication status: Published - 2004 Jan 1

Fingerprint

Information retrieval systems
Glossaries
Speech recognition
Retrieval
Subword
Document Retrieval
Query
Indexing
Broadcast
Speech
Entire
Formulation
Experimental Results
Demonstrate
Language

Keywords

  • English-Chinese cross-language spoken document retrieval
  • Multi-scale spoken document retrieval

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction

Cite this

Mandarin-English Information (MEI): Investigating translingual speech retrieval. / Meng, Helen M.; Chen, Berlin; Khudanpur, Sanjeev; Levow, Gina Anne; Lo, Wai Kit; Oard, Douglas; Schone, Patrick; Tang, Karen; Wang, Hsin Min; Wang, Jianqiang.

In: Computer Speech and Language, Vol. 18, No. 2, 01.01.2004, p. 163-179.

Meng, HM, Chen, B, Khudanpur, S, Levow, GA, Lo, WK, Oard, D, Schone, P, Tang, K, Wang, HM & Wang, J 2004, 'Mandarin-English Information (MEI): Investigating translingual speech retrieval', Computer Speech and Language, vol. 18, no. 2, pp. 163-179. https://doi.org/10.1016/j.csl.2003.09.003
@article{3602cd12dab946039751079a18bdc742,
title = "Mandarin-English Information (MEI): Investigating translingual speech retrieval",
abstract = "This paper describes the Mandarin-English Information (MEI) project, in which we investigated the problem of cross-language spoken document retrieval (CL-SDR) and developed one of the first English-Chinese CL-SDR systems. Our system accepts an entire English news story (text) as the query and retrieves relevant Chinese broadcast news stories (audio) from the document collection. Hence, this is a cross-language and cross-media retrieval task. We applied a multi-scale approach to our problem, which unifies the use of phrases, words and subwords in retrieval. The English queries are translated into Chinese by means of a dictionary-based approach, in which we have integrated phrase-based translation with word-by-word translation. Untranslatable named entities are transliterated by a novel subword translation technique. The multi-scale approach can be divided into three subtasks: multi-scale query formulation, multi-scale audio indexing (by speech recognition) and multi-scale retrieval. Experimental results demonstrate that phrase-based translation and subword translation yield performance gains, and that multi-scale retrieval outperforms word-based retrieval.",
keywords = "English-Chinese cross-language spoken document retrieval, Multi-scale spoken document retrieval",
author = "Meng, {Helen M.} and Berlin Chen and Sanjeev Khudanpur and Levow, {Gina Anne} and Lo, {Wai Kit} and Douglas Oard and Patrick Schone and Karen Tang and Wang, {Hsin Min} and Jianqiang Wang",
year = "2004",
month = "1",
day = "1",
doi = "10.1016/j.csl.2003.09.003",
language = "English",
volume = "18",
pages = "163--179",
journal = "Computer Speech and Language",
issn = "0885-2308",
publisher = "Academic Press Inc.",
number = "2",

}

TY  - JOUR
T1  - Mandarin-English Information (MEI)
T2  - Investigating translingual speech retrieval
AU  - Meng, Helen M.
AU  - Chen, Berlin
AU  - Khudanpur, Sanjeev
AU  - Levow, Gina Anne
AU  - Lo, Wai Kit
AU  - Oard, Douglas
AU  - Schone, Patrick
AU  - Tang, Karen
AU  - Wang, Hsin Min
AU  - Wang, Jianqiang
PY  - 2004/1/1
Y1  - 2004/1/1
AB  - This paper describes the Mandarin-English Information (MEI) project, in which we investigated the problem of cross-language spoken document retrieval (CL-SDR) and developed one of the first English-Chinese CL-SDR systems. Our system accepts an entire English news story (text) as the query and retrieves relevant Chinese broadcast news stories (audio) from the document collection. Hence, this is a cross-language and cross-media retrieval task. We applied a multi-scale approach to our problem, which unifies the use of phrases, words and subwords in retrieval. The English queries are translated into Chinese by means of a dictionary-based approach, in which we have integrated phrase-based translation with word-by-word translation. Untranslatable named entities are transliterated by a novel subword translation technique. The multi-scale approach can be divided into three subtasks: multi-scale query formulation, multi-scale audio indexing (by speech recognition) and multi-scale retrieval. Experimental results demonstrate that phrase-based translation and subword translation yield performance gains, and that multi-scale retrieval outperforms word-based retrieval.
KW  - English-Chinese cross-language spoken document retrieval
KW  - Multi-scale spoken document retrieval
UR  - http://www.scopus.com/inward/record.url?scp=12144286470&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=12144286470&partnerID=8YFLogxK
U2  - 10.1016/j.csl.2003.09.003
DO  - 10.1016/j.csl.2003.09.003
M3  - Article
AN  - SCOPUS:12144286470
VL  - 18
SP  - 163
EP  - 179
JO  - Computer Speech and Language
JF  - Computer Speech and Language
SN  - 0885-2308
IS  - 2
ER  -