Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval

H. M. Meng, Wai Kit Lo, Berlin Chen, K. Tang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

49 Citations (Scopus)

Abstract

We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). Our retrieval system integrates machine translation, speech recognition and information retrieval technologies. An English news story forms a textual query that is automatically translated into Chinese words, which are mapped into Mandarin syllables by pronunciation dictionary lookup. Mandarin radio news broadcasts form spoken documents that are indexed by word and syllable recognition. The information retrieval engine performs matching at both the word and syllable scales. The English queries contain many named entities that tend to be out-of-vocabulary words for machine translation and speech recognition, and are therefore omitted from retrieval. Names are often transliterated across languages and are generally important for retrieval. We present a technique that takes in a name spelling and automatically generates a phonetic cognate in terms of Chinese syllables to be used in retrieval. Experiments show consistent improvement in retrieval performance when named entities are included in this way.
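
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the chunk-to-syllable table, the greedy longest-match segmentation, the syllable-bigram overlap score and the weight `w` are all assumptions made for illustration, and the names `CHUNK_TO_SYLLABLE`, `phonetic_cognate`, `syllable_bigrams` and `score` are hypothetical. The sketch shows how a name spelling might be turned into Mandarin syllables (toneless pinyin) and then matched against a syllable-indexed document alongside ordinary word matching.

```python
from collections import Counter

# Hypothetical toy table mapping English spelling chunks to Mandarin
# syllables (toneless pinyin). Illustrative only; the source does not
# describe the actual transliteration procedure.
CHUNK_TO_SYLLABLE = {
    "clin": "ke lin",
    "ton": "dun",
    "ken": "ken",
    "ne": "ni",
    "dy": "di",
}


def phonetic_cognate(name):
    """Greedy longest-match segmentation of a name into known chunks."""
    name = name.lower()
    syllables = []
    i = 0
    while i < len(name):
        # Try the longest remaining substring first, then shorter ones.
        for j in range(len(name), i, -1):
            chunk = name[i:j]
            if chunk in CHUNK_TO_SYLLABLE:
                syllables.extend(CHUNK_TO_SYLLABLE[chunk].split())
                i = j
                break
        else:
            i += 1  # no chunk starts here; skip one letter
    return syllables


def syllable_bigrams(syllables):
    """Count overlapping syllable pairs in a syllable sequence."""
    return Counter(zip(syllables, syllables[1:]))


def score(query_words, query_syllables, doc_words, doc_syllables, w=0.5):
    """Weighted sum of word overlap and syllable-bigram overlap."""
    word_overlap = sum((Counter(query_words) & Counter(doc_words)).values())
    syl_overlap = sum(
        (syllable_bigrams(query_syllables) & syllable_bigrams(doc_syllables)).values()
    )
    return w * word_overlap + (1 - w) * syl_overlap
```

With this toy table, `phonetic_cognate("Clinton")` yields the syllables `ke lin dun`, which can still match a recognized syllable stream even when the name is missing from the translation and recognition vocabularies. A real system would need a learned spelling-to-syllable model rather than a hand-written table.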

Original language: English
Title of host publication: 2001 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001 - Conference Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 311-314
Number of pages: 4
ISBN (Electronic): 078037343X, 9780780373433
DOIs: https://doi.org/10.1109/ASRU.2001.1034649
Publication status: Published - 2001 Jan 1
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001 - Madonna di Campiglio, Italy
Duration: 2001 Dec 9 to 2001 Dec 13

Publication series

Name: 2001 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001 - Conference Proceedings

Other

Other: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001
Country: Italy
City: Madonna di Campiglio
Period: 01/12/9 to 01/12/13

Fingerprint

Speech analysis
Information retrieval
Speech recognition
Glossaries
Engines
Experiments

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Meng, H. M., Lo, W. K., Chen, B., & Tang, K. (2001). Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval. In 2001 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001 - Conference Proceedings (pp. 311-314). [1034649] (2001 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001 - Conference Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2001.1034649

@inproceedings{2352997714404f318d2833dce84ce243,
title = "Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval",
abstract = "We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). Our retrieval system integrates machine translation, speech recognition and information retrieval technologies. An English news story forms a textual query that is automatically translated into Chinese words, which are mapped into Mandarin syllables by pronunciation dictionary lookup. Mandarin radio news broadcasts form spoken documents that are indexed by word and syllable recognition. The information retrieval engine performs matching in both word and syllable scales. The English queries contain many named entities that tend to be out-of-vocabulary words for machine translation and speech recognition, and are omitted in retrieval. Names are often transliterated across languages and are generally important for retrieval. We present a technique that takes in a name spelling and automatically generates a phonetic cognate in terms of Chinese syllables to be used in retrieval. Experiments show consistent retrieval performance improvement by including the use of named entities in this way.",
author = "Meng, {H. M.} and Lo, {Wai Kit} and Berlin Chen and K. Tang",
year = "2001",
month = "1",
day = "1",
doi = "10.1109/ASRU.2001.1034649",
language = "English",
series = "2001 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001 - Conference Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "311--314",
booktitle = "2001 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001 - Conference Proceedings",

}

TY - GEN

T1 - Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval

AU - Meng, H. M.

AU - Lo, Wai Kit

AU - Chen, Berlin

AU - Tang, K.

PY - 2001/1/1

Y1 - 2001/1/1

AB - We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). Our retrieval system integrates machine translation, speech recognition and information retrieval technologies. An English news story forms a textual query that is automatically translated into Chinese words, which are mapped into Mandarin syllables by pronunciation dictionary lookup. Mandarin radio news broadcasts form spoken documents that are indexed by word and syllable recognition. The information retrieval engine performs matching in both word and syllable scales. The English queries contain many named entities that tend to be out-of-vocabulary words for machine translation and speech recognition, and are omitted in retrieval. Names are often transliterated across languages and are generally important for retrieval. We present a technique that takes in a name spelling and automatically generates a phonetic cognate in terms of Chinese syllables to be used in retrieval. Experiments show consistent retrieval performance improvement by including the use of named entities in this way.

UR - http://www.scopus.com/inward/record.url?scp=84962878581&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84962878581&partnerID=8YFLogxK

U2 - 10.1109/ASRU.2001.1034649

DO - 10.1109/ASRU.2001.1034649

M3 - Conference contribution

T3 - 2001 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001 - Conference Proceedings

SP - 311

EP - 314

BT - 2001 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2001 - Conference Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -