TY - JOUR
T1 - Multi-scale audio indexing for translingual spoken document retrieval
AU - Wang, H.
AU - Meng, H.
AU - Schone, P.
AU - Chen, B.
AU - Lo, W. K.
PY - 2001
Y1 - 2001
N2 - MEI (Mandarin-English Information) is an English-Chinese crosslingual spoken document retrieval (CL-SDR) system developed during the Johns Hopkins University Summer Workshop 2000. We integrate speech recognition, machine translation, and information retrieval technologies to perform CL-SDR. MEI advocates a multi-scale paradigm, where both Chinese words and subwords (characters and syllables) are used in retrieval. The use of subword units can complement the word unit in handling the problems of Chinese word tokenization ambiguity, Chinese homophone ambiguity, and out-of-vocabulary words in audio indexing. This paper focuses on multi-scale audio indexing in MEI. Experiments are based on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3), where we indexed Voice of America Mandarin news broadcasts by speech recognition on both the word and subword scales. In this paper, we discuss the development of the MEI syllable recognizer and the representation of spoken documents using overlapping subword n-grams and lattice structures. Results show that augmenting words with subwords is beneficial to CL-SDR performance.
UR - http://www.scopus.com/inward/record.url?scp=0034852839&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0034852839&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:0034852839
SN - 1520-6149
VL - 1
SP - 605
EP - 608
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
T2 - 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing
Y2 - 7 May 2001 through 11 May 2001
ER -