TY - JOUR
T1 - A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents
AU - Chen, Berlin
AU - Wang, Hsin Min
AU - Lee, Lin Shan
PY - 2004/6
Y1 - 2004/6
AB - In recent years, statistical modeling approaches have steadily gained in popularity in the field of information retrieval. This article presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and the various structures of this approach were extensively investigated and analyzed. The retrieval capabilities were verified by tests with word- and syllable-level indexing features and comparisons to the conventional vector-space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced in training. Fusion of information via indexing word- and syllable-level features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.
KW - Hidden Markov models
KW - Mandarin spoken documents
KW - Syllable-level indexing features
UR - http://www.scopus.com/inward/record.url?scp=10044234199&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=10044234199&partnerID=8YFLogxK
U2 - 10.1145/1034780.1034784
DO - 10.1145/1034780.1034784
M3 - Article
AN - SCOPUS:10044234199
VL - 3
SP - 128
EP - 145
JO - ACM Transactions on Asian Language Information Processing
JF - ACM Transactions on Asian Language Information Processing
SN - 1530-0226
IS - 2
ER -