TY - JOUR
T1 - Exploring the use of latent topical information for statistical Chinese spoken document retrieval
AU - Chen, Berlin
PY - 2006/1/1
Y1 - 2006/1/1
N2 - Information retrieval, which aims to give people easy access to all kinds of information, is receiving increasing attention. However, most information retrieval approaches rely primarily on literal term matching and operate in a deterministic manner. As a result, their performance is often limited by vocabulary mismatch, and it cannot be steadily improved through use. To overcome these drawbacks and enhance retrieval performance, this paper explores the use of the topical mixture model for statistical Chinese spoken document retrieval. Various model structures and learning approaches were extensively investigated. In addition, the model's retrieval capabilities were evaluated by comparison with the probabilistic latent semantic analysis model, the vector space model, the latent semantic indexing model, and our previously presented HMM/N-gram retrieval model. The experiments were performed on the TDT Chinese collections (TDT-2 and TDT-3), and noticeable improvements in retrieval performance were obtained.
AB - Information retrieval, which aims to give people easy access to all kinds of information, is receiving increasing attention. However, most information retrieval approaches rely primarily on literal term matching and operate in a deterministic manner. As a result, their performance is often limited by vocabulary mismatch, and it cannot be steadily improved through use. To overcome these drawbacks and enhance retrieval performance, this paper explores the use of the topical mixture model for statistical Chinese spoken document retrieval. Various model structures and learning approaches were extensively investigated. In addition, the model's retrieval capabilities were evaluated by comparison with the probabilistic latent semantic analysis model, the vector space model, the latent semantic indexing model, and our previously presented HMM/N-gram retrieval model. The experiments were performed on the TDT Chinese collections (TDT-2 and TDT-3), and noticeable improvements in retrieval performance were obtained.
KW - HMM/N-gram retrieval model
KW - Information retrieval
KW - Latent semantic indexing model
KW - Probabilistic latent semantic analysis model
KW - Topical mixture model
KW - Vector space model
UR - http://www.scopus.com/inward/record.url?scp=27744494029&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=27744494029&partnerID=8YFLogxK
U2 - 10.1016/j.patrec.2005.06.010
DO - 10.1016/j.patrec.2005.06.010
M3 - Article
AN - SCOPUS:27744494029
SN - 0167-8655
VL - 27
SP - 9
EP - 18
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
IS - 1
ER -