Exploring the use of latent topical information for statistical Chinese spoken document retrieval

Research output: Contribution to journal, Article

9 Citations (Scopus)

Abstract

Information retrieval, which aims to give people easy access to all kinds of information, is receiving increasing emphasis. However, most approaches to information retrieval are based primarily on literal term matching and operate in a deterministic manner. Their performance is therefore often limited by vocabulary mismatch, and it cannot be steadily improved through use. To overcome these drawbacks and to enhance retrieval performance, in this paper we explore the use of the topical mixture model for statistical Chinese spoken document retrieval. Various model structures and learning approaches were extensively investigated. In addition, the retrieval capabilities were verified by comparison with the probabilistic latent semantic analysis model, the vector space model, and the latent semantic indexing model, as well as our previously presented HMM/N-gram retrieval model. The experiments were performed on the TDT Chinese collections (TDT-2 and TDT-3). Noticeable improvements in retrieval performance were obtained.

Original language: English
Pages (from-to): 9-18
Number of pages: 10
Journal: Pattern Recognition Letters
Volume: 27
Issue number: 1
DOI: 10.1016/j.patrec.2005.06.010
Publication status: Published - 2006 Jan 1

Fingerprint

Information retrieval
Semantics
Vector spaces
Model structures
Experiments

Keywords

  • HMM/N-gram retrieval model
  • Information retrieval
  • Latent semantic indexing model
  • Probabilistic latent semantic analysis model
  • Topical mixture model
  • Vector space model

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

Exploring the use of latent topical information for statistical Chinese spoken document retrieval. / Chen, Berlin.

In: Pattern Recognition Letters, Vol. 27, No. 1, 01.01.2006, p. 9-18.


@article{730a36f5332f4cf99f4d2c5ad6e905ef,
title = "Exploring the use of latent topical information for statistical Chinese spoken document retrieval",
abstract = "Information retrieval, which aims to give people easy access to all kinds of information, is receiving increasing emphasis. However, most approaches to information retrieval are based primarily on literal term matching and operate in a deterministic manner. Their performance is therefore often limited by vocabulary mismatch, and it cannot be steadily improved through use. To overcome these drawbacks and to enhance retrieval performance, in this paper we explore the use of the topical mixture model for statistical Chinese spoken document retrieval. Various model structures and learning approaches were extensively investigated. In addition, the retrieval capabilities were verified by comparison with the probabilistic latent semantic analysis model, the vector space model, and the latent semantic indexing model, as well as our previously presented HMM/N-gram retrieval model. The experiments were performed on the TDT Chinese collections (TDT-2 and TDT-3). Noticeable improvements in retrieval performance were obtained.",
keywords = "HMM/N-gram retrieval model, Information retrieval, Latent semantic indexing model, Probabilistic latent semantic analysis model, Topical mixture model, Vector space model",
author = "Berlin Chen",
year = "2006",
month = "1",
day = "1",
doi = "10.1016/j.patrec.2005.06.010",
language = "English",
volume = "27",
pages = "9--18",
journal = "Pattern Recognition Letters",
issn = "0167-8655",
publisher = "Elsevier",
number = "1",

}

TY - JOUR

T1 - Exploring the use of latent topical information for statistical Chinese spoken document retrieval

AU - Chen, Berlin

PY - 2006/1/1

Y1 - 2006/1/1

N2 - Information retrieval, which aims to give people easy access to all kinds of information, is receiving increasing emphasis. However, most approaches to information retrieval are based primarily on literal term matching and operate in a deterministic manner. Their performance is therefore often limited by vocabulary mismatch, and it cannot be steadily improved through use. To overcome these drawbacks and to enhance retrieval performance, in this paper we explore the use of the topical mixture model for statistical Chinese spoken document retrieval. Various model structures and learning approaches were extensively investigated. In addition, the retrieval capabilities were verified by comparison with the probabilistic latent semantic analysis model, the vector space model, and the latent semantic indexing model, as well as our previously presented HMM/N-gram retrieval model. The experiments were performed on the TDT Chinese collections (TDT-2 and TDT-3). Noticeable improvements in retrieval performance were obtained.


KW - HMM/N-gram retrieval model

KW - Information retrieval

KW - Latent semantic indexing model

KW - Probabilistic latent semantic analysis model

KW - Topical mixture model

KW - Vector space model

UR - http://www.scopus.com/inward/record.url?scp=27744494029&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=27744494029&partnerID=8YFLogxK

U2 - 10.1016/j.patrec.2005.06.010

DO - 10.1016/j.patrec.2005.06.010

M3 - Article

AN - SCOPUS:27744494029

VL - 27

SP - 9

EP - 18

JO - Pattern Recognition Letters

JF - Pattern Recognition Letters

SN - 0167-8655

IS - 1

ER -