Statistical Chinese spoken document retrieval using latent topical information

Berlin Chen, Jen Wei Kuo, Yao Min Huang, Hsin Min Wang

Research output: Contribution to conference › Paper

4 Citations (Scopus)

Abstract

Information retrieval, which aims to give people easy access to all kinds of information, is receiving increasing attention. However, most approaches to information retrieval are based primarily on literal term matching and operate in a deterministic manner. Their performance is therefore often limited by vocabulary mismatch, and it cannot be steadily improved through use. To overcome these drawbacks and to enhance retrieval performance, this paper explores the use of a topical mixture model for statistical Chinese spoken document retrieval. Various model structures and learning approaches were extensively investigated. In addition, the retrieval capabilities were verified by comparison with the conventional vector space model and the latent semantic indexing model, as well as our previously presented HMM/N-gram retrieval model. The experiments were performed on the TDT-2 Chinese collection, and noticeable improvements in retrieval performance were obtained.
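As a rough illustration of the idea in the abstract, and not the authors' implementation, the sketch below scores documents by query likelihood under a mixture of latent topics. It assumes the commonly used form log P(Q|D) = sum over query terms w of log sum over topics k of P(w|T_k) P(T_k|D). The function names, the probability floor, and all toy values are hypothetical and are not taken from the paper.

```python
import math

def query_log_likelihood(query_terms, topic_weights, topic_term_probs, floor=1e-12):
    """Return log P(Q|D) under an assumed topic mixture for one document.

    topic_weights[k]    -- P(T_k | D), the document's weight on topic k
    topic_term_probs[k] -- dict mapping a term w to P(w | T_k)
    floor               -- hypothetical lower bound that avoids log(0)
    """
    total = 0.0
    for term in query_terms:
        # Mix the per-topic term probabilities using the document's topic weights.
        p_term = sum(
            weight * topic_term_probs[k].get(term, 0.0)
            for k, weight in enumerate(topic_weights)
        )
        total += math.log(max(p_term, floor))
    return total

def rank_documents(query_terms, doc_topic_weights, topic_term_probs):
    """Return document ids sorted from most to least likely to generate the query."""
    scores = {
        doc_id: query_log_likelihood(query_terms, weights, topic_term_probs)
        for doc_id, weights in doc_topic_weights.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy model with two latent topics (all values invented for illustration).
TOPIC_TERM_PROBS = [
    {"election": 0.5, "vote": 0.4, "ballot": 0.1},  # a "politics"-like topic
    {"typhoon": 0.6, "rain": 0.4},                  # a "weather"-like topic
]
DOC_TOPIC_WEIGHTS = {
    "doc_politics": [0.9, 0.1],
    "doc_weather": [0.1, 0.9],
}

ranking = rank_documents(["ballot"], DOC_TOPIC_WEIGHTS, TOPIC_TERM_PROBS)
print(ranking)
```

Because the score depends on a document's topic weights rather than on its literal words, a document can rank highly for a query term it never contains. That is the vocabulary-mismatch problem the abstract attributes to literal term matching.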

Original language: English
Pages: 1621-1624
Number of pages: 4
Publication status: Published - 2004 Jan 1
Event: 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of
Duration: 2004 Oct 4 - 2004 Oct 8

Fingerprint

information retrieval
performance
indexing
mismatch
vocabulary
semantics
experiment
learning
Vector Space Model
N-gram
Mixture Model
Conventional
Hidden Markov Model

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Chen, B., Kuo, J. W., Huang, Y. M., & Wang, H. M. (2004). Statistical Chinese spoken document retrieval using latent topical information. 1621-1624. Paper presented at 8th International Conference on Spoken Language Processing, ICSLP 2004, Jeju, Jeju Island, Korea, Republic of.

@conference{9630abce46ea4303919910dd49359afc,
title = "Statistical Chinese spoken document retrieval using latent topical information",
author = "Chen, Berlin and Kuo, {Jen Wei} and Huang, {Yao Min} and Wang, {Hsin Min}",
year = "2004",
month = "1",
day = "1",
language = "English",
pages = "1621--1624",
note = "8th International Conference on Spoken Language Processing, ICSLP 2004 ; Conference date: 04-10-2004 Through 08-10-2004",

}

TY - CONF

T1 - Statistical Chinese spoken document retrieval using latent topical information

AU - Chen, Berlin

AU - Kuo, Jen Wei

AU - Huang, Yao Min

AU - Wang, Hsin Min

PY - 2004/1/1

Y1 - 2004/1/1

UR - http://www.scopus.com/inward/record.url?scp=85009074689&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85009074689&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:85009074689

SP - 1621

EP - 1624

ER -