Latent topic modeling of word co-occurrence information for spoken document retrieval

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

20 Citations (Scopus)

Abstract

In this paper, we present a word topic model (WTM) approach for spoken document retrieval (SDR) that captures both the co-occurrence relationships between words and long-span latent topic information. Each document as a whole is modeled as a composite WTM that generates the observed query. We investigate the underlying characteristics of WTM and several model structures, and compare its performance against a few existing retrieval models on the TDT-2 SDR task. We also attempt to incorporate part-of-speech (POS) weighting into the representations of the query observations and the WTM models to improve retrieval performance.
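
The composite-model idea in the abstract can be illustrated with a small sketch. The abstract does not give the equations, so everything below is an assumption: each vocabulary word v is taken to own a topic mixture P(T_k | M_v), a document's model is taken to be the frequency-weighted mixture of the word models of the words it contains, and documents are ranked by the log-likelihood of the query. The function names, the two-topic toy parameters, and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def wtm_word_prob(w, v, p_word_given_topic, p_topic_given_wordmodel):
    """Assumed form: P(w | M_v) = sum_k P(w | T_k) * P(T_k | M_v)."""
    return sum(
        p_word_given_topic[k].get(w, 0.0) * p_k
        for k, p_k in enumerate(p_topic_given_wordmodel[v])
    )

def query_log_likelihood(query, document, p_word_given_topic, p_topic_given_wordmodel):
    """Log P(query | document) under a composite of per-word topic models.

    Assumed composition: P(w | D) = sum_{v in D} (count(v, D) / |D|) * P(w | M_v).
    """
    doc_counts = Counter(document)
    doc_len = len(document)
    total = 0.0
    for w in query:
        p = sum(
            (count / doc_len)
            * wtm_word_prob(w, v, p_word_given_topic, p_topic_given_wordmodel)
            for v, count in doc_counts.items()
        )
        if p <= 0.0:
            return float("-inf")  # the query word cannot be generated
        total += math.log(p)
    return total

# Hypothetical toy parameters: topic 0 is "weather", topic 1 is "politics".
P_WORD_GIVEN_TOPIC = [
    {"storm": 0.5, "rain": 0.5},
    {"vote": 0.5, "poll": 0.5},
]
P_TOPIC_GIVEN_WORDMODEL = {
    "storm": [0.9, 0.1],
    "rain": [0.9, 0.1],
    "vote": [0.1, 0.9],
    "poll": [0.1, 0.9],
}

weather_doc = ["storm", "rain", "rain"]
politics_doc = ["vote", "poll", "vote"]
query = ["storm"]

weather_score = query_log_likelihood(
    query, weather_doc, P_WORD_GIVEN_TOPIC, P_TOPIC_GIVEN_WORDMODEL
)
politics_score = query_log_likelihood(
    query, politics_doc, P_WORD_GIVEN_TOPIC, P_TOPIC_GIVEN_WORDMODEL
)
```

With these toy values the weather document scores higher for the query "storm". The politics document still receives a nonzero probability although it never contains that word, which is the kind of smoothing through latent topics that a topic-based retrieval model is meant to provide. In practice the parameters would be estimated from data rather than set by hand, and the POS weighting mentioned in the abstract is not modeled here.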

Original language: English
Title of host publication: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009
Pages: 3961-3964
Number of pages: 4
DOI: https://doi.org/10.1109/ICASSP.2009.4960495
Publication status: Published - 2009 Sep 23
Event: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009 - Taipei, Taiwan
Duration: 2009 Apr 19 to 2009 Apr 24

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Keywords

  • Language model
  • Probabilistic latent semantic analysis
  • Spoken document retrieval
  • Word topic model

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Chen, B. (2009). Latent topic modeling of word co-occurrence information for spoken document retrieval. In 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009 (pp. 3961-3964). [4960495] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2009.4960495

@inproceedings{c470ebcec32f413096313373cf858ddd,
title = "Latent topic modeling of word co-occurrence information for spoken document retrieval",
abstract = "In this paper, we present a word topic model (WTM) approach, discovering the co-occurrence relationship between words as well as the long-span latent topic information, for spoken document retrieval (SDR). A given document as a whole is modeled as a composite WTM model for generating an observed query. The underlying characteristics and different kinds of model structures are extensively investigated, while the performance of WTM is thoroughly analyzed and verified by comparison with a few existing retrieval models on the TDT-2 SDR task. We also attempt to incorporate part-of-speech (POS) weighting into the representations of the query observations and the WTM models for obtaining better retrieval performance.",
keywords = "Language model, Probabilistic latent semantic analysis, Spoken document retrieval, Word topic model",
author = "Berlin Chen",
year = "2009",
month = "9",
day = "23",
doi = "10.1109/ICASSP.2009.4960495",
language = "English",
isbn = "9781424423545",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
pages = "3961--3964",
booktitle = "2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009",

}
