Topic modeling for spoken document retrieval using word- and syllable-level information

Shih Hsiang Lin, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Topic modeling for information retrieval (IR) has attracted significant attention and demonstrated good performance in a wide variety of tasks over the years. In this article, we first present a comprehensive comparison among various topic modeling approaches, including the so-called document topic models (DTM) and word topic models (WTM), for Chinese spoken document retrieval (SDR). Moreover, in order to lessen SDR performance degradation when using imperfect recognition transcripts, we also leverage different levels of indexing features for topic modeling, including words, syllable-level units and their combinations. All the experiments are performed on the TDT Chinese collection.

Original language: English
Title of host publication: 3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09
Pages: 3-10
Number of pages: 8
DOIs: https://doi.org/10.1145/1631127.1631129
Publication status: Published - 2009
Event: 3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09 - Beijing, China
Duration: 2009 Oct 19 to 2009 Oct 24

Other

Other: 3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09
Country: China
City: Beijing
Period: 09/10/19 to 09/10/24

Keywords

  • Document topic models
  • Information retrieval
  • Speech recognition
  • Spoken document retrieval
  • Word topic models

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Software

Cite this

Lin, S. H., & Chen, B. (2009). Topic modeling for spoken document retrieval using word- and syllable-level information. In 3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09 (pp. 3-10). https://doi.org/10.1145/1631127.1631129