Topic modeling for spoken document retrieval using word- and syllable-level information

Shih Hsiang Lin, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Topic modeling for information retrieval (IR) has attracted significant attention and demonstrated good performance in a wide variety of tasks over the years. In this article, we first present a comprehensive comparison among various topic modeling approaches, including the so-called document topic models (DTM) and word topic models (WTM), for Chinese spoken document retrieval (SDR). Moreover, in order to lessen SDR performance degradation when using imperfect recognition transcripts, we also leverage different levels of indexing features for topic modeling, including words, syllable-level units and their combinations. All the experiments are performed on the TDT Chinese collection.

Original language: English
Title of host publication: 3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09
Pages: 3-10
Number of pages: 8
DOIs
Publication status: Published - 2009 Dec 24
Event: 3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09 - Beijing, China
Duration: 2009 Oct 19 - 2009 Oct 24

Publication series

Name: 3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09

Other

Other: 3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09
Country: China
City: Beijing
Period: 09/10/19 - 09/10/24

Keywords

  • Document topic models
  • Information retrieval
  • Speech recognition
  • Spoken document retrieval
  • Word topic models

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Software
