Topic modeling for spoken document retrieval using word- and syllable-level information

Shih Hsiang Lin, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Topic modeling for information retrieval (IR) has attracted significant attention and demonstrated good performance in a wide variety of tasks over the years. In this article, we first present a comprehensive comparison among various topic modeling approaches, including the so-called document topic models (DTM) and word topic models (WTM), for Chinese spoken document retrieval (SDR). Moreover, in order to lessen SDR performance degradation when using imperfect recognition transcripts, we also leverage different levels of indexing features for topic modeling, including words, syllable-level units and their combinations. All the experiments are performed on the TDT Chinese collection.

Original language: English
Title of host publication: 3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09
Pages: 3-10
Number of pages: 8
DOI: https://doi.org/10.1145/1631127.1631129
Publication status: Published - 2009
Event: 3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09 - Beijing, China
Duration: 2009 Oct 19 → 2009 Oct 24

Other

Other: 3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09
Country: China
City: Beijing
Period: 09/10/19 → 09/10/24

Keywords

  • Document topic models
  • Information retrieval
  • Speech recognition
  • Spoken document retrieval
  • Word topic models

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Software

Cite this

Lin, S. H., & Chen, B. (2009). Topic modeling for spoken document retrieval using word- and syllable-level information. In 3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09 (pp. 3-10) https://doi.org/10.1145/1631127.1631129

@inproceedings{0c6c512062ab4a8a89324ef949f8459a,
title = "Topic modeling for spoken document retrieval using word- and syllable-level information",
abstract = "Topic modeling for information retrieval (IR) has attracted significant attention and demonstrated good performance in a wide variety of tasks over the years. In this article, we first present a comprehensive comparison among various topic modeling approaches, including the so-called document topic models (DTM) and word topic models (WTM), for Chinese spoken document retrieval (SDR). Moreover, in order to lessen SDR performance degradation when using imperfect recognition transcripts, we also leverage different levels of indexing features for topic modeling, including words, syllable-level units and their combinations. All the experiments are performed on the TDT Chinese collection.",
keywords = "Document topic models, Information retrieval, Speech recognition, Spoken document retrieval, Word topic models",
author = "Lin, {Shih Hsiang} and Chen, Berlin",
year = "2009",
doi = "10.1145/1631127.1631129",
language = "English",
isbn = "9781605587622",
pages = "3--10",
booktitle = "3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09",

}

TY  - GEN
T1  - Topic modeling for spoken document retrieval using word- and syllable-level information
AU  - Lin, Shih Hsiang
AU  - Chen, Berlin
PY  - 2009
Y1  - 2009
AB  - Topic modeling for information retrieval (IR) has attracted significant attention and demonstrated good performance in a wide variety of tasks over the years. In this article, we first present a comprehensive comparison among various topic modeling approaches, including the so-called document topic models (DTM) and word topic models (WTM), for Chinese spoken document retrieval (SDR). Moreover, in order to lessen SDR performance degradation when using imperfect recognition transcripts, we also leverage different levels of indexing features for topic modeling, including words, syllable-level units and their combinations. All the experiments are performed on the TDT Chinese collection.
KW  - Document topic models
KW  - Information retrieval
KW  - Speech recognition
KW  - Spoken document retrieval
KW  - Word topic models
UR  - http://www.scopus.com/inward/record.url?scp=72249090206&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=72249090206&partnerID=8YFLogxK
U2  - 10.1145/1631127.1631129
DO  - 10.1145/1631127.1631129
M3  - Conference contribution
AN  - SCOPUS:72249090206
SN  - 9781605587622
SP  - 3
EP  - 10
BT  - 3rd Workshop on Searching Spontaneous Conversational Speech, SSCS'09, Co-located with the 2009 ACM International Conference on Multimedia, MM'09
ER  -