A study of topic modeling techniques for spoken document retrieval

Kuan Yu Chen, Berlin Chen

Research output: Paper (contribution to conference)

1 Citation (Scopus)

Abstract

This paper focuses on a comparison of two common categories of topic modeling techniques for spoken document retrieval (SDR), namely the document topic model (DTM) and the word topic model (WTM). Apart from using the conventional unsupervised training strategy, we explore a supervised training strategy for estimating these topic models, assuming that user query logs, along with click-through information on relevant documents, can be utilized when building an SDR system. This approach has the potential to associate relevant documents with queries even if they do not share any of the query words. Moreover, to lessen the SDR performance degradation caused by imperfect speech recognition, we also leverage different levels of index features for topic modeling, including words, syllable-level units, and their combination. Experiments conducted on the TDT-2 SDR task show that the methods derived from our proposed modeling framework are very promising when compared with a few existing retrieval approaches.
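To make the DTM/WTM distinction concrete, here is a minimal Python sketch of query-likelihood scoring under each model. It assumes the formulations commonly used for these models in the SDR literature: the DTM attaches a topic mixture to each document, P(w|D) = Σ_k P(w|T_k) P(T_k|D), while the WTM attaches a topic mixture to each word w_j and scores P(w|D) = Σ_j P(w|M_{w_j}) P(w_j|D). The topic parameters, the four-word vocabulary, and the function names (`dtm_word_prob`, `wtm_word_prob`, `query_log_likelihood`) are hypothetical. Training (unsupervised or with click-through data), smoothing, and syllable-level index features are not shown, so this is an illustration under stated assumptions rather than the paper's implementation.

```python
import math
from collections import Counter

# Hypothetical toy parameters (not from the paper): K = 2 latent topics.
# P_W_GIVEN_T[k][w] = P(w | T_k), shared by both models here for comparison.
P_W_GIVEN_T = [
    {"election": 0.50, "vote": 0.40, "typhoon": 0.05, "rain": 0.05},
    {"election": 0.05, "vote": 0.05, "typhoon": 0.50, "rain": 0.40},
]
K = len(P_W_GIVEN_T)


def dtm_word_prob(w, p_t_given_d):
    """DTM: P(w | D) = sum_k P(w | T_k) * P(T_k | D)."""
    return sum(P_W_GIVEN_T[k].get(w, 0.0) * p_t_given_d[k] for k in range(K))


def wtm_word_prob(w, doc_words, p_t_given_m):
    """WTM: P(w | D) = sum_j P(w | M_{w_j}) * P(w_j | D), where
    P(w | M_{w_j}) = sum_k P(w | T_k) * P(T_k | M_{w_j}) and
    P(w_j | D) is the relative frequency of w_j in D."""
    counts = Counter(doc_words)
    total = len(doc_words)
    prob = 0.0
    for wj, c in counts.items():
        p_w_given_mj = sum(P_W_GIVEN_T[k].get(w, 0.0) * p_t_given_m[wj][k]
                           for k in range(K))
        prob += p_w_given_mj * c / total
    return prob


def query_log_likelihood(query_words, word_prob, floor=1e-12):
    """log P(Q | D) = sum of log P(w | D) over query tokens.
    The floor only guards against log(0); a real system would instead
    smooth with a background (collection) model."""
    return sum(math.log(max(word_prob(w), floor)) for w in query_words)


# Hypothetical mixtures: P(T_k | D) for the DTM, P(T_k | M_w) for the WTM.
doc_topics = {"d1": [0.9, 0.1], "d2": [0.1, 0.9]}
word_topics = {"election": [0.9, 0.1], "vote": [0.8, 0.2],
               "typhoon": [0.1, 0.9], "rain": [0.2, 0.8]}
docs = {"d1": ["election", "election", "vote"],
        "d2": ["typhoon", "rain", "rain"]}

query = ["vote", "election"]
for d in docs:
    dtm = query_log_likelihood(query, lambda w: dtm_word_prob(w, doc_topics[d]))
    wtm = query_log_likelihood(query, lambda w: wtm_word_prob(w, docs[d], word_topics))
    print(d, round(dtm, 3), round(wtm, 3))
```

Because the WTM routes probability mass through each document word's topic mixture, a query word that never occurs in a document still receives a nonzero probability: `wtm_word_prob("vote", ["election", "election"], word_topics)` is greater than zero even though `vote` is absent from that document, which mirrors the effect the abstract describes.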

Original language: English
Pages: 237-242
Number of pages: 6
Publication status: Published - 2010 Dec 1
Event: 2nd Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2010 - Biopolis, Singapore
Duration: 2010 Dec 14 to 2010 Dec 17

Other

Conference: 2nd Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2010
Country: Singapore
City: Biopolis
Period: 2010-12-14 to 2010-12-17


ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems

Cite this

Chen, K. Y., & Chen, B. (2010). A study of topic modeling techniques for spoken document retrieval. 237-242. Paper presented at 2nd Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2010, Biopolis, Singapore.

A study of topic modeling techniques for spoken document retrieval. / Chen, Kuan Yu; Chen, Berlin.

2010. 237-242. Paper presented at 2nd Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2010, Biopolis, Singapore.


Chen, KY & Chen, B 2010, 'A study of topic modeling techniques for spoken document retrieval', paper presented at 2nd Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2010, Biopolis, Singapore, 14-17 December 2010, pp. 237-242.
Chen KY, Chen B. A study of topic modeling techniques for spoken document retrieval. 2010. Paper presented at 2nd Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2010, Biopolis, Singapore.
Chen, Kuan Yu ; Chen, Berlin. / A study of topic modeling techniques for spoken document retrieval. Paper presented at 2nd Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2010, Biopolis, Singapore. 6 p.
@conference{d42fff93ab834e8497c6098ec4d52a71,
title = "A study of topic modeling techniques for spoken document retrieval",
abstract = "This paper focuses on comparison of two common categories of topic modeling techniques for spoken document retrieval (SDR), namely document topic model (DTM) and word topic model (WTM). Apart from using the conventional unsupervised training strategy, we explore a supervised training strategy for estimating these topic models, assuming that user query logs along with click-through information of relevant documents can be utilized when building an SDR system. This attempt has the potential to associate relevant documents with queries even if they do not share any of the query words. Moreover, in order to lessen SDR performance degradation caused by imperfect speech recognition, we also leverage different levels of index features for topic modeling, including words, syllable-level units, and their combination. Experiments conducted on the TDT-2 SDR task show that the methods deduced from our proposed modeling framework are very promising when compared with a few existing retrieval approaches.",
author = "Chen, {Kuan Yu} and Berlin Chen",
year = "2010",
month = "12",
day = "1",
language = "English",
pages = "237--242",
note = "2nd Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2010 ; Conference date: 14-12-2010 Through 17-12-2010",

}

TY  - CONF
T1  - A study of topic modeling techniques for spoken document retrieval
AU  - Chen, Kuan Yu
AU  - Chen, Berlin
PY  - 2010/12/1
Y1  - 2010/12/1
N2  - This paper focuses on comparison of two common categories of topic modeling techniques for spoken document retrieval (SDR), namely document topic model (DTM) and word topic model (WTM). Apart from using the conventional unsupervised training strategy, we explore a supervised training strategy for estimating these topic models, assuming that user query logs along with click-through information of relevant documents can be utilized when building an SDR system. This attempt has the potential to associate relevant documents with queries even if they do not share any of the query words. Moreover, in order to lessen SDR performance degradation caused by imperfect speech recognition, we also leverage different levels of index features for topic modeling, including words, syllable-level units, and their combination. Experiments conducted on the TDT-2 SDR task show that the methods deduced from our proposed modeling framework are very promising when compared with a few existing retrieval approaches.
AB  - This paper focuses on comparison of two common categories of topic modeling techniques for spoken document retrieval (SDR), namely document topic model (DTM) and word topic model (WTM). Apart from using the conventional unsupervised training strategy, we explore a supervised training strategy for estimating these topic models, assuming that user query logs along with click-through information of relevant documents can be utilized when building an SDR system. This attempt has the potential to associate relevant documents with queries even if they do not share any of the query words. Moreover, in order to lessen SDR performance degradation caused by imperfect speech recognition, we also leverage different levels of index features for topic modeling, including words, syllable-level units, and their combination. Experiments conducted on the TDT-2 SDR task show that the methods deduced from our proposed modeling framework are very promising when compared with a few existing retrieval approaches.
UR  - http://www.scopus.com/inward/record.url?scp=79958134834&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79958134834&partnerID=8YFLogxK
M3  - Paper
AN  - SCOPUS:79958134834
SP  - 237
EP  - 242
ER  - 