Spoken document retrieval leveraging unsupervised and supervised topic modeling techniques

Kuan Yu Chen, Hsin Min Wang, Berlin Chen*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Peer-reviewed

5 Citations (Scopus)

Abstract

This paper describes the application of two attractive categories of topic modeling techniques to the problem of spoken document retrieval (SDR), namely the document topic model (DTM) and the word topic model (WTM). Apart from the conventional unsupervised training strategy, we explore a supervised training strategy for estimating these topic models, envisioning a scenario in which user query logs, along with click-through information on relevant documents, can be used to build an SDR system. This approach has the potential to associate relevant documents with queries even when they share none of the query words, thereby improving retrieval quality over the baseline system. Likewise, we study a novel use of pseudo-supervised training that associates relevant documents with queries through a pseudo-feedback procedure. Moreover, to lessen the SDR performance degradation caused by imperfect speech recognition, we investigate leveraging different levels of index features for topic modeling, including words, syllable-level units, and their combination. We report a series of experiments conducted on the TDT (TDT-2 and TDT-3) Chinese SDR collections. The empirical results show that the methods derived from our proposed modeling framework are very effective when compared with several existing retrieval approaches.
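The two model families named in the abstract can be illustrated with a minimal sketch, assuming the common PLSA-style formulations: for DTM, P(w|D) = sum_k P(w|T_k) P(T_k|D); for WTM, P(w|D) = sum over words w' in D of P(w|M_w') P(w'|D), where P(w|M_w') = sum_k P(w|T_k) P(T_k|M_w'). Documents are then ranked by query log-likelihood. All names (`P_W_GIVEN_T`, `dtm_word_prob`, `wtm_word_prob`, and so on) and all probabilities are hypothetical toy values, not the paper's parameters, and the sketch omits the smoothing and the unsupervised, supervised (click-through) or pseudo-supervised estimation the paper studies.

```python
from math import log

# Hypothetical toy parameters for illustration only; not estimated from TDT data.
# P(w|T_k): word distribution of each latent topic T_k (two topics here).
P_W_GIVEN_T = [
    {"election": 0.5, "vote": 0.4, "storm": 0.05, "rain": 0.05},
    {"election": 0.05, "vote": 0.05, "storm": 0.5, "rain": 0.4},
]

def dtm_word_prob(word, p_topic_given_doc):
    """DTM (PLSA-style): P(w|D) = sum_k P(w|T_k) * P(T_k|D)."""
    return sum(P_W_GIVEN_T[k].get(word, 0.0) * p_t
               for k, p_t in enumerate(p_topic_given_doc))

def wtm_word_prob(word, doc_words, p_topic_given_word):
    """WTM: P(w|D) = sum_{w' in D} P(w|M_w') * P(w'|D),
    with P(w|M_w') = sum_k P(w|T_k) * P(T_k|M_w')."""
    n = len(doc_words)
    total = 0.0
    for w_prime in set(doc_words):
        p_wprime_doc = doc_words.count(w_prime) / n  # maximum-likelihood P(w'|D)
        p_w_given_mw = sum(P_W_GIVEN_T[k].get(word, 0.0) * p_t
                           for k, p_t in enumerate(p_topic_given_word[w_prime]))
        total += p_w_given_mw * p_wprime_doc
    return total

def query_log_likelihood(query_words, word_prob):
    """log P(Q|D) as a sum over query tokens; the floor avoids log(0)."""
    return sum(log(max(word_prob(w), 1e-12)) for w in query_words)

# Hypothetical documents and topic mixtures.
P_TOPIC_GIVEN_DOC = {"d1": [0.9, 0.1], "d2": [0.2, 0.8]}
DOC_WORDS = {"d1": ["election", "election", "rain"],
             "d2": ["storm", "rain", "rain"]}
P_TOPIC_GIVEN_WORD = {"election": [0.9, 0.1], "vote": [0.9, 0.1],
                      "storm": [0.1, 0.9], "rain": [0.1, 0.9]}

query = ["vote"]
for d in ("d1", "d2"):
    dtm = query_log_likelihood(query, lambda w: dtm_word_prob(w, P_TOPIC_GIVEN_DOC[d]))
    wtm = query_log_likelihood(
        query, lambda w: wtm_word_prob(w, DOC_WORDS[d], P_TOPIC_GIVEN_WORD))
    print(d, round(dtm, 3), round(wtm, 3))
```

With these toy values, d1 scores higher than d2 for the query "vote" under both models even though neither document contains that word, because probability mass flows through the shared latent topics. This is the kind of query-document association the abstract attributes to topic modeling.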

Original language: English
Pages (from-to): 1195-1205
Number of pages: 11
Journal: IEICE Transactions on Information and Systems
Volume: E95-D
Issue number: 5
Publication status: Published, May 2012

ASJC Scopus subject areas

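  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence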
