Enhancing query formulation for spoken document retrieval

Berlin Chen, Yi Wen Chen, Kuan Yu Chen, Hsin Min Wang, Kuen Tyng Yu

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

The popularity and ubiquity of multimedia associated with spoken documents have spurred considerable research interest in spoken document retrieval (SDR) in recent years. Beyond the considerable effort devoted to developing robust indexing and modeling techniques for representing spoken documents, a recent line of work aims to improve query modeling so that it better reflects the user's information need. Pseudo-relevance feedback is by far the most commonly used paradigm for query reformulation; it assumes that a small number of top-ranked feedback documents obtained from the initial round of retrieval are relevant and can be used for this purpose. Nevertheless, simply taking all of the top-ranked feedback documents from the initial retrieval for query modeling does not always perform well, especially when those documents contain much redundant or non-relevant information. In view of this, we explore in this paper how to effectively glean useful cues from the top-ranked documents so as to achieve more accurate query modeling. To this end, various sources of information cues are considered and integrated into the process of feedback document selection to achieve better retrieval effectiveness. Furthermore, we investigate representing the query and documents with different granularities of index features to work in conjunction with the query and document models. A series of experiments conducted on the TDT (Topic Detection and Tracking) task seems to demonstrate the effectiveness of our query modeling framework for SDR.
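
To make the pseudo-relevance feedback idea in the abstract concrete, the following is a minimal sketch of the generic paradigm only, not the authors' framework: it takes the top-ranked documents from an initial retrieval, estimates a feedback language model from them, and interpolates it with the original query model. The function names, the toy corpus, and the parameters `top_k`, `alpha`, and `mu` are all illustrative assumptions; the feedback document selection studied in the paper is replaced here by a plain top-k cut-off.

```python
import math
from collections import Counter

def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def score(query_model, doc_tokens, collection_model, mu=10.0):
    """sum_w P(w|Q) * log P(w|D), with Dirichlet-smoothed P(w|D)."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    total = 0.0
    for w, p_q in query_model.items():
        p_c = collection_model.get(w, 1e-9)  # tiny floor for unseen words
        p_d = (counts[w] + mu * p_c) / (n + mu)
        total += p_q * math.log(p_d)
    return total

def rank(query_model, docs, collection_model):
    """Return document ids sorted by descending score."""
    scores = {d: score(query_model, toks, collection_model)
              for d, toks in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

def prf_query_model(query_tokens, docs, top_k=2, alpha=0.5):
    """Generic pseudo-relevance feedback (assumed plain top-k selection)."""
    collection_model = unigram_model(
        [w for toks in docs.values() for w in toks])
    original = unigram_model(query_tokens)
    # Initial round of retrieval with the original query model.
    initial = rank(original, docs, collection_model)
    # Treat the top_k documents as if they were relevant.
    feedback_tokens = [w for d in initial[:top_k] for w in docs[d]]
    feedback = unigram_model(feedback_tokens)
    # Interpolate the original and feedback models into a new query model.
    vocab = set(original) | set(feedback)
    expanded = {w: alpha * original.get(w, 0.0)
                   + (1 - alpha) * feedback.get(w, 0.0) for w in vocab}
    return expanded, rank(expanded, docs, collection_model)

# Hypothetical toy corpus standing in for recognized spoken documents.
docs = {
    "d1": "typhoon hits taiwan coast storm warning".split(),
    "d2": "storm warning issued typhoon rainfall flood".split(),
    "d3": "stock market rises technology shares".split(),
}
expanded, reranked = prf_query_model(["typhoon"], docs)
```

In this sketch the one-word query gains probability mass on terms such as "storm" and "warning" drawn from the feedback documents. If a non-relevant document had entered the top-k set, its terms would have been mixed in as well, which is the weakness of naive top-k feedback that the abstract points to.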

Original language: English
Pages (from-to): 553-569
Number of pages: 17
Journal: Journal of Information Science and Engineering
Volume: 30
Issue number: 3
Publication status: Published - 2014 Jan 1

Keywords

  • Language modeling
  • Pseudo-relevance feedback
  • Query modeling
  • Speech recognition
  • Spoken document retrieval

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Hardware and Architecture
  • Library and Information Sciences
  • Computational Theory and Mathematics

Cite this

Chen, B., Chen, Y. W., Chen, K. Y., Wang, H. M., & Yu, K. T. (2014). Enhancing query formulation for spoken document retrieval. Journal of Information Science and Engineering, 30(3), 553-569.

@article{092ffb69098748a485b3ac7825104bcc,
title = "Enhancing query formulation for spoken document retrieval",
keywords = "Language modeling, Pseudo-relevance feedback, Query modeling, Speech recognition, Spoken document retrieval",
author = "Berlin Chen and Chen, {Yi Wen} and Chen, {Kuan Yu} and Wang, {Hsin Min} and Yu, {Kuen Tyng}",
year = "2014",
month = "1",
day = "1",
language = "English",
volume = "30",
pages = "553--569",
journal = "Journal of Information Science and Engineering",
issn = "1016-2364",
publisher = "Institute of Information Science",
number = "3",

}

TY - JOUR

T1 - Enhancing query formulation for spoken document retrieval

AU - Chen, Berlin

AU - Chen, Yi Wen

AU - Chen, Kuan Yu

AU - Wang, Hsin Min

AU - Yu, Kuen Tyng

PY - 2014/1/1

Y1 - 2014/1/1

KW - Language modeling

KW - Pseudo-relevance feedback

KW - Query modeling

KW - Speech recognition

KW - Spoken document retrieval

UR - http://www.scopus.com/inward/record.url?scp=84902306629&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84902306629&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84902306629

VL - 30

SP - 553

EP - 569

JO - Journal of Information Science and Engineering

JF - Journal of Information Science and Engineering

SN - 1016-2364

IS - 3

ER -