TY - JOUR
T1 - Exploring the use of significant words language modeling for spoken document retrieval
AU - Chen, Ying Wen
AU - Chen, Kuan Yu
AU - Wang, Hsin Min
AU - Chen, Berlin
N1 - Publisher Copyright:
Copyright © 2017 ISCA.
PY - 2017
Y1 - 2017
N2 - Owing to the rapid global access to tremendous amounts of multimedia associated with speech information on the Internet, spoken document retrieval (SDR) has become an emerging application recently. Apart from much effort devoted to developing robust indexing and modeling techniques for spoken documents, a recent line of research targets at enriching and reformulating query representations in an attempt to enhance retrieval effectiveness. In practice, pseudo-relevance feedback is by far the most prevalent paradigm for query reformulation, which assumes that top-ranked feedback documents obtained from the initial round of retrieval are potentially relevant and can be exploited to reformulate the original query. Continuing this line of research, the paper presents a novel modeling framework, which aims at discovering significant words occurring in the feedback documents, to infer an enhanced query language model for SDR. Formally, the proposed framework targets at extracting the essential words representing a common notion of relevance (i.e., the significant words which occur in almost all of the feedback documents), so as to deduce a new query language model that captures these significant words and meanwhile modulates the influence of both highly frequent words and too specific words. Experiments conducted on a benchmark SDR task demonstrate the performance merits of our proposed framework.
AB - Owing to the rapid global access to tremendous amounts of multimedia associated with speech information on the Internet, spoken document retrieval (SDR) has become an emerging application recently. Apart from much effort devoted to developing robust indexing and modeling techniques for spoken documents, a recent line of research targets at enriching and reformulating query representations in an attempt to enhance retrieval effectiveness. In practice, pseudo-relevance feedback is by far the most prevalent paradigm for query reformulation, which assumes that top-ranked feedback documents obtained from the initial round of retrieval are potentially relevant and can be exploited to reformulate the original query. Continuing this line of research, the paper presents a novel modeling framework, which aims at discovering significant words occurring in the feedback documents, to infer an enhanced query language model for SDR. Formally, the proposed framework targets at extracting the essential words representing a common notion of relevance (i.e., the significant words which occur in almost all of the feedback documents), so as to deduce a new query language model that captures these significant words and meanwhile modulates the influence of both highly frequent words and too specific words. Experiments conducted on a benchmark SDR task demonstrate the performance merits of our proposed framework.
KW - Pseudo relevance feedback
KW - Query model
KW - Significant words
UR - http://www.scopus.com/inward/record.url?scp=85039157413&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85039157413&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2017-612
DO - 10.21437/Interspeech.2017-612
M3 - Conference article
AN - SCOPUS:85039157413
SN - 2308-457X
VL - 2017-August
SP - 2889
EP - 2893
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017
Y2 - 20 August 2017 through 24 August 2017
ER -