Abstract
Spoken document retrieval (SDR) has emerged as an active area of research in the speech processing community. SDR faces three fundamental problems: 1) a query is often only a vague expression of the underlying information need; 2) word usage in a query and a spoken document is likely to differ even when the two are topically related; and 3) an imperfect speech recognition transcript carries erroneous information and therefore deviates from the true theme of the spoken document. To mitigate these problems, this paper studies a novel use of a relevance language modeling framework for SDR. The framework not only inherits the merits of several existing techniques but also provides a flexible yet systematic way to render the lexical and topical relationships between a query and a spoken document. We also investigate representing the query and documents with index features of different granularities, used in conjunction with the various relevance cues. Experiments on the TDT SDR task show that the methods derived from our retrieval framework are promising when compared with a few existing retrieval methods.
Original language | English |
---|---|
Pages (from-to) | 929-932 |
Number of pages | 4 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publication status | Published - 2011 Dec 1 |
Event | 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011 - Florence, Italy |
Duration | 2011 Aug 27 → 2011 Aug 31 |
Keywords
- Kullback-Leibler divergence
- Language modeling
- Relevance model
- Spoken document retrieval
- Topic model
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation
Cite this
Leveraging relevance cues for improved spoken document retrieval. / Chen, Pei Ning; Chen, Kuan Yu; Chen, Berlin.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 01.12.2011, p. 929-932.
Research output: Contribution to journal › Conference article
TY - JOUR
T1 - Leveraging relevance cues for improved spoken document retrieval
AU - Chen, Pei Ning
AU - Chen, Kuan Yu
AU - Chen, Berlin
PY - 2011/12/1
Y1 - 2011/12/1
KW - Kullback-Leibler divergence
KW - Language modeling
KW - Relevance model
KW - Spoken document retrieval
KW - Topic model
UR - http://www.scopus.com/inward/record.url?scp=84865757647&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84865757647&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84865757647
SP - 929
EP - 932
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN - 2308-457X
ER -