Leveraging relevance cues for improved spoken document retrieval

Pei Ning Chen, Kuan Yu Chen, Berlin Chen

Research output: Contribution to journal › Conference article

7 Citations (Scopus)

Abstract

Spoken document retrieval (SDR) has emerged as an active area of research in the speech processing community. The fundamental problems facing SDR are generally three-fold: 1) a query is often only a vague expression of an underlying information need, 2) a word-usage mismatch is likely between a query and a spoken document even when they are topically related, and 3) an imperfect speech recognition transcript carries erroneous information and thus deviates somewhat from the true theme of a spoken document. To mitigate these problems, in this paper we study a novel use of a relevance language modeling framework for SDR. It not only inherits the merits of several existing techniques but also provides a flexible yet systematic way to render the lexical and topical relationships between a query and a spoken document. Moreover, we investigate representing queries and documents with different granularities of index features, working in conjunction with the various relevance cues. Experiments conducted on the TDT SDR task demonstrate the promise of the methods derived from our retrieval framework when compared with several existing retrieval methods.
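The relevance language modeling framework named in the abstract and keywords can be illustrated generically: estimate a relevance model from documents that are likely to satisfy the query (an RM1-style estimate in the spirit of Lavrenko and Croft), then rank documents by Kullback-Leibler divergence between the relevance model and each document's language model. The sketch below is a toy illustration of that generic technique, not the authors' implementation; all names, the Dirichlet smoothing parameter, and the toy corpus are assumptions.

```python
# Generic sketch (NOT the paper's code) of relevance-model retrieval with
# KL-divergence ranking, using Dirichlet-smoothed unigram language models.
import math
from collections import Counter

def unigram_lm(doc_tokens, coll_counts, coll_len, mu=10.0):
    """Dirichlet-smoothed document language model P(w|D)."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    def p(w):
        return (counts[w] + mu * coll_counts[w] / coll_len) / (n + mu)
    return p

def query_likelihood(query, p_wd):
    """P(Q|D) under the unigram model (toy: no log-space safeguards)."""
    return math.exp(sum(math.log(p_wd(w)) for w in query))

def relevance_model(query, docs, coll_counts, coll_len, vocab):
    """RM1-style estimate: P(w|R) ∝ Σ_D P(w|D) · P(Q|D), uniform P(D)."""
    lms = [unigram_lm(d, coll_counts, coll_len) for d in docs]
    qlik = [query_likelihood(query, lm) for lm in lms]
    z = sum(qlik) or 1.0
    return {w: sum(lm(w) * ql for lm, ql in zip(lms, qlik)) / z
            for w in vocab}

def kl_rank(rel_model, docs, coll_counts, coll_len):
    """Score each D by Σ_w P(w|R) log P(w|D): rank-equivalent to -KL,
    since the relevance model's entropy is constant across documents."""
    scores = []
    for d in docs:
        p_wd = unigram_lm(d, coll_counts, coll_len)
        scores.append(sum(pr * math.log(p_wd(w))
                          for w, pr in rel_model.items() if pr > 0))
    return scores

# Toy corpus standing in for recognition transcripts of spoken documents.
docs = [["speech", "retrieval", "spoken", "document"],
        ["weather", "report", "rain"],
        ["speech", "recognition", "transcript"]]
coll = Counter(w for d in docs for w in d)
coll_len = sum(coll.values())
vocab = set(coll)

query = ["spoken", "retrieval"]
rm = relevance_model(query, docs, coll, coll_len, vocab)
scores = kl_rank(rm, docs, coll, coll_len)
best = max(range(len(docs)), key=lambda i: scores[i])  # doc 0 matches best
```

The relevance model spreads probability mass over words from documents with high query likelihood, which is one way such a framework can bridge the query/document word-usage mismatch the abstract describes.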

Original language: English
Pages (from-to): 929-932
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2011 Dec 1
Event: 12th Annual Conference of the International Speech Communication Association, INTERSPEECH 2011 - Florence, Italy
Duration: 2011 Aug 27 - 2011 Aug 31

Keywords

  • Kullback-Leibler divergence
  • Language modeling
  • Relevance model
  • Spoken document retrieval
  • Topic model

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

Leveraging relevance cues for improved spoken document retrieval. / Chen, Pei Ning; Chen, Kuan Yu; Chen, Berlin.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 01.12.2011, p. 929-932.

@article{2731518d06aa4a3ea8812d30ad04c8d0,
title = "Leveraging relevance cues for improved spoken document retrieval",
abstract = "Spoken document retrieval (SDR) has emerged as an active area of research in the speech processing community. The fundamental problems facing SDR are generally three-fold: 1) a query is often only a vague expression of an underlying information need, 2) a word-usage mismatch is likely between a query and a spoken document even when they are topically related, and 3) an imperfect speech recognition transcript carries erroneous information and thus deviates somewhat from the true theme of a spoken document. To mitigate these problems, in this paper we study a novel use of a relevance language modeling framework for SDR. It not only inherits the merits of several existing techniques but also provides a flexible yet systematic way to render the lexical and topical relationships between a query and a spoken document. Moreover, we investigate representing queries and documents with different granularities of index features, working in conjunction with the various relevance cues. Experiments conducted on the TDT SDR task demonstrate the promise of the methods derived from our retrieval framework when compared with several existing retrieval methods.",
keywords = "Kullback-Leibler divergence, Language modeling, Relevance model, Spoken document retrieval, Topic model",
author = "Chen, {Pei Ning} and Chen, {Kuan Yu} and Chen, {Berlin}",
year = "2011",
month = "12",
day = "1",
language = "English",
pages = "929--932",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}
