Spoken Document Retrieval Leveraging BERT-Based Modeling and Query Reformulation

Shao Wei Fan-Jiang, Tien Hong Lo, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

Spoken document retrieval (SDR) has long been deemed a fundamental and important step towards efficient organization of, and access to, multimedia associated with spoken content. In this paper, we present a novel study of SDR leveraging the Bidirectional Encoder Representations from Transformers (BERT) model for query and document representations (embeddings), as well as for relevance scoring. BERT has produced extremely promising results for various tasks in natural language understanding, but relatively little of that research has been devoted to text information retrieval (IR), let alone SDR. We further tackle one of the critical problems facing SDR, namely that a query is often too short to convey a user's information need, via the process of pseudo-relevance feedback (PRF), showing how information cues induced from PRF can be aptly incorporated into BERT for query expansion. In addition, such query reformulation through PRF works in conjunction with the augmentation of the document embeddings learned from BERT with lexical features and confidence scores. The merits of our approach are attested through extensive sets of experiments, which compare it with several classic and cutting-edge (deep learning-based) retrieval approaches.
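The abstract does not spell out how PRF cues are folded into BERT, so the following is only a minimal sketch of the generic pseudo-relevance feedback idea it builds on, not the authors' method. It assumes queries and documents are already represented as fixed-length vectors (hand-written toy vectors stand in for BERT embeddings here), treats the top-k documents of a first retrieval pass as pseudo-relevant, and interpolates the query with their centroid. The names `cosine`, `rank`, `prf_expand`, `k`, and `alpha` are hypothetical and chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, docs):
    """Return document ids sorted by descending cosine similarity to the query."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)


def prf_expand(query, docs, k=2, alpha=0.5):
    """Interpolate the query with the centroid of its top-k first-pass documents.

    Assumption: a Rocchio-style update in embedding space; alpha weights the
    original query and (1 - alpha) weights the pseudo-relevant centroid.
    """
    top = rank(query, docs)[:k]
    centroid = [sum(docs[d][i] for d in top) / len(top) for i in range(len(query))]
    return [alpha * q + (1 - alpha) * c for q, c in zip(query, centroid)]


# Toy vectors standing in for learned embeddings (illustrative values only).
docs = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.8, 0.6, 0.0],
    "d3": [0.0, 1.0, 0.0],
    "d4": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]

first_pass = rank(query, docs)            # initial retrieval with the short query
expanded = prf_expand(query, docs, k=2)   # query reformulated from the top-2 results
second_pass = rank(expanded, docs)        # retrieval with the expanded query
```

In this toy setup the expanded query picks up weight on the second dimension from `d2`, so `d3`, which the original query scored at zero, now receives a positive score. That is the effect PRF is meant to have when a short query underspecifies the information need.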

Original language: English
Title of host publication: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 8144-8148
Number of pages: 5
ISBN (Electronic): 9781509066315
Publication status: Published - May 2020
Externally published: Yes
Event: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
Duration: 4 May 2020 to 8 May 2020

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
2020-May
ISSN (Print): 1520-6149

Conference

Conference: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Country: Spain
City: Barcelona
Period: 4 May 2020 to 8 May 2020

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
