TY - GEN
T1 - Leveraging Supervised Contrastive Learning to Build an Enhanced Autoregressive Document Retriever
AU - Wang, Yi Cheng
AU - Yang, Tzu Ting
AU - Wang, Hsin Wei
AU - Hsu, Yung Chang
AU - Chen, Berlin
N1 - Publisher Copyright:
© 2022 the Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
PY - 2022
Y1 - 2022
AB - The goal of an information retrieval system is to retrieve the documents most relevant to a given user query from a huge collection of documents, which usually requires multiple time-consuming comparisons between the query and candidate documents to find the most relevant ones. Recently, a novel retrieval modeling approach, dubbed the Differentiable Search Index (DSI), has been proposed. DSI dramatically simplifies the whole retrieval process by encoding all information about the document collection into the parameter space of a single Transformer model, which can in turn generate the relevant document identifiers (IDs) in an autoregressive manner in response to a user query. Although DSI addresses the shortcomings of traditional retrieval systems, previous studies have pointed out that DSI might fail to retrieve relevant documents because it uses document IDs as the pivotal mechanism for establishing the relationship between queries and documents, whereas not every document in the collection has corresponding relevant and irrelevant queries available for training. In view of this, we propose leveraging supervised contrastive learning to better render the relationship between queries and documents in the latent semantic space. Furthermore, an approximate nearest neighbor search strategy is employed at retrieval time to further assist the Transformer model in generating document IDs relevant to a posed query more efficiently. A series of experiments conducted on the Natural Questions benchmark dataset confirms the effectiveness and practical feasibility of our approach in relation to several strong baseline systems.
KW - Autoregressive Retrieval System
KW - Contrastive Learning
KW - Information Retrieval
UR - http://www.scopus.com/inward/record.url?scp=85154565691&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85154565691&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85154565691
T3 - ROCLING 2022 - Proceedings of the 34th Conference on Computational Linguistics and Speech Processing
SP - 273
EP - 282
BT - ROCLING 2022 - Proceedings of the 34th Conference on Computational Linguistics and Speech Processing
A2 - Chang, Yung-Chun
A2 - Huang, Yi-Chin
A2 - Wu, Jheng-Long
A2 - Su, Ming-Hsiang
A2 - Huang, Hen-Hsen
A2 - Liu, Yi-Fen
A2 - Lee, Lung-Hao
A2 - Chou, Chin-Hung
A2 - Liao, Yuan-Fu
PB - The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
T2 - 34th Conference on Computational Linguistics and Speech Processing, ROCLING 2022
Y2 - 21 November 2022 through 22 November 2022
ER -