Distribution-Driven Dense Retrieval: Modeling Many-to-One Query-Document Relationship

Junfeng Kang; Rui Li; Qi Liu; Zhenya Huang; Zheng Zhang; Yanjiang Chen; Linbo Zhu; Yu Su

doi:10.1609/aaai.v39i11.33299

Authors

Junfeng Kang: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Rui Li: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Qi Liu: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Zhenya Huang: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Zheng Zhang: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Yanjiang Chen: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Linbo Zhu: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Yu Su: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; School of Computer Science and Artificial Intelligence, Hefei Normal University

Abstract

Dense retrieval has emerged as the leading approach in information retrieval, aiming to find semantically relevant documents for natural language queries. Because a single document can be retrieved by multiple distinct queries, existing methods represent a document with multiple vectors, each aligned with a different query, to model the many-to-one relationship between queries and documents. However, these multi-vector approaches face three challenges: increased storage, vector collapse, and reduced search efficiency. To address these issues, we introduce the Distribution-Driven Dense Retrieval (DDR) framework. Specifically, we represent queries as vectors and documents as distributions. This approach captures the relationships among the multiple queries that correspond to the same document while avoiding the need for multiple vectors per document. Furthermore, to keep search efficient, we propose a dot product-based method for computing the similarity between a document represented by a distribution and a query represented by a vector, which allows seamless integration with existing approximate nearest neighbor (ANN) search algorithms. Finally, extensive experiments on real-world datasets demonstrate that our method significantly outperforms traditional dense retrieval methods.
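
The abstract does not give the similarity formula, so the following is only an illustrative sketch of how a vector-versus-distribution score can be reduced to a single dot product; it should not be read as the DDR method itself. It assumes, purely for illustration, that a document is a diagonal Gaussian with mean `mu` and variance `var`, and that the score is the log-density of the query under that Gaussian (dropping the constant term). The function names `expand_query`, `expand_document`, and `gaussian_log_density` are hypothetical.

```python
import numpy as np


def expand_query(q):
    """Map a d-dim query to the (2d + 1)-dim vector [q^2, q, 1]."""
    return np.concatenate([q * q, q, [1.0]])


def expand_document(mu, var):
    """Map a diagonal Gaussian (mu, var) to a (2d + 1)-dim vector.

    Assumed document model (not taken from the paper): N(mu, diag(var)).
    """
    inv = 1.0 / var
    # Query-independent terms are folded into one bias coordinate.
    bias = -0.5 * np.sum(mu * mu * inv) - 0.5 * np.sum(np.log(var))
    return np.concatenate([-0.5 * inv, mu * inv, [bias]])


def gaussian_log_density(q, mu, var):
    """Direct log-density of q, omitting the constant -d/2 * log(2*pi)."""
    return -0.5 * np.sum((q - mu) ** 2 / var) - 0.5 * np.sum(np.log(var))


rng = np.random.default_rng(0)
d = 4
q = rng.normal(size=d)
mu = rng.normal(size=d)
var = rng.uniform(0.5, 2.0, size=d)

# One inner product between the expanded vectors reproduces the direct score.
score = float(expand_query(q) @ expand_document(mu, var))
direct = float(gaussian_log_density(q, mu, var))
```

Under these assumptions each document is still stored as one fixed-length vector, and ranking by `score` is a maximum inner product search, which is the form of problem that ANN indexes over inner products are built for.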
