Distribution-Driven Dense Retrieval: Modeling Many-to-One Query-Document Relationship

Authors

  • Junfeng Kang, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
  • Rui Li, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
  • Qi Liu, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
  • Zhenya Huang, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
  • Zheng Zhang, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
  • Yanjiang Chen, State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
  • Linbo Zhu, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
  • Yu Su, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; School of Computer Science and Artificial Intelligence, Hefei Normal University

DOI:

https://doi.org/10.1609/aaai.v39i11.33299

Abstract

Dense retrieval has emerged as the leading approach in information retrieval, aiming to find semantically relevant documents based on natural language queries. Because a single document can be retrieved by multiple distinct queries, existing methods represent a document with multiple vectors, each aligned with a different query, to model the many-to-one relationship between queries and documents. However, these multi-vector approaches suffer from increased storage, vector collapse, and reduced search efficiency. To address these issues, we introduce the Distribution-Driven Dense Retrieval framework (DDR). Specifically, we use vectors to represent queries and distributions to represent documents. This approach not only captures the relationships among the multiple queries that correspond to the same document but also avoids the need to represent the document with multiple vectors. Furthermore, to ensure search efficiency for DDR, we propose a dot product-based method for computing the similarity between documents represented by distributions and queries represented by vectors. This allows seamless integration with existing approximate nearest neighbor (ANN) search algorithms. Finally, we conduct extensive experiments on real-world datasets, which demonstrate that our method significantly outperforms traditional dense retrieval methods.
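
The abstract does not spell out how a distribution-versus-vector similarity reduces to a dot product, so the following is only a minimal sketch of one way such a reduction can work, not the paper's formulation. It assumes (hypothetically) that each document is a diagonal Gaussian with mean `mu` and variance `var`, and that the score is the log-density of the query vector under that Gaussian with the constant term dropped. All function names are illustrative.

```python
import math
import random

def gaussian_log_score(q, mu, var):
    """Direct score: log-density of q under N(mu, diag(var)), constant dropped.
    Assumed scoring function for illustration; not taken from the paper."""
    return sum(-0.5 * (qi - mi) ** 2 / vi - 0.5 * math.log(vi)
               for qi, mi, vi in zip(q, mu, var))

def expand_query(q):
    """Query-side vector [q^2, q, 1]; depends on the query only."""
    return [qi * qi for qi in q] + list(q) + [1.0]

def expand_document(mu, var):
    """Document-side vector; depends on the distribution only,
    so it could be computed offline and stored in an ANN index."""
    quad = [-0.5 / vi for vi in var]                # multiplies q^2
    lin = [mi / vi for mi, vi in zip(mu, var)]      # multiplies q
    bias = sum(-0.5 * mi * mi / vi - 0.5 * math.log(vi)
               for mi, vi in zip(mu, var))          # multiplies the constant 1
    return quad + lin + [bias]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Random toy data: one query vector and one document distribution.
rng = random.Random(0)
dim = 8
q = [rng.gauss(0, 1) for _ in range(dim)]
mu = [rng.gauss(0, 1) for _ in range(dim)]
var = [rng.uniform(0.5, 2.0) for _ in range(dim)]

direct = gaussian_log_score(q, mu, var)
via_dot = dot(expand_query(q), expand_document(mu, var))
print(abs(direct - via_dot) < 1e-9)  # the two scores agree up to rounding
```

Under these assumptions the document is stored as a single vector of length 2d + 1 rather than as several d-dimensional vectors, and scoring is a plain inner product, which is the form that maximum inner product ANN indexes expect. Whether DDR uses this particular expansion is not stated in the abstract.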

Published

2025-04-11

How to Cite

Kang, J., Li, R., Liu, Q., Huang, Z., Zhang, Z., Chen, Y., … Su, Y. (2025). Distribution-Driven Dense Retrieval: Modeling Many-to-One Query-Document Relationship. Proceedings of the AAAI Conference on Artificial Intelligence, 39(11), 11933–11941. https://doi.org/10.1609/aaai.v39i11.33299

Issue

Section

AAAI Technical Track on Data Mining & Knowledge Management I