Improving the Accuracy of Dense Retrieval on the Quantized Indexes via Gradient Optimization of the Target Embeddings

Authors

  • Cong Tan, Shanghai Jiaotong University
  • Yongqi Shao, Shanghai Jiaotong University
  • Hong Huo, Shanghai Jiaotong University
  • Tao Fang, Shanghai Jiaotong University

DOI:

https://doi.org/10.1609/aaai.v40i39.40601

Abstract

Dense retrieval models commonly use flat indexes to achieve high-precision retrieval by computing exact distances between embedding vectors. However, flat indexes are memory-intensive and inefficient, which limits their scalability in large-scale retrieval tasks. In contrast, quantized indexes enable faster retrieval with significantly lower memory usage, but at the cost of reduced accuracy. We therefore propose a scalable and efficient training method for dual-encoder models that improves retrieval accuracy on quantized indexes. Our approach combines direct gradient updates to cached target embeddings with large-scale, similarity-based negative sampling, significantly reducing computational overhead and GPU memory usage. Target embeddings are initialized with a pre-trained encoder and stored in a memory buffer that is updated directly via backpropagation, which avoids repeated re-encoding of the full corpus. To build a rich set of negatives, we use the quantized index to retrieve, from the cached embeddings, the top-k targets most similar to each query, including both query-specific and cross-batch top-k results. This design effectively approximates the truncated softmax distribution. Experiments show that our method performs exceptionally well on quantized indexes, providing a practical and scalable solution for real-world retrieval systems.
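The training step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the function names, the buffer size, the learning rate, the random vectors standing in for encoder outputs, and the brute-force inner-product search standing in for a quantized index. It covers only the target side: candidates are the union of every query's top-k results plus the positives, the loss is a softmax restricted to those candidates, and the gradient is written straight into the cached rows. The query-encoder update is omitted.

```python
import numpy as np


def topk_ids(queries, buffer, k):
    """Brute-force inner-product top-k; a stand-in for a quantized index search."""
    scores = queries @ buffer.T
    return np.argpartition(-scores, k - 1, axis=1)[:, :k]


def build_candidates(queries, positives, buffer, k):
    """Union of every query's top-k ids (shared across the batch) plus the positives."""
    return np.union1d(topk_ids(queries, buffer, k).ravel(), positives)


def loss_and_grad(queries, positives, buffer, cand):
    """Softmax cross-entropy over the candidates and its gradient w.r.t. the cached rows."""
    logits = queries @ buffer[cand].T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    rows = np.arange(len(queries))
    # cand is sorted and contains every positive, so searchsorted gives its column.
    pos_col = np.searchsorted(cand, positives)
    loss = -np.log(probs[rows, pos_col]).mean()
    dlogits = probs.copy()
    dlogits[rows, pos_col] -= 1.0
    grad = dlogits.T @ queries / len(queries)  # shape: (num_candidates, dim)
    return loss, grad


def update_buffer(buffer, cand, grad, lr):
    """Plain SGD step applied in place to the touched rows only; nothing is re-encoded."""
    buffer[cand] -= lr * grad


rng = np.random.default_rng(0)
target_buffer = rng.normal(size=(200, 16))  # stand-in for pre-trained target embeddings
queries = rng.normal(size=(4, 16))          # stand-in for query-encoder outputs
positives = np.array([3, 50, 120, 199])     # hypothetical gold target ids

cand = build_candidates(queries, positives, target_buffer, k=8)
loss_before, grad = loss_and_grad(queries, positives, target_buffer, cand)
update_buffer(target_buffer, cand, grad, lr=0.05)
loss_after, _ = loss_and_grad(queries, positives, target_buffer, cand)
print(f"loss on the same candidates: {loss_before:.4f} -> {loss_after:.4f}")
```

Only the rows listed in `cand` change in a step, so the cost of an update scales with the number of candidates rather than with the corpus size. In a real system the index would need to be kept consistent with the updated buffer; how and how often that happens is not specified in the abstract and is left out here.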

Published

2026-03-14

How to Cite

Tan, C., Shao, Y., Huo, H., & Fang, T. (2026). Improving the Accuracy of Dense Retrieval on the Quantized Indexes via Gradient Optimization of the Target Embeddings. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 33171–33178. https://doi.org/10.1609/aaai.v40i39.40601

Section

AAAI Technical Track on Natural Language Processing IV