Improving the Accuracy of Dense Retrieval on the Quantized Indexes via Gradient Optimization of the Target Embeddings

Cong Tan; Yongqi Shao; Hong Huo; Tao Fang

doi:10.1609/aaai.v40i39.40601

Authors

Cong Tan, Shanghai Jiaotong University
Yongqi Shao, Shanghai Jiaotong University
Hong Huo, Shanghai Jiaotong University
Tao Fang, Shanghai Jiaotong University

DOI:

https://doi.org/10.1609/aaai.v40i39.40601

Abstract

Dense retrieval models commonly use flat indexes to achieve high-precision retrieval by computing exact distances between embedding vectors. However, flat indexes are memory-intensive and slow to search, which limits their scalability in large-scale retrieval tasks. Quantized indexes, in contrast, enable faster retrieval with significantly lower memory usage, but their accuracy tends to be lower. We therefore propose a scalable and efficient training method for dual-encoder models that improves retrieval accuracy on quantized indexes. Our approach combines direct gradient updates to cached target embeddings with large-scale, similarity-based negative sampling, significantly reducing computational overhead and GPU memory usage. Target embeddings are initialized with a pre-trained encoder and stored in a memory buffer that is updated directly via backpropagation, which avoids repeatedly re-encoding the full corpus. To build a rich set of negatives, we retrieve the top-k most similar targets for each query from the cached embeddings using the quantized index, including both query-specific and cross-batch top-k results. This design effectively approximates the truncated softmax distribution. Experiments show that our method performs exceptionally well on quantized indexes, providing a practical and scalable solution for real-world retrieval systems.
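
The training loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several simplifying assumptions: the cached target embeddings are initialized randomly rather than by a pre-trained encoder, a brute-force inner-product search stands in for the quantized index, the query embedding is held fixed instead of being produced by a trainable encoder, and cross-batch negatives are omitted. The names `top_k` and `truncated_softmax_step` are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query, targets, k):
    """Return ids of the k highest-scoring cached targets.
    Assumption: brute-force search stands in for a quantized index."""
    scores = [(dot(query, t), i) for i, t in enumerate(targets)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

def truncated_softmax_step(query, targets, pos_id, k, lr):
    """One gradient step applied directly to the cached target rows.
    Returns the loss measured before the update."""
    # Candidate set: the positive plus the top-k most similar cached targets.
    cand = [pos_id] + [i for i in top_k(query, targets, k) if i != pos_id]
    logits = [dot(query, targets[i]) for i in cand]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    loss = -math.log(probs[0])  # softmax cross-entropy over the candidates only
    # d(loss)/d(t_j) = (p_j - y_j) * q; only the retrieved rows are touched,
    # so the rest of the corpus is never re-encoded.
    for p, i in zip(probs, cand):
        coeff = p - (1.0 if i == pos_id else 0.0)
        targets[i] = [t - lr * coeff * q for t, q in zip(targets[i], query)]
    return loss

random.seed(0)
dim, n_targets = 8, 50
# Assumption: random vectors instead of embeddings from a pre-trained encoder.
targets = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_targets)]
query = [random.gauss(0, 1) for _ in range(dim)]
losses = [truncated_softmax_step(query, targets, pos_id=3, k=10, lr=0.1)
          for _ in range(20)]
```

In this toy setting the loss falls across steps because each update raises the positive's score and lowers the scores of the retrieved negatives. In a real system the search would go through the quantized index, so the negatives seen during training would reflect the quantization error that is also present at retrieval time.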
