TriSampler: A Better Negative Sampling Principle for Dense Retrieval

Authors

  • Zhen Yang, Tsinghua University
  • Zhou Shao, Tsinghua University
  • Yuxiao Dong, Tsinghua University
  • Jie Tang, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v38i8.28779

Keywords:

DMKM: Conversational Systems for Recommendation & Retrieval

Abstract

Negative sampling is a pivotal technique in dense retrieval: it is essential for training effective retrieval models and significantly affects retrieval performance. While existing negative sampling methods have made commendable progress by leveraging hard negatives, a comprehensive guiding principle for constructing negative candidates and designing negative sampling distributions is still lacking. To bridge this gap, we conduct a theoretical analysis of negative sampling in dense retrieval. This analysis leads to the quasi-triangular principle, a novel framework that characterizes the triangle-like interplay between the query, the positive document, and the negative document. Guided by this principle, we introduce TriSampler, a straightforward yet highly effective negative sampling method. The key idea of TriSampler is to selectively sample more informative negatives within a prescribed constrained region. Experimental evaluations show that TriSampler consistently achieves superior retrieval performance across a diverse set of representative retrieval models.
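The abstract does not define TriSampler's constrained region or its sampling distribution, so the following Python sketch is only an illustration of the general idea of "sampling informative negatives within a constrained region", under assumptions that are ours, not the paper's. The hypothetical region uses all three sides of the query/positive/negative triangle: a candidate must score below the positive for the query, s(q, d-) < s(q, d+), and must not be a near-duplicate of the positive, s(d+, d-) < `max_pos_sim`. Inside the region, negatives are drawn without replacement with probability proportional to exp(s(q, d-) / `temperature`), which favors harder negatives. The names `dot`, `sample_negatives`, `max_pos_sim`, and `temperature` are all hypothetical; this is not TriSampler's actual algorithm.

```python
import math
import random


def dot(u, v):
    """Inner-product similarity, a common scoring function in dense retrieval."""
    return sum(a * b for a, b in zip(u, v))


def sample_negatives(query, positive, candidates, k, max_pos_sim=0.9,
                     temperature=1.0, rng=None):
    """Return indices of up to k negatives sampled from a constrained region.

    ASSUMPTION: the region below is a hypothetical stand-in, not the
    region defined by TriSampler:
      1. s(q, d-) < s(q, d+): the negative scores below the positive;
      2. s(d+, d-) < max_pos_sim: the negative is not a near-duplicate of
         the positive (a crude guard against false negatives).
    Within the region, indices are drawn without replacement with
    probability proportional to exp(s(q, d-) / temperature).
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    pos_score = dot(query, positive)
    region = [
        i for i, d in enumerate(candidates)
        if dot(query, d) < pos_score and dot(positive, d) < max_pos_sim
    ]
    weights = [math.exp(dot(query, candidates[i]) / temperature) for i in region]

    chosen = []
    for _ in range(min(k, len(region))):
        # Roulette-wheel selection over the remaining weights.
        r = rng.random() * sum(weights)
        acc = 0.0
        for j, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        chosen.append(region.pop(j))
        weights.pop(j)
    return chosen


if __name__ == "__main__":
    q = [1.0, 0.0]
    pos = [0.9, 0.1]            # s(q, d+) = 0.9
    cands = [
        [0.95, 0.0],            # scores above the positive: excluded
        [0.7, 0.3],             # inside the region
        [0.2, 0.9],             # inside the region (easier negative)
        [0.8, 2.0],             # s(d+, d-) = 0.92, too close to d+: excluded
    ]
    print(sorted(sample_negatives(q, pos, cands, k=2)))  # → [1, 2]
```

Raising `temperature` flattens the distribution toward uniform sampling over the region, while lowering it concentrates on the hardest admissible negatives; the region constraints, in turn, decide which candidates are eligible at all.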

Published

2024-03-24

How to Cite

Yang, Z., Shao, Z., Dong, Y., & Tang, J. (2024). TriSampler: A Better Negative Sampling Principle for Dense Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 38(8), 9269-9277. https://doi.org/10.1609/aaai.v38i8.28779

Issue

Vol. 38 No. 8

Section

AAAI Technical Track on Data Mining & Knowledge Management