Mitigating the Impact of False Negative in Dense Retrieval with Contrastive Confidence Regularization

Authors

  • Shiqi Wang (National Key Laboratory for Novel Software Technology, Nanjing University; School of Artificial Intelligence, Nanjing University)
  • Yeqin Zhang (National Key Laboratory for Novel Software Technology, Nanjing University; School of Artificial Intelligence, Nanjing University)
  • Cam-Tu Nguyen (National Key Laboratory for Novel Software Technology, Nanjing University; School of Artificial Intelligence, Nanjing University)

DOI:

https://doi.org/10.1609/aaai.v38i17.29885

Keywords:

NLP: Question Answering, DMKM: Conversational Systems for Recommendation & Retrieval, ML: Unsupervised & Self-Supervised Learning, General, NLP: Safety and Robustness, NLP: Learning & Optimization for NLP, NLP: (Large) Language Models

Abstract

In open-domain Question Answering (QA), dense text retrieval is crucial for finding relevant passages to generate answers. Typically, contrastive learning is used to train a retrieval model, which maps passages and queries to the same semantic space, making similar ones closer and dissimilar ones further apart. However, training such a system is challenging due to the false negative problem, where relevant passages may be missed during data annotation. Hard negative sampling, commonly used to improve contrastive learning, can introduce more noise in training. This is because hard negatives are those close to a given query, and thus more likely to be false negatives. To address this, we propose a novel contrastive confidence regularizer for Noise Contrastive Estimation (NCE) loss, a commonly used contrastive loss. Our analysis shows that the regularizer helps make the dense retrieval model more robust against false negatives with a theoretical guarantee. Additionally, we propose a model-agnostic method to filter out noisy negative passages in the dataset, improving any downstream dense retrieval models. Through experiments on three datasets, we demonstrate that our method achieves better retrieval performance in comparison to existing state-of-the-art dense retrieval systems.
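
To make the false negative problem concrete, the following is a minimal sketch of a standard NCE-style contrastive loss for a single query, assuming dot-product similarity over toy 2-dimensional embeddings. It is not the contrastive confidence regularizer or the filtering method proposed in the paper, neither of which is specified in this abstract. The function names, the `temperature` parameter, and the example vectors are illustrative assumptions.

```python
import math


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def nce_loss(query, positive, negatives, temperature=1.0):
    """Standard NCE-style loss for one query (illustrative, not the paper's loss).

    Computes -log( exp(s_pos) / (exp(s_pos) + sum_j exp(s_neg_j)) ),
    where each score s is a dot product divided by the temperature.
    """
    scores = [dot(query, positive)] + [dot(query, n) for n in negatives]
    scaled = [s / temperature for s in scores]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_norm - scaled[0]


# Hypothetical toy embeddings (assumed for illustration only).
query = [1.0, 0.0]
positive = [1.0, 0.0]
easy_negative = [-1.0, 0.0]  # far from the query
hard_negative = [0.9, 0.1]   # close to the query, so possibly a false negative

loss_easy = nce_loss(query, positive, [easy_negative])
loss_hard = nce_loss(query, positive, [hard_negative])
print(f"easy negative loss: {loss_easy:.4f}")
print(f"hard negative loss: {loss_hard:.4f}")
```

In this toy setting the hard negative yields the larger loss, so it contributes the stronger training signal. If that passage is in fact relevant but unlabeled, the signal pushes a relevant passage away from the query, which is the noise the abstract describes.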

Published

2024-03-24

How to Cite

Wang, S., Zhang, Y., & Nguyen, C.-T. (2024). Mitigating the Impact of False Negative in Dense Retrieval with Contrastive Confidence Regularization. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 19171-19179. https://doi.org/10.1609/aaai.v38i17.29885

Section

AAAI Technical Track on Natural Language Processing II