Random Projection with Filtering for Nearly Duplicate Search

Authors

  • Yue Lin Zhejiang University
  • Rong Jin Michigan State University
  • Deng Cai Zhejiang University
  • Xiaofei He Zhejiang University

DOI:

https://doi.org/10.1609/aaai.v26i1.8199

Keywords:

Random Projection, Hashing, Compressed Sensing, Sparse Coding

Abstract

High dimensional nearest neighbor search is a fundamental problem and has found applications in many domains. Although many hashing based approaches have been proposed for approximate nearest neighbor search in high dimensional space, one main drawback is that they often return many false positives that need to be filtered out by a post procedure. We propose a novel method to address this limitation in this paper. The key idea is to introduce a filtering procedure within the search algorithm, based on the compressed sensing theory, that effectively removes the false positive answers. We first obtain a sparse representation for each data point by the landmark based approach, after which we solve the nearly duplicate search that the difference between the query and its nearest neighbors forms a sparse vector living in a small ℓp ball, where p ≤ 1. Our empirical study on real-world datasets demonstrates the effectiveness of the proposed approach compared to the
state-of-the-art hashing methods.

Downloads

Published

2021-09-20

How to Cite

Lin, Y., Jin, R., Cai, D., & He, X. (2021). Random Projection with Filtering for Nearly Duplicate Search. Proceedings of the AAAI Conference on Artificial Intelligence, 26(1), 641-647. https://doi.org/10.1609/aaai.v26i1.8199