S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions

Authors

  • Xian-Ling Mao Beijing Institute of Technology
  • Bo-Si Feng Beijing Institute of Technology
  • Yi-Jing Hao Beijing Institute of Technology
  • Liqiang Nie National University of Singapore
  • Heyan Huang Beijing Institute of Technology
  • Guihua Wen South China University of Technology

DOI:

https://doi.org/10.1609/aaai.v31i1.10989

Keywords:

LSH, Probability distribution, Hashing

Abstract

To compare the similarity of probability distributions, the information-theoretically motivated metrics like Kullback-Leibler divergence (KL) and Jensen-Shannon divergence (JSD) are often more reasonable compared with metrics for vectors like Euclidean and angular distance. However, existing locality-sensitive hashing (LSH) algorithms cannot support the information-theoretically motivated metrics for probability distributions. In this paper, we first introduce a new approximation formula for S2JSD-distance, and then propose a novel LSH scheme adapted to S2JSD-distance for approximate nearest neighbors search in high-dimensional probability distributions. We define the specific hashing functions, and prove their local-sensitivity. Furthermore, extensive empirical evaluations well illustrate the effectiveness of the proposed hashing schema on six public image datasets and two text datasets, in terms of mean Average Precision, Precision@N and Precision-Recall curve.

Downloads

Published

2017-02-12

How to Cite

Mao, X.-L., Feng, B.-S., Hao, Y.-J., Nie, L., Huang, H., & Wen, G. (2017). S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.10989