Localized Centering: Reducing Hubness in Large-Sample Data

Authors

  • Kazuo Hara, National Institute of Genetics
  • Ikumi Suzuki, National Institute of Genetics
  • Masashi Shimbo, Nara Institute of Science and Technology
  • Kei Kobayashi, The Institute of Statistical Mathematics
  • Kenji Fukumizu, The Institute of Statistical Mathematics
  • Miloš Radovanović, University of Novi Sad

DOI:

https://doi.org/10.1609/aaai.v29i1.9629

Keywords:

Hubness, Centering, k-nearest neighbor method

Abstract

Hubness has recently been identified as a problematic phenomenon in high-dimensional spaces. In this paper, we address a different type of hubness, one that occurs when the number of samples is large. We investigate how hubness in high-dimensional data differs from hubness in large-sample data. One finding is that centering, which is known to reduce the former, does not work for the latter. We then propose a new hub-reduction method called localized centering. It extends centering, yet works effectively for both types of hubness. Using real-world datasets consisting of a large number of documents, we demonstrate that the proposed method improves the accuracy of k-nearest neighbor classification.
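
The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and nothing in it is taken from the abstract beyond the concepts themselves. It assumes that hubness is quantified by the skewness of the k-occurrence counts (how often each object appears in other objects' k-nearest-neighbor lists), that similarity is the inner product of L2-normalized, nonnegative feature vectors, and that "localized" centering subtracts from each similarity the candidate's affinity to the centroid of its own nearest neighbors. The function names and the parameters `kappa` and `gamma` are hypothetical choices made for this illustration.

```python
import numpy as np

def k_occurrence(sim, k):
    """Count how often each object appears in other objects' k-NN lists."""
    s = sim.astype(float)                  # astype returns a copy
    np.fill_diagonal(s, -np.inf)           # an object is not its own neighbor
    knn = np.argsort(-s, axis=1)[:, :k]    # row = query, columns = its top-k
    return np.bincount(knn.ravel(), minlength=s.shape[0])

def skewness(counts):
    """Skewness of the k-occurrence counts; large positive values suggest hubs."""
    c = counts - counts.mean()
    std = c.std()
    return 0.0 if std == 0 else float((c ** 3).mean() / std ** 3)

def centered_similarity(X):
    """Inner-product similarity after subtracting the global centroid."""
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T

def localized_centered_similarity(X, kappa=10, gamma=1.0):
    """Assumed form: sim(q, x) minus (affinity of x to its local centroid) ** gamma.

    Assumes rows of X are L2-normalized and nonnegative, so affinities are >= 0.
    """
    sim = X @ X.T
    s = sim.copy()
    np.fill_diagonal(s, -np.inf)
    nn = np.argsort(-s, axis=1)[:, :kappa]       # kappa nearest neighbors of each x
    local_centroids = X[nn].mean(axis=1)         # shape (n, d)
    affinity = np.einsum("ij,ij->i", X, local_centroids)
    # Entry [q, x]: penalize candidate x (column) by its own local affinity.
    return sim - affinity[None, :] ** gamma
```

To compare the variants on a data matrix `X`, one could call `skewness(k_occurrence(S, k))` with `S` set in turn to `X @ X.T`, `centered_similarity(X)`, and `localized_centered_similarity(X)`; a lower skewness would indicate fewer hubs under the assumptions above.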

Published

2015-02-21

How to Cite

Hara, K., Suzuki, I., Shimbo, M., Kobayashi, K., Fukumizu, K., & Radovanović, M. (2015). Localized Centering: Reducing Hubness in Large-Sample Data. Proceedings of the AAAI Conference on Artificial Intelligence, 29(1). https://doi.org/10.1609/aaai.v29i1.9629

Section

Main Track: Novel Machine Learning Algorithms