Accelerated Combinatorial Search for Outlier Detection with Provable Bound on Sub-Optimality

Authors

  • Guihong Wan The University of Texas at Dallas
  • Haim Schweitzer The University of Texas at Dallas

Keywords:

Heuristic Search, Dimensionality Reduction/Feature Selection, Anomaly/Outlier Detection, Unsupervised & Self-Supervised Learning

Abstract

Outliers negatively affect the accuracy of data analysis. In this paper we are concerned with their influence on the accuracy of Principal Component Analysis (PCA). Algorithms that attempt to detect outliers and remove them from the data prior to applying PCA are sometimes called Robust PCA, or Robust Subspace Recovery algorithms. We propose a new algorithm for outlier detection that combines two ideas. The first is "chunk recursive elimination" that was used effectively to accelerate feature selection, and the second is combinatorial search, in a setting similar to A*. Our main result is showing how to combine these two ideas. One variant of our algorithm is guaranteed to compute an optimal solution according to some natural criteria, but its running time makes it impractical for large datasets. Other variants are much faster and come with provable bounds on sub-optimality. Experimental results show the effectiveness of the proposed approach.

Downloads

Published

2021-05-18

How to Cite

Wan, G., & Schweitzer, H. (2021). Accelerated Combinatorial Search for Outlier Detection with Provable Bound on Sub-Optimality. Proceedings of the AAAI Conference on Artificial Intelligence, 35(14), 12436-12444. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/17475

Issue

Section

AAAI Technical Track on Search and Optimization