Unsupervised Feature Selection by Heuristic Search with Provable Bounds on Suboptimality

Authors

  • Hiromasa Arai The University of Texas at Dallas
  • Crystal Maung The University of Texas at Dallas
  • Ke Xu The University of Texas at Dallas
  • Haim Schweitzer The University of Texas at Dallas

DOI:

https://doi.org/10.1609/aaai.v30i1.10082

Abstract

Identifying a small number of features that can represent the data is a known problem that comes up in areas such as machine learning, knowledge representation, data mining, and numerical linear algebra. Computing an optimal solution is believed to be NP-hard, and there is extensive work on approximation algorithms. Classic approaches exploit the algebraic structure of the underlying matrix, while more recent approaches use randomization. An entirely different approach that uses the A* heuristic search algorithm to find an optimal solution was recently proposed. Not surprisingly it is limited to effectively selecting only a small number of features. We propose a similar approach related to the Weighted A* algorithm. This gives algorithms that are not guaranteed to find an optimal solution but run much faster than the A* approach, enabling effective selection of many features from large datasets. We demonstrate experimentally that these new algorithms are more accurate than the current state-of-the-art while still being practical. Furthermore, they come with an adjustable guarantee on how different their error may be from the smallest possible (optimal) error. Their accuracy can always be increased at the expense of a longer running time.

Downloads

Published

2016-02-21

How to Cite

Arai, H., Maung, C., Xu, K., & Schweitzer, H. (2016). Unsupervised Feature Selection by Heuristic Search with Provable Bounds on Suboptimality. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10082

Issue

Section

Technical Papers: Heuristic Search and Optimization