Optimal Sparse Regression Trees

Authors

  • Rui Zhang, Duke University
  • Rui Xin, Duke University
  • Margo Seltzer, University of British Columbia
  • Cynthia Rudin, Duke University

DOI:

https://doi.org/10.1609/aaai.v37i9.26334

Keywords:

ML: Transparent, Interpretable, Explainable ML, ML: Classification and Regression, ML: Optimization, ML: Clustering

Abstract

Regression trees are one of the oldest forms of AI models, and their predictions can be made without a calculator, which makes them broadly useful, particularly for high-stakes applications. Within the large literature on regression trees, there has been little effort towards full provable optimization, mainly due to the computational hardness of the problem. This work proposes a dynamic-programming-with-bounds approach to the construction of provably optimal sparse regression trees. We leverage a novel lower bound based on an optimal solution to the k-Means clustering problem on one-dimensional data. We are often able to find optimal sparse trees in seconds, even for challenging datasets that involve large numbers of samples and highly correlated features.
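To give intuition for the k-Means lower bound mentioned above: a regression tree with k leaves predicts at most k distinct values, so its sum of squared errors on any set of labels cannot beat the best partition of those labels into k one-dimensional clusters. The sketch below (not the paper's implementation; the function name and structure are illustrative) computes that optimal 1-D k-means cost exactly by dynamic programming, exploiting the fact that in one dimension an optimal clustering uses contiguous blocks of the sorted values.

```python
def kmeans_1d(values, k):
    """Minimum sum of squared errors over all partitions of `values`
    into k clusters. Exact for 1-D data, where an optimal clustering
    consists of contiguous blocks of the sorted values."""
    xs = sorted(values)
    n = len(xs)

    # Prefix sums give O(1) SSE for any contiguous block.
    ps = [0.0] * (n + 1)   # sum of xs[:i]
    pss = [0.0] * (n + 1)  # sum of squares of xs[:i]
    for i, x in enumerate(xs):
        ps[i + 1] = ps[i] + x
        pss[i + 1] = pss[i] + x * x

    def block_cost(i, j):
        # SSE of xs[i:j] around its own mean.
        s = ps[j] - ps[i]
        return (pss[j] - pss[i]) - s * s / (j - i)

    INF = float("inf")
    # dp[m][j] = best cost covering the first j sorted points with m blocks.
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(m - 1, j):  # last block is xs[i:j]
                c = dp[m - 1][i] + block_cost(i, j)
                if c < dp[m][j]:
                    dp[m][j] = c
    return dp[k][n]
```

For example, `kmeans_1d([1, 2, 10, 11], 2)` clusters {1, 2} and {10, 11} around their means, so no 2-leaf regression tree on these labels can achieve a squared-error loss below that value.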

Published

2023-06-26

How to Cite

Zhang, R., Xin, R., Seltzer, M., & Rudin, C. (2023). Optimal Sparse Regression Trees. Proceedings of the AAAI Conference on Artificial Intelligence, 37(9), 11270-11279. https://doi.org/10.1609/aaai.v37i9.26334

Section

AAAI Technical Track on Machine Learning IV