Fully-Dynamic Decision Trees

Authors

  • Marco Bressan University of Milan
  • Gabriel Damay Institut Polytechnique de Paris, Télécom Paris
  • Mauro Sozio Institut Polytechnique de Paris, Télécom Paris

DOI:

https://doi.org/10.1609/aaai.v37i6.25838

Keywords:

ML: Classification and Regression, DMKM: Data Stream Mining, ML: Scalability of ML Systems

Abstract

We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples. Given ε>0 our algorithm guarantees that, at every point in time, every node of the decision tree uses a split with Gini gain within an additive ε of the optimum. For real-valued features the algorithm has an amortized running time per insertion/deletion of O((d·log³n)/ε²), which improves to O((d·log²n)/ε) for binary or categorical features, while it uses space O(n·d), where n is the maximum number of examples at any point in time and d is the number of features. Our algorithm is nearly optimal, as we show that any algorithm with similar guarantees requires amortized running time Ω(d) and space Ω(n·d/polylog(nd)). We complement our theoretical results with an extensive experimental evaluation on real-world data, showing the effectiveness of our algorithm.

Downloads

Published

2023-06-26

How to Cite

Bressan, M., Damay, G., & Sozio, M. (2023). Fully-Dynamic Decision Trees. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6), 6842-6849. https://doi.org/10.1609/aaai.v37i6.25838

Issue

Section

AAAI Technical Track on Machine Learning I