Bagging by Design (on the Suboptimality of Bagging)


  • Periklis Papakonstantinou Tsinghua University
  • Jia Xu Tsinghua University
  • Zhu Cao Tsinghua University



bagging, bootstrapping, aggregation, combinatorial design


Bagging (Breiman 1996) and its variants are among the most popular methods for aggregating classifiers and regressors. Its original analysis assumed that the bootstraps are built from an unlimited, independent source of samples; we therefore call this form of bagging ideal-bagging. In practice, however, base predictors are trained on data subsampled from a limited number of training samples, and as a result they behave very differently. We analyze the effect of intersections between bootstraps, obtained by subsampling, that are used to train different base predictors. Most importantly, we provide an alternative subsampling method, called design-bagging, based on a new construction of combinatorial designs, and prove that it is universally better than bagging. Methodologically, we achieve this level of generality by comparing the prediction accuracy of both bagging and design-bagging relative to the accuracy of ideal-bagging. This approach may find applications in more involved bagging-based methods. Our analytical results are supported by experiments in classification and regression settings.
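To make "intersections between bootstraps" concrete, here is a minimal sketch that draws several bags from a finite training set and measures how many samples each pair of bags shares. It assumes each bag is a subset of m indices drawn uniformly without replacement from n training samples; this is an illustrative sampling scheme, not the paper's design-bagging construction, and the function names are hypothetical. Under this assumption the expected overlap between two bags is m*m/n. Under ideal-bagging, by contrast, bags drawn from an unlimited independent source would share essentially no samples.

```python
import random
from itertools import combinations


def draw_bags(n, m, k, seed=0):
    """Draw k bags, each m distinct indices sampled uniformly from range(n).

    Assumption (illustrative only): subsampling without replacement within a bag,
    independently across bags. This is NOT the paper's design-based construction.
    """
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(n), m)) for _ in range(k)]


def mean_pairwise_overlap(bags):
    """Average number of training samples shared by each pair of bags."""
    pairs = list(combinations(bags, 2))
    return sum(len(a & b) for a, b in pairs) / len(pairs)


def expected_overlap(n, m):
    """Expected intersection size of two independent uniform m-subsets of range(n)."""
    return m * m / n


if __name__ == "__main__":
    n, m, k = 1000, 100, 50
    bags = draw_bags(n, m, k)
    print("empirical mean overlap:", mean_pairwise_overlap(bags))
    print("expected overlap:", expected_overlap(n, m))
```

With n = 1000 and m = 100, each pair of bags shares about 10 samples on average. These shared samples are the intersections whose effect on accuracy the paper analyzes; a combinatorial design instead chooses the bags deliberately rather than at random.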




How to Cite

Papakonstantinou, P., Xu, J., & Cao, Z. (2014). Bagging by Design (on the Suboptimality of Bagging). Proceedings of the AAAI Conference on Artificial Intelligence, 28(1).



Main Track: Novel Machine Learning Algorithms