Patch-Aware Sample Selection for Efficient Masked Image Modeling

Authors

  • Zhengyang Zhuge, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Jiaxing Wang, JD.com
  • Yong Li, JD.com
  • Yongjun Bao, JD.com
  • Peisong Wang, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; AiRiA
  • Jian Cheng, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; AiRiA

DOI:

https://doi.org/10.1609/aaai.v38i15.29671

Keywords:

ML: Unsupervised & Self-Supervised Learning, CV: Learning & Optimization for CV, ML: Deep Learning Algorithms, ML: Deep Neural Architectures and Foundation Models, ML: Representation Learning

Abstract

Sample selection is drawing increasing attention: by extracting and training on only the most informative subset, it can substantially reduce training cost. Although sample selection is effective in conventional supervised learning, applying it to Masked Image Modeling (MIM) remains challenging due to the gap between sample-level selection and patch-level pre-training. In this paper, we inspect sample selection in MIM pre-training and find that the basic selection scheme suffers from performance degradation. We attribute this degradation primarily to two factors: the random masking strategy and the simple averaging function. We then propose Patch-Aware Sample Selection (PASS), comprising a low-cost Dynamic Trained Mask Predictor (DTMP) and a Weighted Selection Score (WSS). DTMP consistently masks the informative patches in each sample, ensuring a relatively accurate selection score, while WSS enhances the score using patch-level disparity. Extensive experiments show the effectiveness of PASS in selecting the most informative subset and accelerating pre-training; PASS exhibits superior performance across various datasets, MIM methods, and downstream tasks. In particular, PASS improves MAE by 0.7% on ImageNet-1K while using only a 37% data budget and achieves a ~1.7x speedup.
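To make the selection mechanism concrete, below is a minimal PyTorch-style sketch of the idea behind WSS and the budgeted subset selection. It is not the authors' implementation: the softmax disparity weighting, the function names (weighted_selection_score, select_subset), and the temperature parameter are illustrative assumptions; only the 37% budget figure comes from the abstract.

import torch

def weighted_selection_score(patch_losses: torch.Tensor,
                             temperature: float = 1.0) -> torch.Tensor:
    # Collapse per-patch reconstruction losses [B, P] into one score per
    # sample [B]. Instead of the simple average (which the paper identifies
    # as a source of degradation), weight each patch by its disparity from
    # the sample's mean loss, so more informative patches contribute more.
    mean_loss = patch_losses.mean(dim=1, keepdim=True)       # [B, 1]
    disparity = (patch_losses - mean_loss).abs()             # [B, P]
    weights = torch.softmax(disparity / temperature, dim=1)  # [B, P]
    return (weights * patch_losses).sum(dim=1)               # [B]

def select_subset(scores: torch.Tensor, budget: float) -> torch.Tensor:
    # Keep the indices of the highest-scoring `budget` fraction of samples.
    k = max(1, int(budget * scores.numel()))
    return torch.topk(scores, k).indices

if __name__ == "__main__":
    torch.manual_seed(0)
    # Stand-in per-patch losses for 1,000 samples with 196 patches (14x14 grid).
    losses = torch.rand(1000, 196)
    scores = weighted_selection_score(losses)
    keep = select_subset(scores, budget=0.37)  # 37% data budget, as in the abstract
    print(f"Selected {keep.numel()} of {losses.size(0)} samples")

The key departure from the naive baseline here is replacing the flat average over patch losses with a disparity-weighted sum, so a sample's score is dominated by the patches whose reconstruction difficulty deviates most from the sample's mean.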

Published

2024-03-24

How to Cite

Zhuge, Z., Wang, J., Li, Y., Bao, Y., Wang, P., & Cheng, J. (2024). Patch-Aware Sample Selection for Efficient Masked Image Modeling. Proceedings of the AAAI Conference on Artificial Intelligence, 38(15), 17245-17253. https://doi.org/10.1609/aaai.v38i15.29671

Issue

Vol. 38 No. 15 (2024)

Section

AAAI Technical Track on Machine Learning VI