Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models
DOI:
https://doi.org/10.1609/aaai.v40i11.37888
Abstract
3D Vision-Language Foundation Models (VLFMs) have demonstrated strong generalization and zero-shot recognition capabilities in open-world point cloud processing tasks. However, their performance often degrades in practical scenarios where data are noisy, incomplete, or drawn from distributions that differ from the training data. To address this challenge, we propose Uni-Adapter, a novel training-free online test-time adaptation (TTA) strategy for 3D VLFMs based on dynamic prototype learning. Uni-Adapter maintains a 3D cache that stores class-specific cluster centers as prototypes, which are continuously updated to capture intra-class variability under heterogeneous data distributions. These dynamic prototypes serve as anchors for cache-based logit computation through similarity scoring. In parallel, a graph-based label smoothing module models inter-prototype similarities to enforce label consistency among related prototypes. Finally, predictions from the original 3D VLFM and the refined 3D cache are unified through entropy-weighted aggregation to ensure reliable adaptation. Without retraining, Uni-Adapter effectively mitigates distribution shifts and achieves state-of-the-art performance across diverse 3D benchmarks and multiple 3D VLFMs, improving performance on ModelNet-40C by 10.55%, ScanObjectNN-C by 8.26%, and ShapeNet-C by 4.49% over the source 3D VLFMs.
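The two computations named in the abstract, cache-based logits from prototype similarity scoring and entropy-weighted aggregation of the VLFM and cache predictions, can be sketched as below. This is a minimal illustration, not the paper's exact formulation: the function names, the exponential sharpening factor `beta`, and the `exp(-entropy)` weighting scheme are assumptions introduced here for clarity.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p, eps=1e-12):
    # Shannon entropy of a probability vector; low entropy = confident.
    return -(p * np.log(p + eps)).sum(axis=-1)

def cache_logits(feat, prototypes, proto_labels, num_classes, beta=5.0):
    # Similarity scoring against cached class prototypes (illustrative form).
    feat = feat / np.linalg.norm(feat)
    protos = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = protos @ feat                     # cosine similarity per prototype
    weights = np.exp(-beta * (1.0 - sims))   # sharpen: closer prototypes dominate
    logits = np.zeros(num_classes)
    for w, y in zip(weights, proto_labels):
        logits[y] += w                       # accumulate per-class evidence
    return logits

def fuse(vlfm_logits, cache_logits_):
    # Entropy-weighted aggregation: the more confident (lower-entropy)
    # prediction source receives the larger weight.
    p_v, p_c = softmax(vlfm_logits), softmax(cache_logits_)
    w_v, w_c = np.exp(-entropy(p_v)), np.exp(-entropy(p_c))
    return (w_v * p_v + w_c * p_c) / (w_v + w_c)
```

At test time, each incoming point-cloud feature would first update or query the prototype cache, after which `fuse` combines the frozen VLFM's zero-shot prediction with the cache-based one; no gradient step is taken, which is what makes the adaptation training-free.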
Published
2026-03-14
How to Cite
Tamjidi, M., Dastmalchi, H., Alimoradijazi, M., Cheraghian, A., An, A., & Saberi, M. (2026). Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 9296–9304. https://doi.org/10.1609/aaai.v40i11.37888
Section
AAAI Technical Track on Computer Vision VIII