Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models

Authors

  • Mehran Tamjidi, University of Technology Sydney
  • Hamidreza Dastmalchi, York University
  • Mohammadreza Alimoradijazi, University of New South Wales
  • Ali Cheraghian, Macquarie University
  • Aijun An, York University
  • Morteza Saberi, University of Technology Sydney

DOI:

https://doi.org/10.1609/aaai.v40i11.37888

Abstract

3D Vision-Language Foundation Models (VLFMs) have demonstrated strong generalization and zero-shot recognition capabilities in open-world point cloud processing tasks. However, their performance often degrades in practical scenarios where data are noisy, incomplete, or drawn from distributions that differ from the training data. To address this challenge, we propose Uni-Adapter, a novel training-free online test-time adaptation (TTA) strategy for 3D VLFMs based on dynamic prototype learning. Uni-Adapter maintains a 3D cache that stores class-specific cluster centers as prototypes, which are continuously updated to capture intra-class variability under heterogeneous data distributions. These dynamic prototypes serve as anchors for cache-based logit computation through similarity scoring. In parallel, a graph-based label smoothing module models inter-prototype similarities to enforce label consistency among related prototypes. Finally, predictions from the original 3D VLFM and the refined 3D cache are unified through entropy-weighted aggregation to ensure reliable adaptation. Without retraining, Uni-Adapter effectively mitigates distribution shifts and achieves state-of-the-art performance across diverse 3D benchmarks and multiple 3D VLFMs, improving performance on ModelNet-40C by 10.55%, ScanObjectNN-C by 8.26%, and ShapeNet-C by 4.49% over the source 3D VLFMs.
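The final fusion step described in the abstract (combining the 3D VLFM's zero-shot prediction with the prototype-cache prediction via entropy weighting) can be illustrated with a minimal sketch. All names, shapes, and the single-prototype-per-class simplification are hypothetical; this is not the authors' implementation, which maintains multiple dynamically updated cluster centers per class and a graph-based smoothing module.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    # Shannon entropy of a probability vector
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def fuse_predictions(vlfm_logits, feature, prototypes):
    """Entropy-weighted fusion of zero-shot and cache-based predictions.

    vlfm_logits : (C,) zero-shot logits from the 3D VLFM
    feature     : (D,) L2-normalized test point-cloud embedding
    prototypes  : (C, D) one L2-normalized prototype per class
                  (a real cache would hold several cluster centers per class)
    """
    cache_logits = prototypes @ feature          # cosine similarity per class
    p_vlfm, p_cache = softmax(vlfm_logits), softmax(cache_logits)
    # lower entropy => more confident branch => larger weight
    w_vlfm, w_cache = np.exp(-entropy(p_vlfm)), np.exp(-entropy(p_cache))
    return (w_vlfm * p_vlfm + w_cache * p_cache) / (w_vlfm + w_cache)

# toy example: 3 classes, 4-dim features
rng = np.random.default_rng(0)
protos = rng.normal(size=(3, 4))
protos /= np.linalg.norm(protos, axis=1, keepdims=True)
feat = protos[1] + 0.1 * rng.normal(size=4)      # sample near class-1 prototype
feat /= np.linalg.norm(feat)
fused = fuse_predictions(np.array([0.2, 1.5, 0.1]), feat, protos)
print(int(np.argmax(fused)))
```

The fused output is a valid probability distribution, and whichever branch is more confident (lower entropy) dominates the combination, which is the intuition behind using entropy weighting for reliable online adaptation.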

Published

2026-03-14

How to Cite

Tamjidi, M., Dastmalchi, H., Alimoradijazi, M., Cheraghian, A., An, A., & Saberi, M. (2026). Adapt-As-You-Walk Through the Clouds: Training-Free Online Test-Time Adaptation of 3D Vision-Language Foundation Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 9296–9304. https://doi.org/10.1609/aaai.v40i11.37888

Section

AAAI Technical Track on Computer Vision VIII