TRUST: Leveraging Text Robustness for Unsupervised Domain Adaptation
DOI:
https://doi.org/10.1609/aaai.v40i28.39535
Abstract
Recent unsupervised domain adaptation (UDA) methods have shown great success in addressing classical domain shifts (e.g., synthetic-to-real), but they still struggle under complex shifts (e.g., geographical shift), where both the background and the object appearances differ significantly across domains. Prior work has shown that the language modality can help the adaptation process, since it is more robust to such complex shifts. In this paper, we introduce TRUST, a novel UDA approach that exploits the robustness of the language modality to guide the adaptation of a vision model. TRUST generates pseudo-labels for target samples from their captions and introduces a novel uncertainty estimation strategy that uses normalised CLIP similarity scores to estimate the uncertainty of the generated pseudo-labels. This estimated uncertainty is then used to reweight the classification loss, mitigating the adverse effects of wrong pseudo-labels obtained from low-quality captions. To further increase the robustness of the vision model, we propose a multimodal soft-contrastive learning loss that aligns the vision and language feature spaces by using captions to guide the contrastive training of the vision model on target images. In our contrastive loss, each pair of images acts as both a positive and a negative pair, and their feature representations are attracted and repulsed with a strength proportional to the similarity of their captions. This removes the need to make hard decisions about which pairs are positive and which are negative, a step that is critical and error-prone in the UDA setting. Our approach outperforms previous methods, setting a new state of the art on both classical (DomainNet) and complex (GeoNet) domain shifts. The code is available at https://github.com/MattiaLitrico/TRUST-Leveraging-Text-Robustness-for-Unsupervised-Domain-Adaptation.
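The two training signals described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation (the linked repository is the reference for that). The function names, the softmax normalisation of CLIP scores with a temperature (`clip_temp`), the choice of the pseudo-label's normalised CLIP score as a per-sample confidence weight, and the soft-target cross-entropy form of the contrastive loss (with hypothetical temperatures `tau_img`, `tau_cap`) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(xs: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def weighted_pseudo_label_ce(model_probs: Sequence[Vector],
                             pseudo_labels: Sequence[int],
                             clip_sims: Sequence[Vector],
                             clip_temp: float = 0.01) -> float:
    """Cross-entropy on caption-derived pseudo-labels, reweighted per sample.

    Assumption (hypothetical): the weight is the softmax-normalised CLIP
    score of the pseudo-label class, i.e. 1 - uncertainty, so samples whose
    pseudo-label CLIP disagrees with contribute little to the loss.
    """
    total = 0.0
    for probs, y, sims in zip(model_probs, pseudo_labels, clip_sims):
        weight = softmax(sims, clip_temp)[y]  # confidence in pseudo-label y
        total += weight * -math.log(probs[y])
    return total / len(pseudo_labels)


def soft_contrastive_loss(img_feats: Sequence[Vector],
                          cap_feats: Sequence[Vector],
                          tau_img: float = 0.1,
                          tau_cap: float = 0.1) -> float:
    """Soft-target contrastive loss guided by caption similarity (a sketch).

    For each anchor i, caption similarities to every other sample define a
    soft target distribution q; image similarities define a prediction p.
    Minimising the cross-entropy H(q, p) pulls pair (i, j) together with
    weight q_ij and pushes it apart with weight p_ij, so every pair is
    partly positive and partly negative; no hard labels are needed.
    """
    n = len(img_feats)
    loss = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        targets = softmax([cosine(cap_feats[i], cap_feats[j]) for j in others], tau_cap)
        preds = softmax([cosine(img_feats[i], img_feats[j]) for j in others], tau_img)
        loss += -sum(q * math.log(p) for q, p in zip(targets, preds))
    return loss / n
```

Because the contrastive term is a cross-entropy against a softmax over image similarities, its gradient with respect to each image-similarity logit is p_ij - q_ij: a pair is pulled together when its caption-derived weight q_ij exceeds its current image weight p_ij and pushed apart otherwise. When image similarities already mirror caption similarities (and the temperatures match), p equals q and the loss reaches its minimum, the entropy of q.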
Published
2026-03-14
How to Cite
Litrico, M., Giuffrida, M. V., Battiato, S., & Tuia, D. (2026). TRUST: Leveraging Text Robustness for Unsupervised Domain Adaptation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23621–23630. https://doi.org/10.1609/aaai.v40i28.39535
Issue
Vol. 40 No. 28
Section
AAAI Technical Track on Machine Learning V