TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability

Authors

  • Fengji Ma, The Hong Kong University of Science and Technology (Guangzhou)
  • Hei Victor Cheng, Aarhus University
  • Chenxing Li, Tencent
  • Li Liu, The Hong Kong University of Science and Technology (Guangzhou)

DOI:

https://doi.org/10.1609/aaai.v40i29.39603

Abstract

Achieving zero-shot adversarial robustness without sacrificing generalization remains challenging for foundation models such as CLIP, especially under large adversarial perturbations. Through empirical analyses, we identify three critical yet overlooked issues: (1) Logit margins exhibit a stable offset between small and large adversarial perturbations, suggesting that explicitly adjusting margins could improve robustness against unseen large perturbations. (2) A significant negative correlation exists between logit margin and inter-class semantic similarity, indicating that semantic structures are insufficiently leveraged by existing methods. (3) Existing methods for adjusting text embeddings disrupt the intrinsic semantic consistency established by pre-trained models, undermining generalization capability. Motivated by these findings, we propose a novel Text-Image Mutual Awareness (TIMA) framework, including a Text-Aware Image (TAI) tuning module with an Adaptive Semantic-Aware Margin (ASAM) to explicitly calibrate logit margins, and an Image-Aware Text (IAT) tuning module with Semantic Consistent Minimum Hyperspherical Energy (SC-MHE) to preserve semantic consistency. Comprehensive experiments validate that TIMA significantly outperforms existing approaches by effectively addressing the identified limitations.
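For readers unfamiliar with the quantities the abstract refers to, the following is a minimal sketch of their generic textbook forms: the logit margin of a prediction, the cosine similarity between two class (text) embeddings, and the hyperspherical energy of a set of embeddings. It is not the authors' ASAM or SC-MHE formulation, which the abstract does not specify. The function names, the plain-list data layout, and the exponent parameter `s` are assumptions made for illustration only.

```python
import math


def logit_margin(logits, label):
    """Generic logit margin: true-class logit minus the largest competing logit."""
    true_logit = logits[label]
    best_other = max(v for i, v in enumerate(logits) if i != label)
    return true_logit - best_other


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (e.g. two class text embeddings)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hyperspherical_energy(vectors, s=1.0):
    """Riesz s-energy of the L2-normalized vectors: sum over pairs of 1 / distance**s.

    Assumption: this is the standard pairwise-repulsion form of hyperspherical
    energy, not the semantic-consistent variant proposed in the paper.
    """
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(a * a for a in v))
        unit.append([a / norm for a in v])
    energy = 0.0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(unit[i], unit[j])))
            energy += dist ** (-s)
    return energy
```

Under these definitions, a positive margin means the example is classified correctly, and embeddings that are spread further apart on the unit sphere have lower energy. The abstract's second finding concerns how the first quantity varies with the second across class pairs.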

Published

2026-03-14

How to Cite

Ma, F., Cheng, H. V., Li, C., & Liu, L. (2026). TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability. Proceedings of the AAAI Conference on Artificial Intelligence, 40(29), 24235–24243. https://doi.org/10.1609/aaai.v40i29.39603

Section

AAAI Technical Track on Machine Learning VI