DRFGD: Disentangled Representation-Focused Generative Defense for Attack-Tolerant Cross-Modal Hashing

Authors

  • Zhongqing Yu (Department of Computer Science, Huaqiao University; Fujian Key Lab. of Big Data Intell. and Security & Xiamen CVPR Key Laboratory)
  • Xin Liu (Department of Computer Science, Huaqiao University; Department of Computer Science, Hong Kong Baptist University)
  • Yiu-ming Cheung (Department of Computer Science, Hong Kong Baptist University)
  • Zhikai Hu (Department of Computer Science, Hong Kong Baptist University)
  • Wentao Fan (Department of Artificial Intelligence, Beijing Normal-Hong Kong Baptist University)
  • Pan Zhou (School of Cyber Science and Engineering, Huazhong University of Science and Technology)

DOI:

https://doi.org/10.1609/aaai.v40i19.38659

Abstract

With the widespread deployment of cross-modal retrieval in real-world scenarios, ensuring robustness against adversarial attacks is increasingly critical. Deep cross-modal hashing is highly vulnerable to such attacks because of its discrete nature and low-dimensional hash codes. Existing defense methods often fail to suppress perturbations embedded in vulnerable features and lack the capacity to model modality-specific structural differences, which results in suboptimal adversarial robustness. To address these challenges, we propose a novel Disentangled Representation-Focused Generative Defense (DRFGD) framework for attack-tolerant cross-modal hashing. Without altering the structure of the retrieval model, DRFGD defends against adversarial attacks by using an efficient dual-branch semantic-aware encoder to disentangle input representations into adversarially robust and adversarially vulnerable components. Guided by the disentangled robust features, an attack-tolerant generative module synthesizes semantically aligned and perturbation-resilient examples for robust adversarial training, thereby improving collaborative defense robustness against attackers. As a result, semantically consistent hash codes can be obtained, which enhances adversarial robustness in complex cross-modal attack scenarios. Extensive experiments on public benchmarks demonstrate that DRFGD substantially improves retrieval robustness under various attack scenarios and achieves better defense performance than state-of-the-art methods.
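
The vulnerability attributed above to the discrete nature of hash codes can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the common sign-based binarization used in deep hashing, and the feature values, the perturbation, and the helper names `to_hash_code` and `hamming_distance` are all hypothetical. The perturbation is applied directly in feature space for simplicity, whereas a real attack perturbs the input.

```python
import numpy as np

def to_hash_code(features: np.ndarray) -> np.ndarray:
    """Binarize continuous features into {-1, +1} codes (zeros map to +1)."""
    return np.where(features >= 0, 1, -1).astype(np.int8)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Count the positions where two {-1, +1} codes differ."""
    return int(np.sum(a != b))

# Hypothetical 8-dimensional encoder outputs for a matching image-text pair.
image_features = np.array([0.9, -0.8, 0.05, -0.03, 0.7, -0.6, 0.02, 0.5])
text_features = np.array([0.8, -0.7, 0.10, -0.10, 0.6, -0.5, 0.10, 0.4])

# Hypothetical perturbation, bounded by 0.1 per dimension, aimed at the
# three features that lie closest to the sign boundary.
perturbation = np.array([0.0, 0.0, -0.1, 0.1, 0.0, 0.0, -0.1, 0.0])

clean_code = to_hash_code(image_features)
adv_code = to_hash_code(image_features + perturbation)
text_code = to_hash_code(text_features)

clean_dist = hamming_distance(clean_code, text_code)  # codes agree on every bit
adv_dist = hamming_distance(adv_code, text_code)      # three of eight bits flipped

print(f"clean Hamming distance: {clean_dist}")
print(f"adversarial Hamming distance: {adv_dist}")
```

In this toy setting, features with small magnitude play the role of the vulnerable components: a bounded perturbation is enough to flip their bits, while the large-magnitude features keep their signs. With only eight bits, three flips already move the matching pair far apart in Hamming space.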

Published

2026-03-14

How to Cite

Yu, Z., Liu, X., Cheung, Y.-M., Hu, Z., Fan, W., & Zhou, P. (2026). DRFGD: Disentangled Representation-Focused Generative Defense for Attack-Tolerant Cross-Modal Hashing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 16226–16234. https://doi.org/10.1609/aaai.v40i19.38659

Section

AAAI Technical Track on Data Mining & Knowledge Management III