Diff-NAT: Better Naturalistic and Aggressive Adversarial Attacks via Class-Optimized Diffusion for Object Detection

Authors

  • Qinglong Yan, Wuhan University
  • Tong Zou, Wuhan University
  • Xunpeng Yi, Wuhan University
  • Xinyu Xiang, Wuhan University
  • Xuying Wu, Wuhan University
  • Hao Zhang, Wuhan University
  • Jiayi Ma, Wuhan University

DOI

https://doi.org/10.1609/aaai.v40i14.38137

Abstract

Recent advances in naturalistic physical adversarial patch generation show great promise for protecting personal privacy against detector-based malicious surveillance while remaining inconspicuous to human observers. In this work, we present the first systematic categorization of existing methods into three representative paradigms, together with an in-depth re-examination of each, revealing a pervasive imbalance: enforcing naturalness constraints inherently restricts the adversarial search space and thus limits attack performance. To address this challenge, we propose a novel paradigm based on class-optimized diffusion, termed Diff-NAT. Diff-NAT leverages pretrained diffusion models as powerful natural image priors and introduces a unified iterative framework that jointly optimizes two complementary components: semantic-level textual prompts and instance-level latent codes. Specifically, prompt optimization enables broad traversal across inter-class semantic regions, while latent refinement allows fine-grained manipulation within a given class. This dual-level optimization progressively navigates toward adversarial distributions embedded within the natural semantic manifold. Extensive experiments in both digital and physical settings demonstrate that Diff-NAT outperforms existing state-of-the-art (SOTA) approaches in both visual realism and aggressiveness.
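The abstract describes the dual-level optimization only at a high level. As a toy illustration of what jointly optimizing two parameter blocks at different granularities can look like, here is a minimal sketch under loudly stated assumptions; it is not the authors' Diff-NAT procedure. Everything in it is hypothetical: `decode` is a linear stand-in for the pretrained diffusion model, `detector_score` is a logistic stand-in for the object detector's confidence, the prompt is a continuous embedding vector `p` rather than actual text, and the outer/inner loop schedule, the step sizes `lr_prompt` and `lr_latent`, and the L2 pull `lam` toward the initial latent (a crude surrogate for staying near natural images) are all assumptions.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def decode(p, z, A, B):
    # Hypothetical linear "generator" x = A p + B z, standing in for a
    # diffusion model conditioned on prompt embedding p and latent code z.
    return [sum(A[i][j] * p[j] for j in range(len(p)))
            + sum(B[i][k] * z[k] for k in range(len(z)))
            for i in range(len(A))]


def detector_score(x, w, b):
    # Hypothetical detector confidence for the protected class (e.g. "person").
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def grads(p, z, A, B, w, b):
    # Analytic gradients of s = sigmoid(w . x + b) with respect to p and z.
    s = detector_score(decode(p, z, A, B), w, b)
    ds = s * (1.0 - s)              # derivative of sigmoid w.r.t. its logit
    g_x = [ds * wi for wi in w]     # ds/dx
    g_p = [sum(g_x[i] * A[i][j] for i in range(len(A))) for j in range(len(p))]
    g_z = [sum(g_x[i] * B[i][k] for i in range(len(B))) for k in range(len(z))]
    return g_p, g_z


def dual_level_attack(p, z, A, B, w, b, outer=50, inner=5,
                      lr_prompt=0.5, lr_latent=0.05, lam=0.1):
    # Hypothetical schedule: one coarse prompt step, then several fine latent steps.
    z_init = list(z)
    for _ in range(outer):
        # Semantic level: one large step on the prompt embedding.
        g_p, _ = grads(p, z, A, B, w, b)
        p = [pj - lr_prompt * gj for pj, gj in zip(p, g_p)]
        # Instance level: several small steps on the latent, with an L2 pull
        # toward its starting point as a crude stand-in for a naturalness prior.
        for _ in range(inner):
            _, g_z = grads(p, z, A, B, w, b)
            z = [zk - lr_latent * (gk + lam * (zk - z0k))
                 for zk, gk, z0k in zip(z, g_z, z_init)]
    return p, z, detector_score(decode(p, z, A, B), w, b)


# Toy usage with random (hypothetical) generator and detector weights.
random.seed(0)
dx, dp, dz = 8, 4, 6
A = [[random.gauss(0, 1) for _ in range(dp)] for _ in range(dx)]
B = [[random.gauss(0, 1) for _ in range(dz)] for _ in range(dx)]
w = [random.gauss(0, 1) for _ in range(dx)]
b = 2.0
p0 = [random.gauss(0, 1) for _ in range(dp)]
z0 = [random.gauss(0, 1) for _ in range(dz)]

s_before = detector_score(decode(p0, z0, A, B), w, b)
p_adv, z_adv, s_after = dual_level_attack(p0, z0, A, B, w, b)
print(f"detector score before: {s_before:.3f}, after: {s_after:.3f}")
```

In this toy setting the detector score should drop after optimization. Interleaving a few large prompt steps with many small latent steps is only meant to mirror the coarse "inter-class" versus fine "within-class" roles the abstract assigns to the two components; the real method operates on textual prompts and diffusion latents with a real detector.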

Published

2026-03-14

How to Cite

Yan, Q., Zou, T., Yi, X., Xiang, X., Wu, X., Zhang, H., & Ma, J. (2026). Diff-NAT: Better Naturalistic and Aggressive Adversarial Attacks via Class-Optimized Diffusion for Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(14), 11541-11549. https://doi.org/10.1609/aaai.v40i14.38137

Issue

Vol. 40 No. 14

Section

AAAI Technical Track on Computer Vision XI