Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance

Authors

  • Kwanyoung Kim, Samsung Research

DOI:

https://doi.org/10.1609/aaai.v40i7.37488

Abstract

Diffusion models have demonstrated strong generative performance when using guidance methods such as classifier-free guidance (CFG), which enhance output quality by modifying the sampling trajectory. These methods typically improve a target output by intentionally degrading another, often the unconditional output, using heuristic perturbation functions such as identity mixing or blurred conditions. However, these approaches lack a principled foundation and rely on manually designed distortions. In this work, we propose Adversarial Sinkhorn Attention Guidance (ASAG), a novel method that reinterprets attention scores in diffusion models through the lens of optimal transport and intentionally increases the transport cost to disrupt unreliable attention flows. Instead of naively corrupting the attention mechanism, ASAG injects an adversarial cost within self-attention layers to reduce pixel-wise similarity between queries and keys. This deliberate degradation weakens misleading attention alignments and leads to improved conditional and unconditional sample quality. ASAG shows consistent improvements in text-to-image diffusion, and enhances controllability and fidelity in downstream applications such as IP-Adapter and ControlNet. The method is lightweight, plug-and-play, and improves reliability without requiring any model retraining.
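The mechanism summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: every function name below is hypothetical, and two modeling choices are assumptions made only for illustration. First, the "adversarial cost" is modeled by negating the query-key similarity, so that the Sinkhorn transport plan favors dissimilar pairs and the transport cost rises. Second, the degraded prediction is combined with the original one by the extrapolation pattern common to CFG-style guidance.

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = x.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True)), axis=axis)

def sinkhorn_plan(cost, eps=1.0, n_iters=50):
    """Entropic optimal-transport plan with uniform marginals (log-domain Sinkhorn)."""
    n, m = cost.shape
    log_k = -cost / eps                   # log of the Gibbs kernel exp(-cost / eps)
    log_a, log_b = -np.log(n), -np.log(m)  # uniform row and column marginals
    f, g = np.zeros(n), np.zeros(m)       # dual potentials
    for _ in range(n_iters):
        f = log_a - logsumexp(log_k + g[None, :], axis=1)  # match row marginals
        g = log_b - logsumexp(log_k + f[:, None], axis=0)  # match column marginals
    return np.exp(log_k + f[:, None] + g[None, :])

def attention_plans(q, k, eps=1.0, n_iters=50):
    """Return the similarity matrix plus a standard and a degraded attention plan."""
    sim = q @ k.T / np.sqrt(q.shape[-1])
    standard = sinkhorn_plan(-sim, eps, n_iters)    # usual OT view: cost = -similarity
    # ASSUMPTION: the adversarial cost is modeled as the negated cost (+similarity),
    # which steers mass toward dissimilar query-key pairs.
    adversarial = sinkhorn_plan(sim, eps, n_iters)
    return sim, standard, adversarial

def guided_prediction(pred, pred_degraded, scale):
    """ASSUMPTION: CFG-style extrapolation away from the degraded prediction."""
    return pred + scale * (pred - pred_degraded)

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 8))   # 6 query tokens, head dimension 8 (toy sizes)
k = rng.normal(size=(6, 8))
sim, p_std, p_adv = attention_plans(q, k)

cost = -sim
cost_std = float((cost * p_std).sum())   # transport cost under the standard plan
cost_adv = float((cost * p_adv).sum())   # transport cost under the degraded plan
print("standard cost:", cost_std, "adversarial cost:", cost_adv)
```

In this toy setting the degraded plan carries a higher transport cost than the standard plan, which is the sense in which the attention flow is "disrupted". To use either plan as attention weights, each row would be rescaled by the number of queries so that it sums to approximately one. In a real diffusion model the degraded plan would replace the self-attention map in a second forward pass, and `guided_prediction` would be applied to the two noise predictions.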

Published

2026-03-14

How to Cite

Kim, K. (2026). Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5682–5690. https://doi.org/10.1609/aaai.v40i7.37488

Section

AAAI Technical Track on Computer Vision IV