Collaborative Transformers with Multi-Level Forensic Attention for Image Manipulation Localization
DOI:
https://doi.org/10.1609/aaai.v40i15.38250
Abstract
The proliferation of tampered images on social media poses serious societal risks, influencing public opinion and causing panic. Image Manipulation Localization techniques have advanced to address this, but some methods focus on microscopic traces and overlook the macroscopic semantics that deceive viewers. To address this problem, we propose a novel Image Manipulation Localization framework called Collaborative Transformers (Co-Transformers), designed to fully explore and utilize the collaborative information between macroscopic semantics and microscopic traces. The framework is based on two Vision Transformer variants: the first captures the semantic logic of the image, and the second delves into microscopic tampering traces. By dynamically fusing these two complementary features, the framework enables interaction between macroscopic semantic inconsistencies and microscopic abnormal traces, effectively coordinating their relationship in the latent space. Furthermore, we introduce a new Multi-Level Forensic Attention (MLF-Attention) mechanism, which can be integrated into our framework, to enhance the model's ability to extract various tampering traces. Compared with existing methods, our proposed framework achieves state-of-the-art localization accuracy and shows good robustness against various attacks.
Published
2026-03-14
How to Cite
Zhang, J., Feng, W., Wang, S., Kou, F., Yu, H., & Niu, S. (2026). Collaborative Transformers with Multi-Level Forensic Attention for Image Manipulation Localization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 12556–12563. https://doi.org/10.1609/aaai.v40i15.38250
Issue
Vol. 40 No. 15 (2026)
Section
AAAI Technical Track on Computer Vision XII