Collaborative Transformers with Multi-Level Forensic Attention for Image Manipulation Localization

Jiwei Zhang; Wenbo Feng; Siwei Wang; Feifei Kou; Haoyang Yu; Shaozhang Niu

doi:10.1609/aaai.v40i15.38250

Authors

Jiwei Zhang, School of Computer Science (National Pilot School of Software Engineering), BUPT, Beijing, China; Key Laboratory of Interactive Technology and Experience System, Ministry of Culture and Tourism (BUPT), Beijing, China
Wenbo Feng, School of Computer Science (National Pilot School of Software Engineering), BUPT, Beijing, China
Siwei Wang, The Intelligent Game and Decision Lab, Academy of Military Sciences, Beijing, China
Feifei Kou, School of Computer Science (National Pilot School of Software Engineering), BUPT, Beijing, China
Haoyang Yu, China Mobile Internet Co., Ltd, Guangzhou, China
Shaozhang Niu, School of Computer Science (National Pilot School of Software Engineering), BUPT, Beijing, China

Abstract

The proliferation of tampered images on social media can pose serious societal risks, influencing public opinion and causing panic. Image Manipulation Localization techniques have advanced to address this, but some methods focus on microscopic traces and overlook the macroscopic semantics that deceive viewers. To address this problem, we propose a novel Image Manipulation Localization framework called Collaborative Transformers (Co-Transformers), designed to fully explore and utilize the collaborative information between macroscopic semantics and microscopic traces. The framework is based on two Vision Transformer variants: the first captures the semantic logic of the image, and the second delves into microscopic tampering traces. By dynamically fusing these two complementary features, the framework enables interaction between macroscopic semantic inconsistencies and microscopic abnormal traces, effectively coordinating their relationship in the latent space. Furthermore, we introduce a new Multi-Level Forensic Attention (MLF-Attention) mechanism, which can be integrated into our framework, to enhance the model's ability to extract various tampering traces. Compared with existing methods, our proposed framework achieves state-of-the-art localization accuracy and shows good robustness against various attacks.
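
As a rough illustration of what dynamically fusing two complementary feature streams can look like, the sketch below mixes per-token features from a semantic branch and a trace branch with a sigmoid gate. It is not the Co-Transformers fusion module, whose details the abstract does not give. The function name `gated_fusion`, the weights `w_sem` and `w_trace`, and the scalar-gate design are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(semantic, trace, w_sem, w_trace, bias=0.0):
    """Fuse two lists of per-token feature vectors with a scalar gate.

    semantic, trace: lists of token feature vectors with matching shapes.
    w_sem, w_trace:  hypothetical gate weights, one per feature channel
                     (in a real model these would be learned).
    """
    fused = []
    for s_tok, t_tok in zip(semantic, trace):
        # The gate logit depends on both streams, so the mix varies per token.
        logit = bias
        for s, t, ws, wt in zip(s_tok, t_tok, w_sem, w_trace):
            logit += ws * s + wt * t
        g = sigmoid(logit)
        # Convex combination: g weights the semantic stream, 1 - g the trace stream.
        fused.append([g * s + (1.0 - g) * t for s, t in zip(s_tok, t_tok)])
    return fused


# One token with two channels; zero gate weights give an even mix.
semantic_feats = [[1.0, 2.0]]
trace_feats = [[3.0, 4.0]]
even_mix = gated_fusion(semantic_feats, trace_feats, [0.0, 0.0], [0.0, 0.0])
```

With zero gate weights and zero bias the gate is 0.5 for every token, so the output is the plain average of the two streams. Non-zero weights let the balance shift token by token, which is the sense of "dynamic" assumed here.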
