ResProto-FD: Visual-Language Residual Prototype Sets for Generalized Face Forgery Detection

Authors

  • Jiuyao Jing, Xidian University
  • Yu Zheng, Xidian University
  • Chunlei Peng, Xidian University

DOI:

https://doi.org/10.1609/aaai.v40i7.37473

Abstract

With the rapid development of generative models, such as generative adversarial networks and diffusion models, the task of face forgery detection has emerged, aiming to identify forged faces in real-world scenarios. A key challenge for current face forgery detection models is improving generalization to unknown forgeries. To address this, we propose ResProto-FD, a framework that constructs residual prototype sets to capture diverse forgery cues and discriminative differences from real faces. Our approach collects prototypes from the most informative residual features generated during training, enabling better representation of various forgery traces and real-vs-fake distinctions. First, we introduce a Visual-Language Residual Learning (VLRL) module based on the CLIP model. This module constructs residual features between image and text embeddings to capture inconsistencies between visual features and associated textual semantics. In doing so, it guides the model to attend to subtle visual forgery clues and enhances the discriminative power of image representations. Furthermore, we design a Gradient-aware Residual Prototypes (GRP) mechanism, a dynamic collection strategy that selectively stores uncertain residual features based on gradient signals to build the prototype sets. This enhances the model’s ability to generalize to unknown forgery types. Extensive experiments across various datasets and forgery methods demonstrate that ResProto-FD significantly improves generalization performance and consistently outperforms state-of-the-art methods.
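
The two ideas named in the abstract, image-text residual features and gradient-based collection of prototypes, can be illustrated with a toy NumPy sketch. This is not the authors' implementation. The abstract does not state the exact residual form or the selection rule, so everything here is an assumption made for illustration: the residual is taken as the difference of L2-normalized embeddings, the "gradient signal" is the magnitude |p - y| of the binary cross-entropy gradient with respect to the logit, and the names residual_features, PrototypeSet, capacity, and grad_threshold are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def residual_features(image_emb, text_emb):
    """Assumed residual form: difference of unit-norm image and text embeddings."""
    return l2_normalize(image_emb) - l2_normalize(text_emb)


class PrototypeSet:
    """Hypothetical fixed-capacity store of high-gradient residual features."""

    def __init__(self, capacity, grad_threshold):
        self.capacity = capacity
        self.grad_threshold = grad_threshold
        self.items = []  # list of (gradient magnitude, residual feature)

    def update(self, residuals, probs, labels):
        # For a sigmoid output with binary cross-entropy, the gradient of the
        # loss with respect to the logit is (p - y); its magnitude is used
        # here as a stand-in for the paper's gradient signal (assumption).
        grad = np.abs(probs - labels)
        for g, r in zip(grad, residuals):
            if g >= self.grad_threshold:
                self.items.append((float(g), r))
        # Keep only the `capacity` entries with the largest gradient magnitude.
        self.items.sort(key=lambda item: item[0], reverse=True)
        del self.items[self.capacity:]

    def prototypes(self):
        """Return stored features as an array, or an empty array if none."""
        if not self.items:
            return np.empty((0, 0))
        return np.stack([r for _, r in self.items])


# Toy data standing in for CLIP image/text embeddings (2-D for readability).
image_emb = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
text_emb = np.array([[0.0, 1.0], [0.0, 1.0], [3.0, 4.0]])
residuals = residual_features(image_emb, text_emb)

probs = np.array([0.9, 0.5, 0.1])   # predicted probability of "fake"
labels = np.array([1.0, 0.0, 0.0])  # ground truth: 1 = fake, 0 = real

bank = PrototypeSet(capacity=2, grad_threshold=0.3)
bank.update(residuals, probs, labels)
stored = bank.prototypes()
```

In this toy run only the second sample, whose prediction of 0.5 is the least confident, clears the threshold and is stored. A real system would obtain the embeddings from CLIP encoders and would likely keep separate sets for real and fake samples; both details are outside what the abstract specifies.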

Published

2026-03-14

How to Cite

Jing, J., Zheng, Y., & Peng, C. (2026). ResProto-FD: Visual-Language Residual Prototype Sets for Generalized Face Forgery Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5548–5556. https://doi.org/10.1609/aaai.v40i7.37473

Section

AAAI Technical Track on Computer Vision IV