Explainability-Driven Defense: Grad-CAM-Guided Model Refinement Against Adversarial Threats
DOI:
https://doi.org/10.1609/aaaiss.v6i1.36024
Abstract
Deep learning models have excelled in tasks such as image recognition and autonomous systems, but they remain vulnerable to adversarial attacks and spurious correlations, limiting their reliability in real-world and safety-critical settings. To address these challenges, we propose a novel framework that leverages explainable artificial intelligence (XAI) to enhance the robustness of convolutional neural networks (CNNs). Our approach integrates Grad-CAM insights into the model refinement process, guiding feature masking to reduce reliance on irrelevant or misleading features. We introduce three masking strategies: (1) binary masking, which retains high-activation regions; (2) Gaussian-blurred masking, which preserves contextual information while reducing noise; and (3) difference-based masking, which removes unstable features unique to the baseline model. We evaluate these strategies against two common adversarial attack methods, the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). Results show that all three strategies improve accuracy under FGSM attacks, with binary and difference-based masking providing consistent gains across perturbation levels, while Gaussian-blurred masking delivers the largest improvement in accuracy under PGD, particularly at higher perturbation strengths.
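The abstract describes the three masking strategies only at a high level. The sketch below is a minimal PyTorch illustration of how such Grad-CAM-guided masks might be applied to an input image; the function names, thresholds, and kernel parameters are illustrative assumptions, not the authors' implementation. The Grad-CAM heatmap `cam` is assumed to be precomputed, upsampled to the image resolution, and normalized to [0, 1].

```python
import torch
import torch.nn.functional as F

def binary_mask(image, cam, threshold=0.5):
    """Strategy 1: retain only high-activation regions.

    image: float tensor (C, H, W) in [0, 1]; cam: Grad-CAM heatmap
    (H, W), upsampled to the image size and normalized to [0, 1].
    threshold is an illustrative assumption.
    """
    keep = (cam >= threshold).float()        # 1 where the model attends
    return image * keep.unsqueeze(0)         # zero out everything else

def gaussian_blurred_mask(image, cam, threshold=0.5,
                          kernel_size=11, sigma=3.0):
    """Strategy 2: keep high-activation regions sharp and replace the
    rest with a Gaussian-blurred copy, preserving coarse context while
    suppressing fine (potentially noisy) detail."""
    # Build a normalized 2D Gaussian kernel and blur each channel
    # with a depthwise convolution.
    coords = torch.arange(kernel_size).float() - kernel_size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel2d = torch.outer(g, g)
    c = image.shape[0]
    weight = kernel2d.expand(c, 1, kernel_size, kernel_size).contiguous()
    blurred = F.conv2d(image.unsqueeze(0), weight,
                       padding=kernel_size // 2, groups=c).squeeze(0)
    keep = (cam >= threshold).float().unsqueeze(0)
    return image * keep + blurred * (1.0 - keep)

def difference_based_mask(image, cam_baseline, cam_reference, threshold=0.3):
    """Strategy 3 (assumed form): suppress regions the baseline model
    attends to but a reference heatmap does not, i.e. features that are
    unique to the baseline model and therefore unstable."""
    unstable = ((cam_baseline - cam_reference) >= threshold).float()
    return image * (1.0 - unstable).unsqueeze(0)
```

One plausible use, consistent with the abstract's description of "model refinement," is to retrain the CNN on such masked inputs so that it learns to rely less on the suppressed regions.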
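The two attack methods named above have standard formulations; the sketch below implements textbook FGSM and L-infinity PGD in PyTorch for evaluation purposes. The step count, step size alpha, and the [0, 1] clamp are common defaults assumed here, not settings reported in the paper.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """Single-step FGSM: x_adv = x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

def pgd_attack(model, x, y, eps, alpha=None, steps=10):
    """Iterative PGD constrained to an L-infinity ball of radius eps."""
    alpha = alpha if alpha is not None else eps / 4.0
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around the clean input,
        # then clamp to the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()
```

Robust accuracy at a given eps is then simply the classification accuracy of the baseline and refined models on these perturbed inputs.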
Published
2025-08-01
How to Cite
Wang, L., Uddin, I. I., Qin, X., Zhou, Y., & Santosh, K. (2025). Explainability-Driven Defense: Grad-CAM-Guided Model Refinement Against Adversarial Threats. Proceedings of the AAAI Symposium Series, 6(1), 49-57. https://doi.org/10.1609/aaaiss.v6i1.36024
Issue
Vol. 6 No. 1 (2025)
Section
AI-Driven Resilience: Building Robust, Adaptive Technologies for a Dynamic World