Explainability-Driven Defense: Grad-CAM-Guided Model Refinement Against Adversarial Threats

Authors

  • Longwei Wang, University of South Dakota
  • Ifrat Ikhtear Uddin, University of South Dakota
  • Xiao Qin, Auburn University
  • Yang Zhou, Auburn University
  • KC Santosh, University of South Dakota

DOI:

https://doi.org/10.1609/aaaiss.v6i1.36024

Abstract

Deep learning models have excelled in tasks like image recognition and autonomous systems but remain vulnerable to adversarial attacks and spurious correlations, limiting their reliability in real-world and safety-critical settings. To address these challenges, we propose a novel framework that leverages explainable artificial intelligence (XAI) to enhance the robustness of convolutional neural networks. Our approach integrates Grad-CAM insights into the model refinement process, guiding feature masking to reduce reliance on irrelevant or misleading features. We introduce three masking strategies: (1) binary masking to retain high-activation regions, (2) Gaussian-blurred masking to preserve contextual information while reducing noise, and (3) difference-based masking to remove unstable features unique to the baseline model. We evaluate these strategies against two common adversarial attack methods—Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). Results show that all three strategies improve accuracy under FGSM attacks, with binary and difference-based masking providing consistent gains across perturbation levels. Gaussian-blurred masking delivers the largest improvement in accuracy under PGD, particularly at higher perturbation strengths.
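The three masking strategies described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a precomputed Grad-CAM heat map normalized to [0, 1] and an image in HWC layout, and the function names, thresholds, and the difference criterion for "unstable" baseline features are illustrative assumptions.

```python
import numpy as np

def _gaussian_blur(channel, sigma):
    # Separable Gaussian blur in pure NumPy (stand-in for e.g. scipy.ndimage.gaussian_filter).
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 0, channel)
    return np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, out)

def binary_mask(image, cam, thresh=0.5):
    """Strategy 1: keep only high-activation Grad-CAM regions, zero the rest."""
    keep = (cam >= thresh).astype(image.dtype)
    return image * keep[..., None]          # broadcast mask over channels

def gaussian_blur_mask(image, cam, thresh=0.5, sigma=3.0):
    """Strategy 2: keep high-activation regions intact, blur the rest
    so contextual information survives while fine-grained noise is suppressed."""
    blurred = np.stack(
        [_gaussian_blur(image[..., c], sigma) for c in range(image.shape[-1])],
        axis=-1,
    )
    keep = (cam >= thresh)[..., None]
    return np.where(keep, image, blurred)

def difference_mask(image, cam_baseline, cam_refined, thresh=0.25):
    """Strategy 3 (one plausible reading): suppress regions the baseline model
    activates on but the refined model does not, treating them as unstable."""
    unstable = (cam_baseline - cam_refined) > thresh
    return image * (~unstable[..., None]).astype(image.dtype)
```

In a training loop, the masked images would replace or augment the originals when fine-tuning the refined model, so the network is penalized for relying on regions outside the retained Grad-CAM evidence.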

Published

2025-08-01

How to Cite

Wang, L., Uddin, I. I., Qin, X., Zhou, Y., & Santosh, K. (2025). Explainability-Driven Defense: Grad-CAM-Guided Model Refinement Against Adversarial Threats. Proceedings of the AAAI Symposium Series, 6(1), 49-57. https://doi.org/10.1609/aaaiss.v6i1.36024

Section

AI-Driven Resilience: Building Robust, Adaptive Technologies for a Dynamic World