Explainability-Driven Defense: Grad-CAM-Guided Model Refinement Against Adversarial Threats
DOI:
https://doi.org/10.1609/aaaiss.v6i1.36024
Abstract
Deep learning models have excelled in tasks such as image recognition and autonomous systems, but they remain vulnerable to adversarial attacks and spurious correlations, limiting their reliability in real-world and safety-critical settings. To address these challenges, we propose a novel framework that leverages explainable artificial intelligence (XAI) to enhance the robustness of convolutional neural networks (CNNs). Our approach integrates Grad-CAM insights into the model refinement process, guiding feature masking to reduce reliance on irrelevant or misleading features. We introduce three masking strategies: (1) binary masking, which retains high-activation regions; (2) Gaussian-blurred masking, which preserves contextual information while reducing noise; and (3) difference-based masking, which removes unstable features unique to the baseline model. We evaluate these strategies against two common adversarial attack methods, the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). Results show that all three strategies improve accuracy under FGSM attacks, with binary and difference-based masking providing consistent gains across perturbation levels, while Gaussian-blurred masking delivers the largest improvement in accuracy under PGD, particularly at higher perturbation strengths.
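The abstract describes the three masking strategies only at a high level. The sketch below is a minimal PyTorch illustration of how such Grad-CAM-guided masks might be applied to an input image; the function names, thresholds, and kernel parameters are illustrative assumptions, not the authors' implementation. The Grad-CAM heatmap `cam` is assumed to be precomputed, upsampled to the image resolution, and normalized to [0, 1].

```python
import torch
import torch.nn.functional as F

def binary_mask(image, cam, threshold=0.5):
    """Strategy 1: retain only high-activation regions.

    image: float tensor (C, H, W) in [0, 1]; cam: Grad-CAM heatmap
    (H, W), upsampled to the image size and normalized to [0, 1].
    threshold is an illustrative assumption.
    """
    keep = (cam >= threshold).float()        # 1 where the model attends
    return image * keep.unsqueeze(0)         # zero out everything else

def gaussian_blurred_mask(image, cam, threshold=0.5,
                          kernel_size=11, sigma=3.0):
    """Strategy 2: keep high-activation regions sharp and replace the
    rest with a Gaussian-blurred copy, preserving coarse context while
    suppressing fine (potentially noisy) detail."""
    # Build a normalized 2D Gaussian kernel and blur each channel
    # with a depthwise convolution.
    coords = torch.arange(kernel_size).float() - kernel_size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel2d = torch.outer(g, g)
    c = image.shape[0]
    weight = kernel2d.expand(c, 1, kernel_size, kernel_size).contiguous()
    blurred = F.conv2d(image.unsqueeze(0), weight,
                       padding=kernel_size // 2, groups=c).squeeze(0)
    keep = (cam >= threshold).float().unsqueeze(0)
    return image * keep + blurred * (1.0 - keep)

def difference_based_mask(image, cam_baseline, cam_reference, threshold=0.3):
    """Strategy 3 (assumed form): suppress regions the baseline model
    attends to but a reference heatmap does not, i.e. features that are
    unique to the baseline model and therefore unstable."""
    unstable = ((cam_baseline - cam_reference) >= threshold).float()
    return image * (1.0 - unstable).unsqueeze(0)
```

One plausible use, consistent with the abstract's description of "model refinement," is to retrain the CNN on such masked inputs so that it learns to rely less on the suppressed regions.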
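The two attack methods named above have standard formulations; the sketch below implements textbook FGSM and L-infinity PGD in PyTorch for evaluation purposes. The step count, step size alpha, and the [0, 1] clamp are common defaults assumed here, not settings reported in the paper.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """Single-step FGSM: x_adv = x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

def pgd_attack(model, x, y, eps, alpha=None, steps=10):
    """Iterative PGD constrained to an L-infinity ball of radius eps."""
    alpha = alpha if alpha is not None else eps / 4.0
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around the clean input,
        # then clamp to the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()
```

Robust accuracy at a given eps is then simply the classification accuracy of the baseline and refined models on these perturbed inputs.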
Published
2025-08-01
How to Cite
Wang, L., Uddin, I. I., Qin, X., Zhou, Y., & Santosh, K. (2025). Explainability-Driven Defense: Grad-CAM-Guided Model Refinement Against Adversarial Threats. Proceedings of the AAAI Symposium Series, 6(1), 49-57. https://doi.org/10.1609/aaaiss.v6i1.36024
Issue
Vol. 6 No. 1 (2025)
Section
AI-Driven Resilience: Building Robust, Adaptive Technologies for a Dynamic World