VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation
DOI: https://doi.org/10.1609/aies.v8i1.36592
Abstract
Automated detection of vulnerabilities in source code is an essential cybersecurity challenge, underpinning trust in digital systems and services. Graph Neural Networks (GNNs) have emerged as a promising approach because they can learn the structural and logical relationships in code in a data-driven manner. However, the performance of GNNs is severely limited by training data imbalances and label noise. GNNs often learn “spurious” correlations from superficial code similarities in the training data, leading to detectors that do not generalize well to unseen real-world data. In this work, we propose a new unified framework for robust and interpretable vulnerability detection, which we call VISION, that mitigates spurious correlations by systematically augmenting the training dataset with counterfactuals. Counterfactuals are samples with minimal semantic modifications that have opposite prediction labels. Our complete framework includes: (i) generating effective counterfactuals by prompting a Large Language Model (LLM); (ii) targeted GNN model training on synthetically paired code examples with opposite labels; and (iii) graph-based interpretability to identify the truly crucial code statements relevant for vulnerability predictions while ignoring the spurious ones. We find that our framework reduces spurious learning and enables more robust and generalizable vulnerability detection, as demonstrated by improvements in overall accuracy (from 51.8% to 97.8%), pairwise contrast accuracy (from 4.5% to 95.8%), and worst-group accuracy (from 0.7% to 85.5%) on the prevalent Common Weakness Enumeration (CWE)-20 vulnerability. We also demonstrate improvements using our proposed metrics, namely, intra-class attribution variance, inter-class attribution distance, and node score dependency.
We provide a new benchmark for vulnerability detection, CWE-20-CFA, comprising 27,556 samples from functions affected by the high-impact and frequently occurring CWE-20 vulnerability, including both real and counterfactual examples. Furthermore, our approach enhances societal objectives of transparent and trustworthy AI-based cybersecurity systems through interactive visualization for human-in-the-loop analysis.
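The three accuracy figures quoted in the abstract can be illustrated with a minimal sketch. The definitions below are assumptions based on common usage, not the authors' implementation: pairwise contrast accuracy is taken as the fraction of (original, counterfactual) pairs in which both members are classified correctly, and worst-group accuracy as the lowest per-group accuracy. The group names and toy labels are hypothetical.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pairwise_contrast_accuracy(pairs, y_true, y_pred):
    """Fraction of (original, counterfactual) index pairs where BOTH are correct.

    Assumed definition: a pair counts only if the model gets both the
    original sample and its opposite-label counterfactual right.
    """
    hits = sum(
        y_pred[i] == y_true[i] and y_pred[j] == y_true[j] for i, j in pairs
    )
    return hits / len(pairs)


def worst_group_accuracy(groups, y_true, y_pred):
    """Minimum accuracy over groups (assumed definition)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        total[g] += 1
        correct[g] += int(t == p)
    return min(correct[g] / total[g] for g in total)


# Hypothetical toy data: 1 = vulnerable, 0 = benign.
y_true = [1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
# Each pair links an original sample to its counterfactual.
pairs = [(0, 1), (2, 3), (4, 5)]
# Hypothetical grouping by (source, label).
groups = [
    "orig_vuln", "cf_benign", "orig_vuln",
    "cf_benign", "orig_benign", "cf_vuln",
]

acc = overall_accuracy(y_true, y_pred)
pca = pairwise_contrast_accuracy(pairs, y_true, y_pred)
wga = worst_group_accuracy(groups, y_true, y_pred)
print(f"overall={acc:.3f} pairwise={pca:.3f} worst_group={wga:.3f}")
```

Under these assumed definitions, a single misclassified counterfactual lowers pairwise contrast accuracy and worst-group accuracy far more than overall accuracy, which is consistent with the large gaps between the baseline figures reported in the abstract.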
Published
2025-10-15
How to Cite
Egea, D., Halder, B., & Dutta, S. (2025). VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 8(1), 812–823. https://doi.org/10.1609/aies.v8i1.36592