Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual Adversarial Examples

Authors

  • Ruichu Cai, School of Computer Science, Guangdong University of Technology, Guangzhou, China; Peng Cheng Laboratory, Shenzhen, China
  • Yuxuan Zhu, School of Computer Science, Guangdong University of Technology, Guangzhou, China
  • Jie Qiao, School of Computer Science, Guangdong University of Technology, Guangzhou, China
  • Zefeng Liang, School of Computer Science, Guangdong University of Technology, Guangzhou, China
  • Furui Liu, Zhejiang Lab, Hangzhou, China
  • Zhifeng Hao, College of Science, Shantou University, Shantou, China

DOI:

https://doi.org/10.1609/aaai.v38i10.28990

Keywords:

ML: Adversarial Learning & Robustness, ML: Causal Learning

Abstract

Deep neural networks (DNNs) have been shown to be vulnerable to well-crafted adversarial examples, generated through either L_p-norm-restricted or unrestricted attacks. Nevertheless, most of these approaches assume that adversaries can modify any feature at will, neglecting the causal generating process of the data, which is unreasonable and impractical. For instance, a modification to income would inevitably affect features such as the debt-to-income ratio within a banking system. By taking this underappreciated causal generating process into account, we first pinpoint the source of DNNs' vulnerability through the lens of causality and provide theoretical results that answer where to attack. Second, to generate more realistic adversarial examples, we account for the consequences of attack interventions on the current state of an example and propose CADE, a framework that generates Counterfactual ADversarial Examples, answering how to attack. Empirical results demonstrate CADE's effectiveness, as evidenced by its competitive performance across diverse attack scenarios, including white-box, transfer-based, and random intervention attacks.
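To make the counterfactual intervention idea concrete, below is a minimal, hypothetical sketch (not the authors' CADE implementation): it intervenes on a single feature and propagates the consequence through an assumed structural causal model, mirroring the abstract's income and debt-to-income-ratio example. The function propagate_intervention and the toy structural equations are illustrative assumptions only.

    import numpy as np

    def propagate_intervention(x, feature_idx, delta, structural_eqs):
        """Apply do(x_i := x_i + delta), then recompute causal descendants.

        structural_eqs maps each child index to a function computing that
        child from the (already updated) feature vector; children must be
        given in topological order. Hypothetical helper, for illustration.
        """
        x_cf = x.copy()
        x_cf[feature_idx] += delta          # the attack intervention
        for child, f in structural_eqs.items():
            x_cf[child] = f(x_cf)           # downstream consequence of the intervention
        return x_cf

    # Toy example: features are [income, debt, debt_to_income_ratio].
    x = np.array([5000.0, 1500.0, 0.30])
    eqs = {2: lambda v: v[1] / v[0]}        # assumed structural equation: ratio = debt / income

    # Intervening on income (index 0) also updates the ratio, unlike a naive
    # feature-wise perturbation that would leave the example causally inconsistent.
    x_cf = propagate_intervention(x, feature_idx=0, delta=1000.0, structural_eqs=eqs)
    print(x_cf[0], x_cf[2])                 # 6000.0 0.25

A real attack would additionally search over the intervention target and magnitude to flip a classifier's prediction; the sketch only shows how a single intervention stays consistent with the causal generating process.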

Published

2024-03-24

How to Cite

Cai, R., Zhu, Y., Qiao, J., Liang, Z., Liu, F., & Hao, Z. (2024). Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual Adversarial Examples. Proceedings of the AAAI Conference on Artificial Intelligence, 38(10), 11132-11140. https://doi.org/10.1609/aaai.v38i10.28990

Issue

Vol. 38 No. 10 (2024)

Section

AAAI Technical Track on Machine Learning I