Clarifying the Behavior and the Difficulty of Adversarial Training

Authors

  • Xu Cheng, Nanjing University of Science and Technology
  • Hao Zhang, Shanghai Jiao Tong University
  • Yue Xin, Shanghai Jiao Tong University
  • Wen Shen, Shanghai Jiao Tong University
  • Quanshi Zhang, Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v38i10.29032

Keywords:

ML: Transparent, Interpretable, Explainable ML

Abstract

Adversarial training is usually difficult to optimize. This paper provides conceptual and analytic insights into this difficulty through a simple theoretical study, in which we derive the approximate dynamics of a recursive multi-step attack in a simple setting. Despite its simplicity, the theory yields verifiable predictions about various phenomena in adversarial training under real-world settings. First, compared to vanilla training, adversarial training is more likely to exponentially amplify the influence of input samples with large gradient norms. Second, adversarial training also strengthens the influence of the Hessian matrix of the loss w.r.t. the network parameters, which makes the parameters more likely to oscillate and thus makes adversarial training harder.
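The abstract does not specify the attack or the "simple setting" used in the paper. As a hypothetical illustration only, the sketch below assumes a linear model with squared loss L(x) = 0.5 (w·x − y)² and an unconstrained recursive attack δ_{t+1} = δ_t + α ∇_x L(x + δ_t), with no ε-ball projection and no sign step (practical attacks such as PGD usually include both). Under these assumptions the residual r_t = w·(x + δ_t) − y obeys r_{t+1} = (1 + α‖w‖²) r_t, so the input gradient seen by adversarial training grows geometrically with the number of attack steps. The helper names `input_grad`, `multi_step_attack`, and `residual` are invented for this sketch and are not the authors' code or derivation.

```python
"""Hypothetical toy setup, not the paper's derivation: a recursive
multi-step gradient-ascent attack on a linear model with squared loss."""

from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def input_grad(w: List[float], x: List[float], y: float) -> List[float]:
    """Gradient of L(x) = 0.5 * (w.x - y)^2 w.r.t. the input: (w.x - y) * w."""
    r = dot(w, x) - y
    return [r * wi for wi in w]


def multi_step_attack(
    w: List[float], x: List[float], y: float, alpha: float, steps: int
) -> List[float]:
    """Recursive attack delta_{t+1} = delta_t + alpha * grad_x L(x + delta_t).

    Assumption: no projection onto an epsilon-ball and no sign(), so the
    recursion stays linear and its growth factor can be read off exactly.
    """
    delta = [0.0] * len(x)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        g = input_grad(w, x_adv, y)
        delta = [di + alpha * gi for di, gi in zip(delta, g)]
    return delta


def residual(w: List[float], x: List[float], y: float, delta: List[float]) -> float:
    """Residual r = w.(x + delta) - y at the perturbed input."""
    return dot(w, [xi + di for xi, di in zip(x, delta)]) - y


if __name__ == "__main__":
    w, x, y, alpha = [1.0, 2.0], [0.5, 0.5], 0.0, 0.1
    for m in range(5):
        delta = multi_step_attack(w, x, y, alpha, m)
        print(m, residual(w, x, y, delta))
```

With w = [1, 2] and α = 0.1, the per-step factor is 1 + α‖w‖² = 1.5, so the residual (and hence the input-gradient norm |r|·‖w‖) after m steps is 1.5 times that after m − 1 steps. In this linear toy the factor is the same for every sample, so it only shows the compounding mechanism behind an "exponential" effect; it does not reproduce the paper's sample-dependent analysis.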

Published

2024-03-24

How to Cite

Cheng, X., Zhang, H., Xin, Y., Shen, W., & Zhang, Q. (2024). Clarifying the Behavior and the Difficulty of Adversarial Training. Proceedings of the AAAI Conference on Artificial Intelligence, 38(10), 11507-11515. https://doi.org/10.1609/aaai.v38i10.29032


Section

AAAI Technical Track on Machine Learning I