Adversarial Training Reduces Information and Improves Transferability

Authors

  • Matteo Terzi, University of Padova
  • Alessandro Achille, AWS
  • Marco Maggipinto, University of Padova
  • Gian Antonio Susto, University of Padova

Keywords

Adversarial Attacks & Robustness

Abstract

Recent results show that features of adversarially trained networks for classification, in addition to being robust, enable desirable properties such as invertibility. The latter property may seem counter-intuitive, as it is widely accepted by the community that classification models should capture only the minimal information (features) required for the task. Motivated by this discrepancy, we investigate the dual relationship between Adversarial Training and Information Theory. We show that Adversarial Training can improve linear transferability to new tasks, which gives rise to a trade-off between transferability of representations and accuracy on the source task. We validate our results by transferring robust networks trained on CIFAR-10, CIFAR-100, and ImageNet to several target datasets. Moreover, we show that Adversarial Training reduces the Fisher information of representations about the input and of the weights about the task, and we provide a theoretical argument which explains the invertibility of deterministic networks without violating the principle of minimality. Finally, we leverage our theoretical insights to substantially improve the quality of images reconstructed through inversion.
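The Adversarial Training referred to in the abstract trains a model on worst-case perturbations of the inputs rather than on the clean inputs themselves. A minimal sketch, using a single-step FGSM-style perturbation on logistic regression (the paper's experiments use deep networks and stronger multi-step attacks; all names and parameters here are illustrative assumptions, not the authors' setup):

```python
import numpy as np

def fgsm_perturb(X, y, w, eps):
    """Perturb inputs in the direction that increases the logistic loss.

    For logistic loss with p = sigmoid(X @ w), the input gradient is
    (p - y) * w; FGSM adds eps * sign(gradient) to each example.
    """
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grad_x = np.outer(p - y, w)
    return X + eps * np.sign(grad_x)

def adversarial_train(X, y, eps=0.1, lr=0.1, steps=200, seed=0):
    """Train logistic-regression weights on adversarially perturbed inputs."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])
    for _ in range(steps):
        # Inner step: craft adversarial examples for the current weights.
        X_adv = fgsm_perturb(X, y, w, eps)
        # Outer step: gradient descent on the loss over perturbed inputs.
        p = 1.0 / (1.0 + np.exp(-X_adv @ w))
        w -= lr * (X_adv.T @ (p - y)) / len(y)
    return w
```

The same inner/outer structure carries over to deep networks, where the inner maximization is typically several projected-gradient steps instead of one signed step.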

Published

2021-05-18

How to Cite

Terzi, M., Achille, A., Maggipinto, M., & Susto, G. A. (2021). Adversarial Training Reduces Information and Improves Transferability. Proceedings of the AAAI Conference on Artificial Intelligence, 35(3), 2674-2682. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16371

Section

AAAI Technical Track on Computer Vision II