Targeted Activation Penalties Help CNNs Ignore Spurious Signals

Authors

  • Dekai Zhang (Department of Computing, Imperial College London)
  • Matt Williams (Department of Radiotherapy, Charing Cross Hospital; Institute of Global Health Innovation, Imperial College London)
  • Francesca Toni (Department of Computing, Imperial College London)

DOI:

https://doi.org/10.1609/aaai.v38i15.29610

Keywords:

ML: Transparent, Interpretable, Explainable ML; CV: Bias, Fairness & Privacy; CV: Interpretability, Explainability, and Transparency; HAI: Human-in-the-loop Machine Learning; ML: Ethics, Bias, and Fairness; PEAI: Safety, Robustness & Trustworthiness

Abstract

Neural networks (NNs) can learn to rely on spurious signals in the training data, leading to poor generalisation. Recent methods tackle this problem by training NNs with additional ground-truth annotations of such signals. These methods may, however, let spurious signals re-emerge in deep convolutional NNs (CNNs). We propose Targeted Activation Penalty (TAP), a new method tackling the same problem by penalising activations to control the re-emergence of spurious signals in deep CNNs, while also lowering training times and memory usage. In addition, ground-truth annotations can be expensive to obtain. We show that TAP still works well with annotations generated by pre-trained models, which serve as effective substitutes for ground-truth annotations. We demonstrate the power of TAP against two state-of-the-art baselines on the MNIST benchmark and on two clinical image datasets, using four different CNN architectures.
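
The abstract does not state TAP's exact formulation, so the following is only a rough NumPy sketch of the general idea of penalising activations in annotated regions, not the authors' method. It assumes a binary annotation mask marking the spurious region, downsampled by block-wise maximum to each layer's resolution, and a mean-absolute-activation penalty added to the task loss. All function names and the `strength` parameter are hypothetical.

```python
import numpy as np


def downsample_mask(mask, out_h, out_w):
    """Shrink a binary (H, W) mask to (out_h, out_w) by block-wise max.

    Assumption: H and W are integer multiples of out_h and out_w.
    """
    h, w = mask.shape
    bh, bw = h // out_h, w // out_w
    blocks = mask[: out_h * bh, : out_w * bw].reshape(out_h, bh, out_w, bw)
    return blocks.max(axis=(1, 3))


def activation_penalty(activations, mask):
    """Mean absolute activation inside the annotated (spurious) region.

    activations: (C, h, w) feature maps from one convolutional layer.
    mask:        (H, W) binary annotation, 1 where the signal is spurious.
    """
    _, h, w = activations.shape
    small = downsample_mask(mask, h, w)
    # Broadcast the (h, w) mask over all C channels.
    return float(np.mean(np.abs(activations) * small[None, :, :]))


def total_loss(task_loss, layer_activations, mask, strength=1.0):
    """Task loss plus a weighted sum of per-layer activation penalties."""
    penalty = sum(activation_penalty(a, mask) for a in layer_activations)
    return task_loss + strength * penalty
```

In a real training loop the penalty would be computed inside an automatic-differentiation framework so that its gradient reaches the convolutional weights; plain NumPy is used here only to keep the arithmetic visible.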

Published

2024-03-24

How to Cite

Zhang, D., Williams, M., & Toni, F. (2024). Targeted Activation Penalties Help CNNs Ignore Spurious Signals. Proceedings of the AAAI Conference on Artificial Intelligence, 38(15), 16705-16713. https://doi.org/10.1609/aaai.v38i15.29610

Issue

Vol. 38 No. 15 (2024)

Section

AAAI Technical Track on Machine Learning VI