Targeted Activation Penalties Help CNNs Ignore Spurious Signals


  • Dekai Zhang Department of Computing, Imperial College London
  • Matt Williams Department of Radiotherapy, Charing Cross Hospital; Institute of Global Health Innovation, Imperial College London
  • Francesca Toni Department of Computing, Imperial College London



ML: Transparent, Interpretable, Explainable ML; CV: Bias, Fairness & Privacy; CV: Interpretability, Explainability, and Transparency; HAI: Human-in-the-loop Machine Learning; ML: Ethics, Bias, and Fairness; PEAI: Safety, Robustness & Trustworthiness


Neural networks (NNs) can learn to rely on spurious signals in the training data, leading to poor generalisation. Recent methods tackle this problem by training NNs with additional ground-truth annotations of such signals. These methods may, however, let spurious signals re-emerge in deep convolutional NNs (CNNs). We propose Targeted Activation Penalty (TAP), a new method that tackles the same problem by penalising activations to control the re-emergence of spurious signals in deep CNNs, while also lowering training time and memory usage. In addition, ground-truth annotations can be expensive to obtain. We show that TAP still works well with annotations generated by pre-trained models, which serve as effective substitutes for ground-truth annotations. We demonstrate the effectiveness of TAP against two state-of-the-art baselines on the MNIST benchmark and on two clinical image datasets, using four different CNN architectures.
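As an illustration only, the general idea of penalising activations inside an annotated spurious region can be sketched as below. This is not the authors' TAP implementation: the function names (`downsample_mask`, `targeted_activation_penalty`, `total_loss`), the choice of a mean squared activation as the penalty, the nearest-neighbour downsampling of the annotation mask to the feature-map resolution, and the weight `lam` are all assumptions made for this sketch.

```python
import numpy as np

def downsample_mask(mask, out_h, out_w):
    """Nearest-neighbour downsampling of a binary (H, W) mask to (out_h, out_w).

    Assumption: the annotation is resized to match the feature-map grid.
    """
    h, w = mask.shape
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return mask[np.ix_(rows, cols)]

def targeted_activation_penalty(activations, mask):
    """Hypothetical penalty: squared activations inside the spurious region,
    averaged over all C * h * w positions of the feature maps.

    activations: (C, h, w) feature maps from some convolutional layer (assumed).
    mask: (H, W) binary annotation, 1 where the spurious signal lies.
    """
    c, h, w = activations.shape
    m = downsample_mask(mask, h, w).astype(activations.dtype)
    # Broadcast the (h, w) mask over all C channels; activations outside
    # the annotated region are zeroed and therefore not penalised.
    masked = activations * m[None, :, :]
    return float(np.sum(masked ** 2) / (c * h * w))

def total_loss(task_loss, activations, mask, lam=1.0):
    """Task loss plus the weighted activation penalty (lam is an assumed hyperparameter)."""
    return task_loss + lam * targeted_activation_penalty(activations, mask)

# Usage: 2 channels of 4x4 activations, spurious region in the top-left
# quarter of an 8x8 input annotation.
acts = np.ones((2, 4, 4))
mask = np.zeros((8, 8))
mask[:4, :4] = 1
print(targeted_activation_penalty(acts, mask))  # 0.25
print(total_loss(0.5, acts, mask, lam=2.0))     # 1.0
```

NumPy is used here only to keep the sketch dependency-light; in actual training the same penalty would be written in an autodiff framework so that its gradient pushes down activations in the annotated region. Averaging over all positions, rather than only the masked ones, is one of several reasonable normalisation choices.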



How to Cite

Zhang, D., Williams, M., & Toni, F. (2024). Targeted Activation Penalties Help CNNs Ignore Spurious Signals. Proceedings of the AAAI Conference on Artificial Intelligence, 38(15), 16705-16713.



AAAI Technical Track on Machine Learning VI