Targeted Activation Penalties Help CNNs Ignore Spurious Signals

Dekai Zhang; Matt Williams; Francesca Toni

doi:10.1609/aaai.v38i15.29610

Authors

Dekai Zhang, Department of Computing, Imperial College London
Matt Williams, Department of Radiotherapy, Charing Cross Hospital; Institute of Global Health Innovation, Imperial College London
Francesca Toni, Department of Computing, Imperial College London

DOI:

https://doi.org/10.1609/aaai.v38i15.29610

Keywords:

ML: Transparent, Interpretable, Explainable ML; CV: Bias, Fairness & Privacy; CV: Interpretability, Explainability, and Transparency; HAI: Human-in-the-loop Machine Learning; ML: Ethics, Bias, and Fairness; PEAI: Safety, Robustness & Trustworthiness

Abstract

Neural networks (NNs) can learn to rely on spurious signals in the training data, leading to poor generalisation. Recent methods tackle this problem by training NNs with additional ground-truth annotations of such signals. These methods may, however, let spurious signals re-emerge in deep convolutional NNs (CNNs). We propose Targeted Activation Penalty (TAP), a new method tackling the same problem by penalising activations to control the re-emergence of spurious signals in deep CNNs, while also lowering training times and memory usage. In addition, ground-truth annotations can be expensive to obtain. We show that TAP still works well with annotations generated by pre-trained models as effective substitutes for ground-truth annotations. We demonstrate the power of TAP against two state-of-the-art baselines on the MNIST benchmark and on two clinical image datasets, using four different CNN architectures.
