Less Is More Important: An Attention Module Guided by Probability Density Function for Convolutional Neural Networks
DOI:
https://doi.org/10.1609/aaai.v37i3.25397
Keywords:
CV: Representation Learning for Vision, CV: Interpretability and Transparency, CV: Other Foundations of Computer Vision
Abstract
Attention modules, which adaptively weight and refine features according to the importance of the input, have become a critical technique for boosting the capability of convolutional neural networks. However, most existing attention modules are heuristic and lack a sound interpretation, and thus require empirical engineering to design their structure and operators. To address this issue, based on our 'less is more important' observation, we propose an Attention Module guided by Probability Density Function (PDF), dubbed PdfAM, which has a rational motivation and requires few empirical structure designs. Concretely, we observe that pixels that occur less frequently tend to be textural details or foreground objects, which are highly important for vision tasks. Thus, adopting PDF values as a smooth, noise-robust alternative to the pixel occurrence frequency, we design PdfAM by first estimating the PDF under a distribution assumption and then predicting a 3D attention map by applying a negative correlation between the attention weights and the estimated PDF values. Furthermore, we develop learnable PDF-rescale parameters that adaptively transform the estimated PDF and predict a customized negative correlation. Experiments show that PdfAM consistently boosts various networks on both high- and low-level vision tasks, and also performs favorably against other attention modules in terms of accuracy and convergence.
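The two steps named in the abstract (estimate a PDF, then map it to attention weights through a negative correlation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The per-channel Gaussian assumption, the peak-normalised density, the sigmoid mapping, and the `scale` and `shift` parameters (hypothetical stand-ins for the learnable PDF-rescale parameters) are all assumptions made for illustration only.

```python
import numpy as np


def pdf_attention_sketch(x, scale=1.0, shift=0.0, eps=1e-5):
    """Illustrative PDF-guided attention for a feature map x of shape (C, H, W).

    Assumptions of this sketch (not taken from the paper):
      - each channel's values follow a Gaussian distribution;
      - the density is normalised by its peak so it lies in (0, 1];
      - `scale` and `shift` are hypothetical rescale parameters that would
        be learned in a real network but are plain floats here.
    """
    # Step 1: estimate the per-channel Gaussian parameters.
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)

    # Peak-normalised Gaussian density: close to 1 for common values,
    # close to 0 for rare ones.
    density = np.exp(-((x - mu) ** 2) / (2.0 * (var + eps)))

    # Step 2: negative correlation. With scale > 0 the sigmoid of the
    # negated, rescaled density decreases as the density increases, so
    # rarer pixels receive larger weights.
    attention = 1.0 / (1.0 + np.exp(scale * density + shift))

    # The attention map has the same 3D shape as x and reweights it
    # element-wise.
    return x * attention, attention


if __name__ == "__main__":
    # One channel of mostly identical values plus a single rare value.
    feats = np.zeros((1, 4, 4))
    feats[0, 0, 0] = 10.0
    _, att = pdf_attention_sketch(feats)
    # The rare pixel should receive a larger weight than a common one.
    print(att[0, 0, 0] > att[0, 1, 1])
```

In this toy input the single outlier pixel has a low estimated density and therefore receives a larger attention weight than the frequent background value, which is the 'less is more important' behaviour the abstract describes.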
Published
2023-06-26
How to Cite
Xie, J., & Zhang, J. (2023). Less Is More Important: An Attention Module Guided by Probability Density Function for Convolutional Neural Networks. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 2947-2955. https://doi.org/10.1609/aaai.v37i3.25397
Issue
Section
AAAI Technical Track on Computer Vision III