Compressed Self-Attention for Deep Metric Learning
DOI: https://doi.org/10.1609/aaai.v34i04.5762

Abstract
In this paper, we aim to enhance the self-attention (SA) mechanism for deep metric learning in visual perception by capturing richer contextual dependencies in visual data. To this end, we propose a novel module, named compressed self-attention (CSA), which significantly reduces the computation and memory cost with a negligible decrease in accuracy with respect to the original SA mechanism, thanks to the following two characteristics: i) it only needs to compute a small number of base attention maps for a small number of base feature vectors; and ii) the output at each spatial location can be simply obtained as an adaptive weighted average of the outputs calculated from the base attention maps. The high computational efficiency of CSA enables its application to high-resolution shallow layers in convolutional neural networks with little additional cost. In addition, CSA makes it practical to further partition the feature maps into groups along the channel dimension and compute attention maps for the features in each group separately, thereby increasing the diversity of long-range dependencies and accordingly boosting accuracy. We evaluate the performance of CSA via extensive experiments on two metric learning tasks: person re-identification and local descriptor learning. Qualitative and quantitative comparisons with recent methods demonstrate the effectiveness of CSA on these tasks.
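To make the two characteristics above concrete, the following is a minimal PyTorch sketch of a CSA-style module, based only on what the abstract states: a small number of base attention maps is computed for a small set of base feature vectors, and each spatial location's output is an adaptive weighted average of the base outputs, optionally with channel grouping. The specific choices here (base vectors obtained by adaptive average pooling, adaptive weights obtained from query/base-query similarities) are illustrative assumptions, not the paper's exact design.

```python
# Sketch of compressed self-attention (CSA) as described in the abstract.
# Assumptions (not from the paper): base queries come from adaptive average
# pooling; per-location mixing weights are a softmax over query/base-query
# similarities; a residual connection wraps the module.
import torch
import torch.nn as nn


class CompressedSelfAttentionSketch(nn.Module):
    def __init__(self, channels: int, num_bases: int = 4, groups: int = 1):
        super().__init__()
        assert channels % groups == 0
        self.num_bases = num_bases
        self.groups = groups
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Pooling grid whose cells supply the small set of base query vectors.
        side = int(num_bases ** 0.5)
        assert side * side == num_bases, "num_bases must be a perfect square here"
        self.base_pool = nn.AdaptiveAvgPool2d(side)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        g, m = self.groups, self.num_bases
        cg = c // g          # channels per group
        n = h * w            # number of spatial locations

        q_full = self.query(x)
        q = q_full.view(b * g, cg, n)                        # per-location queries
        k = self.key(x).view(b * g, cg, n)                   # keys at all locations
        v = self.value(x).view(b * g, cg, n)                 # values at all locations
        q_base = self.base_pool(q_full).view(b * g, cg, m)   # M base feature vectors

        # i) Only M base attention maps (M x N instead of N x N).
        base_attn = torch.softmax(
            q_base.transpose(1, 2) @ k / cg ** 0.5, dim=-1)  # (b*g, M, N)
        base_out = base_attn @ v.transpose(1, 2)             # (b*g, M, Cg)

        # ii) Each location adaptively mixes the M base outputs; the weights are
        # a softmax over its query's similarity to the base queries (assumption).
        mix = torch.softmax(
            q.transpose(1, 2) @ q_base / cg ** 0.5, dim=-1)  # (b*g, N, M)
        out = mix @ base_out                                  # (b*g, N, Cg)

        out = out.transpose(1, 2).reshape(b, c, h, w)
        return x + out


if __name__ == "__main__":
    csa = CompressedSelfAttentionSketch(channels=64, num_bases=4, groups=2)
    feats = torch.randn(2, 64, 32, 32)
    print(csa(feats).shape)  # torch.Size([2, 64, 32, 32])
```

Because the attention cost scales with M x N rather than N x N, such a module stays affordable on high-resolution shallow feature maps, and the `groups` argument illustrates the channel-wise grouping mentioned in the abstract.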