DC-Former: Diverse and Compact Transformer for Person Re-identification
DOI:
https://doi.org/10.1609/aaai.v37i2.25226
Keywords:
CV: Image and Video Retrieval, CV: Representation Learning for Vision
Abstract
In the person re-identification (ReID) task, learning discriminative representations with deep learning remains challenging because of limited data. Generally speaking, a model performs better as the amount of data increases. The addition of similar classes strengthens the ability of the classifier to distinguish similar identities, thereby improving the discriminability of the representation. In this paper, we propose a Diverse and Compact Transformer (DC-Former) that achieves a similar effect by splitting the embedding space into multiple diverse and compact subspaces. A compact embedding subspace helps the model learn more robust and discriminative embeddings for identifying similar classes, and fusing these diverse embeddings, which contain more fine-grained information, further improves ReID performance. Specifically, multiple class tokens are used in a vision transformer to represent multiple embedding spaces. Then, a self-diverse constraint (SDC) is applied to these spaces to push them away from each other, which makes each embedding space diverse and compact. In addition, a dynamic weight controller (DWC) is designed to balance their relative importance during training. The experimental results are promising: our method surpasses previous state-of-the-art methods on several commonly used person ReID benchmarks. Our code is available at https://github.com/ant-research/Diverse-and-Compact-Transformer.
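The idea of pushing several class-token embeddings away from each other can be illustrated with a small sketch. The abstract does not give the formula for the SDC, so the penalty below (mean absolute cosine similarity over all pairs of embeddings) is an assumption chosen for illustration, not the authors' definition. The names `cosine` and `diversity_penalty` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def diversity_penalty(tokens):
    """Mean absolute cosine similarity over all pairs of embeddings.

    ASSUMPTION: this is one plausible way to express "push the embedding
    spaces away from each other"; it is not taken from the paper.
    A value near 0 means the embeddings point in different directions,
    a value near 1 means they have collapsed onto one direction.
    """
    n = len(tokens)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(abs(cosine(tokens[i], tokens[j])) for i, j in pairs)
    return total / len(pairs)


# Toy stand-ins for three class-token embeddings of one image.
orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
collapsed = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 2.0, 3.0]]

print(diversity_penalty(orthogonal))  # mutually orthogonal: penalty is 0.0
print(diversity_penalty(collapsed))   # parallel vectors: penalty is close to 1
```

In a training setup, a term of this kind would be added to the usual ReID losses so that minimizing it keeps the embeddings diverse. How the terms are weighted (the role the abstract assigns to the DWC) is not sketched here because the abstract does not describe it.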
Published
2023-06-26
How to Cite
Li, W., Zou, C., Wang, M., Xu, F., Zhao, J., Zheng, R., Cheng, Y., & Chu, W. (2023). DC-Former: Diverse and Compact Transformer for Person Re-identification. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 1415-1423. https://doi.org/10.1609/aaai.v37i2.25226
Section
AAAI Technical Track on Computer Vision II