Matching on Sets: Conquer Occluded Person Re-identification Without Alignment
Keywords: Image and Video Retrieval
Abstract
Occluded person re-identification (re-ID) is a challenging task because different human parts may become invisible in cluttered scenes, making it hard to match person images of different identities. Most existing methods address this challenge by aligning spatial features of body parts according to semantic information (e.g., human poses) or feature similarities, but this approach is complicated and sensitive to noise. This paper presents Matching on Sets (MoS), a novel method that positions occluded person re-ID as a set matching task without requiring spatial alignment. MoS encodes a person image as a pattern set, represented by a 'global vector' in which each element captures one specific visual pattern, and it introduces the Jaccard distance as a metric for computing the distance between pattern sets and thus measuring image similarity. To enable the Jaccard distance over continuous real numbers, we employ minimization and maximization to approximate the intersection and union operations, respectively. In addition, we design a Jaccard triplet loss that enhances pattern discrimination and allows set matching to be embedded into deep neural networks for end-to-end training. In the inference stage, we introduce a conflict penalty mechanism that detects mutually exclusive patterns in the pattern union of an image pair and decreases the pair's similarity accordingly. Extensive experiments on three widely used datasets (Market1501, DukeMTMC and Occluded-DukeMTMC) show that MoS achieves superior re-ID performance. Additionally, it is tolerant of occlusions and outperforms the state of the art by large margins on Occluded-DukeMTMC.
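To make the min/max approximation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each 'global vector' holds non-negative pattern activations (for example, post-ReLU features). Intersection is taken as the element-wise minimum, union as the element-wise maximum, and the distance is one minus their ratio. The triplet loss is a standard hinge form built on that distance. The function names `jaccard_distance` and `jaccard_triplet_loss`, and the margin value of 0.3, are hypothetical choices for illustration. The conflict penalty is omitted because the abstract does not specify how mutually exclusive patterns are detected.

```python
from typing import Sequence


def jaccard_distance(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Soft Jaccard distance between two non-negative 'pattern set' vectors.

    Assumption (hypothetical sketch): intersection ~ sum of element-wise min,
    union ~ sum of element-wise max, distance = 1 - intersection / union.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if any(x < 0 for x in a) or any(y < 0 for y in b):
        raise ValueError("this sketch assumes non-negative pattern activations")
    inter = sum(min(x, y) for x, y in zip(a, b))  # soft intersection
    union = sum(max(x, y) for x, y in zip(a, b))  # soft union
    if union <= eps:
        return 0.0  # both vectors are empty: treat them as identical sets
    return 1.0 - inter / union


def jaccard_triplet_loss(anchor: Sequence[float],
                         positive: Sequence[float],
                         negative: Sequence[float],
                         margin: float = 0.3) -> float:
    """Hinge triplet loss on the soft Jaccard distance (margin is a hypothetical value)."""
    d_ap = jaccard_distance(anchor, positive)  # should become small
    d_an = jaccard_distance(anchor, negative)  # should become large
    return max(0.0, d_ap - d_an + margin)


if __name__ == "__main__":
    a = [1.0, 0.0, 2.0]
    b = [1.0, 1.0, 1.0]
    # min -> [1, 0, 1] sums to 2; max -> [1, 1, 2] sums to 4; distance = 1 - 2/4
    print(jaccard_distance(a, b))
    # the negative shares no patterns with the anchor, so the hinge is inactive
    print(jaccard_triplet_loss(a, a, [0.0, 3.0, 0.0]))
```

For the vectors above, the distance is 0.5 and the triplet loss is 0.0. In a network, the same min/max operations would be applied to framework tensors so that gradients flow through them; this plain-Python version only illustrates the arithmetic.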
How to Cite
Jia, M., Cheng, X., Zhai, Y., Lu, S., Ma, S., Tian, Y., & Zhang, J. (2021). Matching on Sets: Conquer Occluded Person Re-identification Without Alignment. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1673-1681. https://doi.org/10.1609/aaai.v35i2.16260
AAAI Technical Track on Computer Vision I