Diverse Knowledge Distillation for End-to-End Person Search

Authors

  • Xinyu Zhang (Tongji University, China; The University of Adelaide, Australia)
  • Xinlong Wang (The University of Adelaide, Australia)
  • Jia-Wang Bian (The University of Adelaide, Australia)
  • Chunhua Shen (The University of Adelaide, Australia; Monash University, Australia)
  • Mingyu You (Tongji University, China)

DOI:

https://doi.org/10.1609/aaai.v35i4.16454

Keywords:

Image and Video Retrieval

Abstract

Person search aims to localize and identify a specific person from a gallery of images. Recent methods can be categorized into two groups: two-step and end-to-end approaches. The former treats person search as two independent tasks and achieves dominant results using separately trained person detection and re-identification (Re-ID) models. The latter performs person search in an end-to-end fashion. Although end-to-end approaches yield higher inference efficiency, they lag well behind their two-step counterparts in accuracy. In this paper, we argue that the gap between the two kinds of methods is mainly caused by the Re-ID sub-networks of end-to-end methods. To this end, we propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck. We also design a spatial-invariant augmentation to help the model become invariant to inaccurate detection results. Experimental results on the CUHK-SYSU and PRW datasets demonstrate the superiority of our method over existing approaches: it achieves accuracy on par with state-of-the-art two-step methods while maintaining high efficiency due to the single joint model. Code is available at: https://git.io/DKD-PersonSearch.

Published

2021-05-18

How to Cite

Zhang, X., Wang, X., Bian, J.-W., Shen, C., & You, M. (2021). Diverse Knowledge Distillation for End-to-End Person Search. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 3412-3420. https://doi.org/10.1609/aaai.v35i4.16454

Issue

Vol. 35 No. 4 (2021)

Section

AAAI Technical Track on Computer Vision III