LGD: Label-Guided Self-Distillation for Object Detection

Authors

  • Peizhen Zhang, Megvii Technology
  • Zijian Kang, Xi'an Jiaotong University
  • Tong Yang, Megvii Technology
  • Xiangyu Zhang, Megvii Technology
  • Nanning Zheng, Xi'an Jiaotong University
  • Jian Sun, Megvii Technology

DOI:

https://doi.org/10.1609/aaai.v36i3.20240

Keywords:

Computer Vision (CV), Knowledge Representation And Reasoning (KRR)

Abstract

In this paper, we propose the first self-distillation framework for general object detection, termed LGD (Label-Guided self-Distillation). Previous studies rely on a strong pretrained teacher to provide instructive knowledge that could be unavailable in real-world scenarios. Instead, we generate instructive knowledge by inter- and intra-object relation modeling, requiring only student representations and regular labels. Concretely, our framework involves sparse label-appearance encoding, inter-object relation adaptation, and intra-object knowledge mapping to obtain the instructive knowledge. These modules jointly form an implicit teacher in the training phase, dynamically dependent on the labels and the evolving student representations. Modules in LGD are trained end-to-end with the student detector and are discarded at inference. Experimentally, LGD obtains decent results on various detectors, datasets, and extensive tasks like instance segmentation. For example, on the MS-COCO dataset, LGD improves RetinaNet with ResNet-50 under 2x single-scale training from 36.2% to 39.0% mAP (+2.8%). It boosts much stronger detectors like FCOS with ResNeXt-101 DCN v2 under 2x multi-scale training from 46.1% to 47.9% (+1.8%). Compared with FGFI, a classical teacher-based method, LGD not only performs better without requiring a pretrained teacher but also reduces the training cost beyond inherent student learning by 51%.
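To make the train-vs-inference asymmetry concrete, the following is a minimal toy sketch, not the paper's method: the real LGD builds its implicit teacher from label-appearance encoding and attention-based relation modules, whereas here a hypothetical `make_instructive_knowledge` simply blends each per-object student feature toward the mean feature of objects sharing its label, and the distillation term is a plain mean-squared error added alongside the usual detection loss.

```python
# Toy illustration of label-guided self-distillation (hypothetical
# simplification of LGD; feature vectors are reduced to scalars).

def make_instructive_knowledge(student_feats, labels):
    # Implicit "teacher": pull each student feature toward the mean
    # feature of same-labeled objects (toy stand-in for inter-object
    # relation adaptation). Depends only on student feats and labels,
    # so no pretrained teacher network is needed.
    knowledge = []
    for f, y in zip(student_feats, labels):
        same_class = [g for g, z in zip(student_feats, labels) if z == y]
        mean = sum(same_class) / len(same_class)
        knowledge.append(0.5 * f + 0.5 * mean)
    return knowledge

def distillation_loss(student_feats, knowledge):
    # Feature mimicking between student representations and the
    # label-guided knowledge; in training this would be added to the
    # normal detection loss.
    n = len(student_feats)
    return sum((f - k) ** 2 for f, k in zip(student_feats, knowledge)) / n

# Toy per-object student representations and their class labels.
feats = [0.2, 0.8, 0.5]
labels = [1, 1, 2]
knowledge = make_instructive_knowledge(feats, labels)
loss = distillation_loss(feats, knowledge)
# At inference, only the student detector runs; the knowledge-generating
# modules are discarded.
```

The key property mirrored here is that the "teacher" signal is a function of the labels and the evolving student features, so it co-adapts with the student during training and costs nothing at test time.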

Published

2022-06-28

How to Cite

Zhang, P., Kang, Z., Yang, T., Zhang, X., Zheng, N., & Sun, J. (2022). LGD: Label-Guided Self-Distillation for Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3), 3309-3317. https://doi.org/10.1609/aaai.v36i3.20240

Section

AAAI Technical Track on Computer Vision III