DiffusionTrack: Diffusion Model for Multi-Object Tracking

Authors

  • Run Luo, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
  • Zikai Song, Huazhong University of Science and Technology
  • Lintao Ma, Huazhong University of Science and Technology
  • Jinlin Wei, University of California, Santa Barbara
  • Wei Yang, Huazhong University of Science and Technology
  • Min Yang, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v38i5.28192

Keywords:

CV: Motion & Tracking, CV: Vision for Robotics & Autonomous Driving

Abstract

Multi-object tracking (MOT) is a challenging vision task that aims to detect individual objects within a single frame and associate them across multiple frames. Recent MOT approaches can be categorized into two-stage tracking-by-detection (TBD) methods and one-stage joint detection and tracking (JDT) methods. Despite their success, these approaches share common problems, such as harmful global or local inconsistency, a poor trade-off between robustness and model complexity, and a lack of flexibility across different scenes within the same video. In this paper, we propose a simple but robust framework that formulates object detection and association jointly as a consistent denoising diffusion process from paired noise boxes to paired ground-truth boxes. This novel progressive denoising diffusion strategy substantially improves the tracker's effectiveness, enabling it to discriminate between different objects. During training, paired object boxes diffuse from paired ground-truth boxes to a random distribution, and the model learns detection and tracking simultaneously by reversing this noising process. During inference, the model refines a set of randomly generated paired boxes into the detection and tracking results through a flexible one-step or multi-step denoising diffusion process. Extensive experiments on three widely used MOT benchmarks, namely MOT17, MOT20, and DanceTrack, demonstrate that our approach achieves competitive performance compared to current state-of-the-art methods. Code is available at https://github.com/RainBowLuoCS/DiffusionTrack.
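The abstract describes the method only at a high level. As a rough, non-authoritative illustration of the idea, the sketch below treats each object as a paired box (the same object's cx, cy, w, h in two adjacent frames, eight numbers in total), diffuses the whole pair toward Gaussian noise with a single shared timestep during training, and refines random paired boxes back into boxes during inference in either one or several steps. The cosine noise schedule, the signal scale `SCALE`, the deterministic DDIM-style update, and all names (`alpha_bar`, `q_sample`, `ddim_step`, `sample`, `oracle`) are assumptions borrowed from common diffusion-model practice, not details confirmed by this page or taken from the authors' code. The trained network is replaced by a hypothetical `oracle` that simply returns the ground truth, so the sketch shows the data flow only.

```python
import math
import random

# Hypothetical hyperparameters for illustration; not taken from the paper.
T = 1000      # total number of diffusion steps
SCALE = 2.0   # signal scale applied to normalized box coordinates


def alpha_bar(t, s=0.008):
    """Cumulative signal level of a cosine noise schedule (1 at t=0, ~0 at t=T)."""
    f = lambda u: math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def to_signal(pair):
    """Map a paired box (8 coords in [0, 1]: cx, cy, w, h in frames t-1 and t) to [-SCALE, SCALE]."""
    return [(2.0 * v - 1.0) * SCALE for v in pair]


def from_signal(x):
    """Inverse of to_signal, clamped back into [0, 1]."""
    return [min(max((v / SCALE + 1.0) / 2.0, 0.0), 1.0) for v in x]


def q_sample(x0, t, rng):
    """Forward process q(x_t | x_0): both boxes of a pair share the same timestep t."""
    ab = alpha_bar(t)
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in x0]


def ddim_step(x_t, x0_hat, t, t_prev):
    """Deterministic DDIM-style update (eta = 0) from step t to t_prev given a predicted x_0."""
    ab_t, ab_prev = alpha_bar(t), alpha_bar(t_prev)
    out = []
    for xt, x0 in zip(x_t, x0_hat):
        eps = (xt - math.sqrt(ab_t) * x0) / math.sqrt(1.0 - ab_t)  # implied noise
        out.append(math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * eps)
    return out


def sample(denoise_fn, num_boxes, steps, rng):
    """Refine random paired boxes into paired boxes in `steps` denoising steps."""
    boxes = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(num_boxes)]
    times = [round(T - i * T / steps) for i in range(steps + 1)]  # T, ..., 0
    for t, t_prev in zip(times[:-1], times[1:]):
        preds = denoise_fn(boxes, t)  # hypothetical model: predicts x_0 per paired box
        boxes = [ddim_step(b, p, t, t_prev) for b, p in zip(boxes, preds)]
    return [from_signal(b) for b in boxes]


# Usage: one object seen in two consecutive frames (normalized cx, cy, w, h twice).
rng = random.Random(0)
gt = [[0.5, 0.5, 0.2, 0.4, 0.52, 0.5, 0.2, 0.4]]
oracle = lambda boxes, t: [to_signal(g) for g in gt]  # stand-in for a trained network
one_step = sample(oracle, num_boxes=1, steps=1, rng=rng)
multi_step = sample(oracle, num_boxes=1, steps=4, rng=rng)
print(one_step, multi_step)
```

Because the final step lands at t = 0, where the assumed schedule has no remaining noise, the output equals the last prediction; with the oracle, one-step and four-step sampling therefore both recover the ground-truth pair. A real tracker would substitute a learned network for `oracle`, and the number of steps becomes a speed/accuracy knob at inference time.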

Published

2024-03-24

How to Cite

Luo, R., Song, Z., Ma, L., Wei, J., Yang, W., & Yang, M. (2024). DiffusionTrack: Diffusion Model for Multi-Object Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 3991–3999. https://doi.org/10.1609/aaai.v38i5.28192

Issue

Vol. 38 No. 5

Section

AAAI Technical Track on Computer Vision IV