Sampling-Resilient Multi-Object Tracking


  • Zepeng Li Zhejiang University
  • Dongxiang Zhang Zhejiang University
  • Sai Wu Zhejiang University
  • Mingli Song Zhejiang University
  • Gang Chen Zhejiang University



CV: Motion & Tracking, ML: Scalability of ML Systems


Multi-Object Tracking (MOT) is a cornerstone operator for video surveillance applications. To enable real-time processing of large-scale live video streams, we study a scenario called down-sampled MOT, which performs object tracking on only a small subset of video frames. The problem is challenging for state-of-the-art MOT methods, which exhibit significant performance degradation under high frame reduction ratios. In this paper, we devise a sampling-resilient tracker with a novel sparse-observation Kalman filter (SOKF). It integrates an LSTM network to capture the non-linear and dynamic motion patterns caused by sparse observations. Since the LSTM-based state transition is not compatible with the original noise estimation mechanism, we propose new estimation strategies based on Bayesian neural networks and derive the optimal Kalman gain for SOKF. To associate the detected bounding boxes robustly, we also propose a comprehensive similarity metric that systematically integrates multiple spatial matching signals. Experiments on three benchmark datasets show that our proposed tracker achieves the best trade-off between efficiency and accuracy. At the same tracking accuracy, we reduce the total processing time of ByteTrack by 2× on MOT17 and 3× on DanceTrack.
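To make the down-sampled setting concrete, the following is a minimal sketch and not the paper's method. It assumes uniform stride sampling (the abstract does not specify the sampling scheme) and uses plain intersection-over-union (IoU) as one example of a spatial matching signal. The helper names `sample_frames`, `iou`, and `box_at`, as well as the synthetic object speed and box size, are hypothetical and chosen only for illustration.

```python
def sample_frames(num_frames, stride):
    """Indices of the frames kept when only every `stride`-th frame is processed.

    Uniform striding is an assumption made for this sketch.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(range(0, num_frames, stride))


def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def box_at(frame, speed=4.0, size=40.0):
    """Hypothetical object: a square box moving right at `speed` pixels per frame."""
    x = speed * frame
    return (x, 0.0, x + size, size)


if __name__ == "__main__":
    # Overlap between the first two *observed* boxes for several strides.
    for stride in (1, 5, 10):
        kept = sample_frames(30, stride)
        overlap = iou(box_at(kept[0]), box_at(kept[1]))
        print(f"stride={stride:2d} kept={len(kept):2d} frames, IoU={overlap:.3f}")
```

In this toy setup the overlap between consecutive observations of the same object shrinks as the stride grows and reaches zero once the object moves a full box width between observed frames. This is one intuition for why association based on a single spatial cue becomes unreliable under high frame reduction ratios.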



How to Cite

Li, Z., Zhang, D., Wu, S., Song, M., & Chen, G. (2024). Sampling-Resilient Multi-Object Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 38(4), 3297-3305.



AAAI Technical Track on Computer Vision III