Detect or Track: Towards Cost-Effective Video Object Detection/Tracking


  • Hao Luo, Huazhong University of Science and Technology
  • Wenxuan Xie, Microsoft Research Asia
  • Xinggang Wang, Huazhong University of Science and Technology
  • Wenjun Zeng, Microsoft Research



State-of-the-art object detectors and trackers are developing fast. Trackers are generally more efficient than detectors but bear the risk of drifting. A question is hence raised: how can the accuracy of video object detection/tracking be improved by utilizing existing detectors and trackers within a given time budget? A baseline is frame skipping, which detects on every N-th frame and tracks on the frames in between. This baseline, however, is suboptimal, since the detection frequency should depend on the tracking quality. To this end, we propose a scheduler network, which determines whether to detect or track at a given frame, as a generalization of Siamese trackers. Although lightweight and simple in structure, the scheduler network is more effective than the frame-skipping baselines and flow-based approaches, as validated on the ImageNet VID dataset in video object detection/tracking.
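To make the contrast between a fixed and an adaptive detection schedule concrete, here is a minimal Python sketch. It is not the authors' implementation. `frame_skipping_schedule` follows the baseline described above (detect on every N-th frame, track in between). `adaptive_schedule` is a hypothetical stand-in for the scheduler network: rather than a learned network, it applies a fixed threshold to per-frame tracking-quality scores that are assumed to be given as input. `total_cost` and the per-frame detect/track costs are likewise illustrative assumptions.

```python
from typing import List


def frame_skipping_schedule(num_frames: int, n: int) -> List[str]:
    """Baseline: detect on frames 0, n, 2n, ... and track on all other frames."""
    return ["detect" if t % n == 0 else "track" for t in range(num_frames)]


def adaptive_schedule(tracking_quality: List[float], threshold: float) -> List[str]:
    """Hypothetical stand-in for a scheduler: always detect on the first frame,
    then detect whenever the (assumed, precomputed) tracking-quality score for a
    frame drops below the threshold, and track otherwise."""
    actions = []
    for t, quality in enumerate(tracking_quality):
        if t == 0 or quality < threshold:
            actions.append("detect")
        else:
            actions.append("track")
    return actions


def total_cost(actions: List[str], detect_cost: float, track_cost: float) -> float:
    """Sum illustrative per-frame costs, e.g. milliseconds per detect/track call."""
    return sum(detect_cost if a == "detect" else track_cost for a in actions)


if __name__ == "__main__":
    fixed = frame_skipping_schedule(6, 3)
    adaptive = adaptive_schedule([1.0, 0.9, 0.4, 0.8, 0.85, 0.3], threshold=0.5)
    print(fixed, total_cost(fixed, detect_cost=10.0, track_cost=1.0))
    print(adaptive, total_cost(adaptive, detect_cost=10.0, track_cost=1.0))
```

With these toy inputs, the fixed schedule detects on frames 0 and 3 regardless of content, while the threshold rule detects on frames 0, 2 and 5, where the assumed quality scores drop. The detection frequency therefore follows tracking quality rather than a fixed interval, and the cost helper shows how each schedule could be checked against a time budget.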




How to Cite

Luo, H., Xie, W., Wang, X., & Zeng, W. (2019). Detect or Track: Towards Cost-Effective Video Object Detection/Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 8803-8810.



AAAI Technical Track: Vision