Stereo Neural Vernier Caliper


  • Shichao Li, Hong Kong University of Science and Technology
  • Zechun Liu, Hong Kong University of Science and Technology; Carnegie Mellon University
  • Zhiqiang Shen, Carnegie Mellon University; Mohamed bin Zayed University of Artificial Intelligence; Hong Kong University of Science and Technology
  • Kwang-Ting Cheng, Hong Kong University of Science and Technology



Keywords: Computer Vision (CV), Intelligent Robotics (ROB), Domain(s) Of Application (APP)


We propose a new object-centric framework for learning-based stereo 3D object detection. Previous studies build scene-centric representations that do not consider the significant variation among outdoor instances and thus lack the flexibility and functionalities that an instance-level model can offer. We build such an instance-level model by formulating and tackling a local update problem, i.e., how to predict a refined update given an initial 3D cuboid guess. We demonstrate how solving this problem can complement scene-centric approaches in (i) building a coarse-to-fine multi-resolution system, (ii) performing model-agnostic object location refinement, and (iii) conducting stereo 3D tracking-by-detection. Extensive experiments demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on the KITTI benchmark. Code and pre-trained models are available at
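
The local update problem described in the abstract (predicting a refined update from an initial 3D cuboid guess) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper or its released code: the `Cuboid` fields, the `refine` loop, and in particular `toy_predictor`, which stands in for the learned instance-level model and simply moves the box halfway toward a known target. The sketch also assumes that only the cuboid center is updated and that the update is applied iteratively.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Cuboid:
    """Hypothetical 3D box: center (x, y, z), size, and heading angle."""
    x: float
    y: float
    z: float
    length: float
    width: float
    height: float
    yaw: float


# A predicted update is assumed here to be a 3D offset for the box center.
Update = Tuple[float, float, float]


def refine(initial: Cuboid,
           predict_update: Callable[[Cuboid], Update],
           steps: int = 3) -> Cuboid:
    """Repeatedly apply a predicted local update to an initial cuboid guess."""
    box = initial
    for _ in range(steps):
        dx, dy, dz = predict_update(box)
        box = replace(box, x=box.x + dx, y=box.y + dy, z=box.z + dz)
    return box


def toy_predictor(target: Tuple[float, float, float]) -> Callable[[Cuboid], Update]:
    """Stand-in for a learned model: step halfway toward a known target center."""
    def predict(box: Cuboid) -> Update:
        return (0.5 * (target[0] - box.x),
                0.5 * (target[1] - box.y),
                0.5 * (target[2] - box.z))
    return predict


# Coarse guess, e.g. from a scene-centric detector (values are made up).
coarse = Cuboid(x=0.0, y=0.0, z=20.0, length=4.0, width=1.8, height=1.5, yaw=0.0)
refined = refine(coarse, toy_predictor((1.0, 0.0, 22.0)), steps=3)
```

Because `refine` only needs an initial cuboid and an update function, the same loop could in principle take its starting box from any detector, which is the sense in which such refinement is model-agnostic in this sketch.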




How to Cite

Li, S., Liu, Z., Shen, Z., & Cheng, K.-T. (2022). Stereo Neural Vernier Caliper. Proceedings of the AAAI Conference on Artificial Intelligence, 36(2), 1376-1385.



AAAI Technical Track on Computer Vision II