DualNet: Robust Self-Supervised Stereo Matching with Pseudo-Label Supervision

Authors

  • Yun Wang, City University of Hong Kong
  • Jiahao Zheng, City University of Hong Kong
  • Chenghao Zhang, Chinese Academy of Sciences Institute of Automation, CASIA
  • Zhanjie Zhang, Zhejiang University
  • Kunhong Li, Sun Yat-Sen University
  • Yongjian Zhang, Sun Yat-Sen University
  • Junjie Hu, Shenzhen Institute of Artificial Intelligence and Robotics for Society, The Chinese University of Hong Kong (Shenzhen)

DOI:

https://doi.org/10.1609/aaai.v39i8.32882

Abstract

Self-supervised stereo matching has drawn attention because it can estimate disparity without ground-truth data. However, existing self-supervised stereo matching methods rely heavily on the photometric consistency assumption, which is vulnerable to natural disturbances; this results in ambiguous supervision and inferior performance compared to supervised methods. To relax, and even bypass, this assumption, we propose a novel self-supervised framework named DualNet, which consists of two key steps: robust self-supervised teacher learning and pseudo-label supervised student training. Specifically, the teacher model is first trained in a self-supervised manner with a focus on feature-metric consistency and data augmentation consistency. Then, the output of the teacher model is geometrically constrained to obtain high-quality pseudo labels. Benefiting from these high-quality pseudo labels, the student model can outperform its teacher model by a large margin. With these two steps, DualNet ranks 1st among all self-supervised methods on multiple benchmarks, even outperforming several supervised counterparts.
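
The abstract does not say which geometric constraint is applied to the teacher output. As an illustration only, the sketch below uses a left-right consistency check, a common geometric filter in stereo matching, on a single scanline. The function name, the threshold value, and the use of None for rejected pixels are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Optional


def left_right_consistent(
    disp_left: List[float],
    disp_right: List[float],
    threshold: float = 1.0,  # assumed tolerance in pixels, not from the paper
) -> List[Optional[float]]:
    """Filter one scanline of left-view disparities with a left-right check.

    A left pixel at column x with disparity d corresponds to column x - d in
    the right view. The disparity is kept as a pseudo label only if the
    right-view disparity at that column agrees within `threshold`; otherwise
    the pixel is marked None (no supervision).
    """
    width = len(disp_left)
    labels: List[Optional[float]] = []
    for x, d in enumerate(disp_left):
        xr = int(round(x - d))  # matching column in the right view
        if 0 <= xr < width and abs(d - disp_right[xr]) <= threshold:
            labels.append(d)
        else:
            labels.append(None)  # out of view or inconsistent
    return labels


# Hypothetical teacher predictions for a 5-pixel scanline.
teacher_left = [0.0, 1.0, 2.0, 2.0, 5.0]
teacher_right = [2.0, 2.0, 2.0, 0.0, 0.0]
pseudo_labels = left_right_consistent(teacher_left, teacher_right)
```

In a pipeline of this kind, a student model would be supervised only at pixels whose pseudo label is not None; how DualNet itself masks or weights pixels is not stated in the abstract.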

Published

2025-04-11

How to Cite

Wang, Y., Zheng, J., Zhang, C., Zhang, Z., Li, K., Zhang, Y., & Hu, J. (2025). DualNet: Robust Self-Supervised Stereo Matching with Pseudo-Label Supervision. Proceedings of the AAAI Conference on Artificial Intelligence, 39(8), 8178-8186. https://doi.org/10.1609/aaai.v39i8.32882

Issue

Section

AAAI Technical Track on Computer Vision VII