All-in-One: Transferring Vision Foundation Models into Stereo Matching

Authors

  • Jingyi Zhou, Fudan University
  • Haoyu Zhang, Fudan University
  • Jiakang Yuan, Fudan University
  • Peng Ye, The Chinese University of Hong Kong; Fudan University; Shanghai AI Laboratory
  • Tao Chen, Fudan University
  • Hao Jiang, Xiaomi Inc., Beijing, China
  • Meiya Chen, Xiaomi Inc., Beijing, China
  • Yangyang Zhang, Xiaomi Inc., Beijing, China

DOI:

https://doi.org/10.1609/aaai.v39i10.33173

Abstract

As a fundamental vision task, stereo matching has made remarkable progress. While recent iterative optimization-based methods have achieved promising performance, their feature extraction capabilities still have room for improvement. Inspired by the ability of vision foundation models (VFMs) to extract general representations, we propose AIO-Stereo, which can flexibly select and transfer knowledge from multiple heterogeneous VFMs to a single stereo matching model. To better reconcile features between heterogeneous VFMs and the stereo matching model, and to fully exploit prior knowledge from VFMs, we propose a dual-level feature utilization mechanism that aligns heterogeneous features and transfers multi-level knowledge. Based on this mechanism, a dual-level selective knowledge transfer module is designed to selectively transfer knowledge and integrate the advantages of multiple VFMs. Experimental results show that AIO-Stereo achieves state-of-the-art performance on multiple datasets, ranks 1st on the Middlebury dataset, and outperforms all published work on the ETH3D benchmark.
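
The abstract does not spell out how features are aligned or selected, so the following is only an illustrative sketch of the general idea (align heterogeneous features to a common width, then weight them selectively), not the AIO-Stereo implementation. Every name here (`project`, `selective_transfer`, the cosine-similarity gate, the mean-squared-error loss) is an assumption made for illustration, and the dual-level aspect of the method is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(feature, weight):
    """Hypothetical linear map aligning a VFM feature to the stereo feature width."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def selective_transfer(stereo_feat, vfm_feats, projections):
    """Assumed scheme: gate each aligned VFM feature by its similarity to the
    stereo feature, blend them into one target, and score the gap with MSE."""
    aligned = [project(f, w) for f, w in zip(vfm_feats, projections)]
    gates = softmax([cosine(stereo_feat, a) for a in aligned])
    target = [
        sum(g * a[i] for g, a in zip(gates, aligned))
        for i in range(len(stereo_feat))
    ]
    loss = sum((s - t) ** 2 for s, t in zip(stereo_feat, target)) / len(stereo_feat)
    return gates, target, loss


# Toy inputs: two "VFMs" with different feature widths (3 and 2).
stereo_feat = [1.0, 0.0]
vfm_feats = [[1.0, 0.0, 0.0], [0.0, 1.0]]
projections = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # 3 -> 2
    [[1.0, 0.0], [0.0, 1.0]],            # 2 -> 2
]
gates, target, loss = selective_transfer(stereo_feat, vfm_feats, projections)
print(gates, target, loss)
```

In this toy case the first VFM feature aligns with the stereo feature and therefore receives the larger gate; in a trained model the projections and any gating parameters would be learned rather than fixed.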

Published

2025-04-11

How to Cite

Zhou, J., Zhang, H., Yuan, J., Ye, P., Chen, T., Jiang, H., … Zhang, Y. (2025). All-in-One: Transferring Vision Foundation Models into Stereo Matching. Proceedings of the AAAI Conference on Artificial Intelligence, 39(10), 10797–10805. https://doi.org/10.1609/aaai.v39i10.33173

Section

AAAI Technical Track on Computer Vision IX