DME: Unveiling the Bias for Better Generalized Monocular Depth Estimation

Authors

  • Songsong Yu Dalian University of Technology
  • Yifan Wang Dalian University of Technology
  • Yunzhi Zhuge Dalian University of Technology
  • Lijun Wang Dalian University of Technology
  • Huchuan Lu Dalian University of Technology

DOI:

https://doi.org/10.1609/aaai.v38i7.28506

Keywords:

CV: Scene Analysis & Understanding, CV: Vision for Robotics & Autonomous Driving

Abstract

This paper aims to design monocular depth estimation models with better generalization ability. To this end, we conduct quantitative analyses and uncover two important insights. First, the Simulation Correlation phenomenon, commonly seen in long-tailed classification problems, also exists in monocular depth estimation, indicating that the imbalanced depth distribution in the training data may be the cause of limited generalization. Second, the imbalanced, long-tailed distribution of depth values extends beyond the dataset scale and also manifests within each individual image, further exacerbating the challenge of monocular depth estimation. Motivated by these findings, we propose the Distance-aware Multi-Expert (DME) depth estimation model. Unlike prior methods that handle different depth ranges indiscriminately, DME adopts a divide-and-conquer philosophy in which each expert is responsible for depth estimation of regions within a specific depth range. As such, the depth distribution seen by each expert is more uniform and can be predicted more easily. A pixel-level routing module is further designed and learned to stitch the predictions of all experts into the final depth map. Experiments show that DME achieves state-of-the-art performance on both NYU-Depth v2 and KITTI, and also delivers favorable zero-shot generalization on unseen datasets.
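The stitching step described above can be illustrated with a minimal sketch. This is not the authors' implementation; the function and variable names are hypothetical, and the "experts" are stand-ins that simply predict constant depth maps. It only shows the core idea: a router produces per-pixel weights over the experts, and the final depth map is the weighted combination of the expert predictions.

```python
import numpy as np

def route_experts(expert_preds, router_logits):
    """Stitch expert depth maps with per-pixel routing (illustrative only).

    expert_preds: (E, H, W) depth maps, one per expert.
    router_logits: (E, H, W) unnormalized per-pixel routing scores.
    Returns the stitched (H, W) depth map.
    """
    # Per-pixel softmax over the expert axis (numerically stabilized).
    logits = router_logits - router_logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    # Weighted sum of expert predictions at every pixel.
    return (weights * expert_preds).sum(axis=0)

# Toy example: a "near" expert and a "far" expert on a 2x2 image.
near = np.full((2, 2), 1.0)    # predicts ~1 m everywhere
far = np.full((2, 2), 10.0)    # predicts ~10 m everywhere
preds = np.stack([near, far])

# The (hypothetical) learned router strongly prefers the near expert at
# pixel (0, 0) and the far expert at the remaining pixels.
logits = np.stack([
    np.array([[5.0, -5.0], [-5.0, -5.0]]),   # near-expert scores
    np.array([[-5.0, 5.0], [5.0, 5.0]]),     # far-expert scores
])
depth = route_experts(preds, logits)
# depth is ~1 m at pixel (0, 0) and ~10 m elsewhere.
```

In the paper the routing weights are learned jointly with the experts, so each expert sees (and is supervised on) a more uniform slice of the depth distribution; the sketch above only captures the inference-time stitching.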

Published

2024-03-24

How to Cite

Yu, S., Wang, Y., Zhuge, Y., Wang, L., & Lu, H. (2024). DME: Unveiling the Bias for Better Generalized Monocular Depth Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 6817-6825. https://doi.org/10.1609/aaai.v38i7.28506

Section

AAAI Technical Track on Computer Vision VI