DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching
DOI: https://doi.org/10.1609/aaai.v39i3.32330

Abstract
Current discriminative depth estimation methods often produce blurry artifacts, while generative approaches suffer from slow sampling due to curvatures in the noise-to-depth transport. Our method addresses these challenges by framing depth estimation as a direct transport between the image and depth distributions. We are the first to explore flow matching in this field, and we demonstrate that its interpolation trajectories improve both training and sampling efficiency while preserving high performance. While generative models typically require extensive training data, we mitigate this dependency by integrating external knowledge from a pre-trained image diffusion model, enabling effective transfer even across differing objectives. To further boost performance, we employ synthetic data and utilize image-depth pairs generated by a discriminative model on an in-the-wild image dataset. As a generative model, ours can also reliably estimate depth confidence, which provides an additional advantage. Our approach achieves competitive zero-shot performance on standard benchmarks of complex natural scenes while improving sampling efficiency and requiring only minimal synthetic data for training.
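The abstract's central idea, regressing a velocity field along straight interpolation trajectories between the image and depth distributions, can be illustrated with a minimal flow-matching sketch. This is not the authors' code; all names, shapes, and the use of flat vectors in place of image/depth latents are illustrative assumptions.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Straight-line interpolant x_t = (1 - t) * x0 + t * x1
    between an image representation x0 and a depth map x1."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """For the linear path, the target velocity is constant: x1 - x0."""
    return x1 - x0

def fm_loss(pred_velocity, x0, x1):
    """Flow-matching objective: mean-squared error between the
    model's predicted velocity and the target velocity."""
    return np.mean((pred_velocity - target_velocity(x0, x1)) ** 2)

# Toy example with flat vectors standing in for image/depth latents.
rng = np.random.default_rng(0)
x0 = rng.normal(size=8)   # stand-in for the image latent
x1 = rng.normal(size=8)   # stand-in for the depth latent
x_t = interpolate(x0, x1, 0.3)

# A perfect model predicts x1 - x0 everywhere, giving zero loss;
# sampling then amounts to integrating this (straight) velocity field.
loss = fm_loss(target_velocity(x0, x1), x0, x1)
```

Because the transport path is straight rather than curved as in noise-to-depth diffusion, a trained velocity field can be integrated in very few steps, which is the source of the sampling-efficiency claim above.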
Published
2025-04-11
How to Cite
Gui, M., Schusterbauer, J., Prestel, U., Ma, P., Kotovenko, D., Grebenkova, O., … Ommer, B. (2025). DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching. Proceedings of the AAAI Conference on Artificial Intelligence, 39(3), 3203–3211. https://doi.org/10.1609/aaai.v39i3.32330
Section: AAAI Technical Track on Computer Vision II