DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching
DOI: https://doi.org/10.1609/aaai.v39i3.32330

Abstract
Current discriminative depth estimation methods often produce blurry artifacts, while generative approaches suffer from slow sampling due to curvatures in the noise-to-depth transport. Our method addresses these challenges by framing depth estimation as a direct transport between the image and depth distributions. We are the first to explore flow matching in this field, and we demonstrate that its interpolation trajectories improve both training and sampling efficiency while preserving high performance. While generative models typically require extensive training data, we mitigate this dependency by integrating external knowledge from a pre-trained image diffusion model, enabling effective transfer even across differing objectives. To further boost performance, we employ synthetic data and utilize image-depth pairs generated by a discriminative model on an in-the-wild image dataset. As a generative model, ours can also reliably estimate depth confidence, which provides an additional advantage. Our approach achieves competitive zero-shot performance on standard benchmarks of complex natural scenes while improving sampling efficiency and requiring only minimal synthetic data for training.
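The abstract's central idea, regressing a velocity field along straight interpolation trajectories between the image and depth distributions, can be illustrated with a minimal flow-matching sketch. This is not the authors' code; all names, shapes, and the use of flat vectors in place of image/depth latents are illustrative assumptions.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Straight-line interpolant x_t = (1 - t) * x0 + t * x1
    between an image representation x0 and a depth map x1."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """For the linear path, the target velocity is constant: x1 - x0."""
    return x1 - x0

def fm_loss(pred_velocity, x0, x1):
    """Flow-matching objective: mean-squared error between the
    model's predicted velocity and the target velocity."""
    return np.mean((pred_velocity - target_velocity(x0, x1)) ** 2)

# Toy example with flat vectors standing in for image/depth latents.
rng = np.random.default_rng(0)
x0 = rng.normal(size=8)   # stand-in for the image latent
x1 = rng.normal(size=8)   # stand-in for the depth latent
x_t = interpolate(x0, x1, 0.3)

# A perfect model predicts x1 - x0 everywhere, giving zero loss;
# sampling then amounts to integrating this (straight) velocity field.
loss = fm_loss(target_velocity(x0, x1), x0, x1)
```

Because the transport path is straight rather than curved as in noise-to-depth diffusion, a trained velocity field can be integrated in very few steps, which is the source of the sampling-efficiency claim above.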
Published
2025-04-11
How to Cite
Gui, M., Schusterbauer, J., Prestel, U., Ma, P., Kotovenko, D., Grebenkova, O., … Ommer, B. (2025). DepthFM: Fast Generative Monocular Depth Estimation with Flow Matching. Proceedings of the AAAI Conference on Artificial Intelligence, 39(3), 3203–3211. https://doi.org/10.1609/aaai.v39i3.32330
Section: AAAI Technical Track on Computer Vision II