EC-MVSNet: Enhanced Cascaded Multi-View Stereo with Cross-Scale Relevance Integration
DOI:
https://doi.org/10.1609/aaai.v40i12.37972
Abstract
Cascade-based multi-scale architectures are currently the mainstream in Multi-view Stereo (MVS), achieving a balance between computational efficiency and reconstruction accuracy. However, existing cascade MVS methods suffer from significant limitations in cross-scale information utilization, where depth estimation processes operate independently across scales without fully exploiting the rich relevance between adjacent scales. To address this fundamental limitation, we propose an Enhanced Cascade Multi-View Stereo framework (EC-MVSNet), which introduces a novel cross-scale relevance integration strategy. Specifically, we introduce a Cross-Scale Feature-based Joint Construction (CFC) module to synergistically combine features from adjacent scales to build more reliable cost volumes. Additionally, a Cross-Scale Probability-guided Enhancement (CPE) module is proposed to propagate depth probability distributions across scales to guide cost volume enhancement. Furthermore, we propose a Monocular Feature-based Refinement (MFR) module to further enhance depth prediction accuracy by leveraging monocular priors. Extensive experiments demonstrate that EC-MVSNet achieves state-of-the-art performance on multiple benchmarks, validating the effectiveness of the cross-scale integration in improving MVS reconstruction quality.
Published
2026-03-14
How to Cite
Wang, S., Sun, J., Fan, B., Wang, Q., Lu, B., & Dai, Y. (2026). EC-MVSNet: Enhanced Cascaded Multi-View Stereo with Cross-Scale Relevance Integration. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 10056–10064. https://doi.org/10.1609/aaai.v40i12.37972
Issue
Section
AAAI Technical Track on Computer Vision IX