Multi-Resolution Monocular Depth Map Fusion by Self-Supervised Gradient-Based Composition
DOI: https://doi.org/10.1609/aaai.v37i1.25123
Keywords: CV: 3D Computer Vision, CV: Scene Analysis & Understanding
Abstract
Monocular depth estimation is a challenging problem on which deep neural networks have demonstrated great potential. However, depth maps predicted by existing deep models usually lack fine-grained details due to the convolution operations and down-sampling in the networks. We find that increasing the input resolution helps preserve more local details, while estimation at low resolution is more accurate globally. Therefore, we propose a novel depth map fusion module that combines the advantages of estimations from multi-resolution inputs. Instead of merging the low- and high-resolution estimations equally, we adopt the core idea of Poisson fusion, implanting the gradient domain of the high-resolution depth into the low-resolution depth. While classic Poisson fusion requires a fusion mask as supervision, we propose a self-supervised framework based on guided image filtering. We demonstrate that this gradient-based composition has much better noise immunity than the state-of-the-art depth map fusion method. Our lightweight depth fusion is one-shot and runs in real time, making it 80 times faster than a state-of-the-art depth fusion method. Quantitative evaluations demonstrate that the proposed method can be integrated into many fully convolutional monocular depth estimation backbones with a significant performance boost, leading to state-of-the-art results for detail enhancement on depth maps. Code is released at https://github.com/yuinsky/gradient-based-depth-map-fusion.
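The "core idea of Poisson fusion" mentioned above (keep the gradients of the high-resolution depth, anchor the result to the low-resolution depth) can be illustrated with a classic screened-Poisson solve. This is a minimal sketch, not the authors' fusion module: the paper obtains its composition through a self-supervised framework based on guided image filtering, which is not reproduced here. The sketch assumes both depth maps are already on the same pixel grid (the low-resolution estimate upsampled beforehand) and periodic image boundaries, so the system can be solved exactly with NumPy's FFT. The function names `laplacian_periodic` and `gradient_domain_fuse` and the weight `lam` are hypothetical. It minimizes sum |grad D - grad H|^2 + lam * sum (D - L)^2, whose optimality condition is (lam - Laplacian) D = lam * L - Laplacian(H).

```python
import numpy as np


def laplacian_periodic(x: np.ndarray) -> np.ndarray:
    """5-point discrete Laplacian with periodic (wrap-around) boundaries."""
    return (np.roll(x, 1, axis=0) + np.roll(x, -1, axis=0)
            + np.roll(x, 1, axis=1) + np.roll(x, -1, axis=1) - 4.0 * x)


def gradient_domain_fuse(low: np.ndarray, high: np.ndarray,
                         lam: float = 0.05) -> np.ndarray:
    """Hypothetical screened-Poisson fusion of two same-sized depth maps.

    Keeps the gradients of `high` (fine detail) while anchoring the result
    to `low` (global layout) with weight `lam`, by solving
        (lam - Laplacian) D = lam * low - Laplacian(high)
    exactly in the Fourier domain under periodic boundaries.
    """
    if low.shape != high.shape:
        raise ValueError("resample both depth maps to the same grid first")
    if lam <= 0:
        raise ValueError("lam must be positive")
    h, w = low.shape
    # Fourier eigenvalues of laplacian_periodic; all are <= 0, and 0 only at DC.
    ky = np.fft.fftfreq(h)[:, None]
    kx = np.fft.fftfreq(w)[None, :]
    lap_eig = 2.0 * np.cos(2 * np.pi * ky) + 2.0 * np.cos(2 * np.pi * kx) - 4.0
    # Right-hand side lam * low - Laplacian(high), transformed to frequency space.
    rhs = lam * np.fft.fft2(low) - lap_eig * np.fft.fft2(high)
    # Denominator lam - lap_eig >= lam > 0, so the division is always safe.
    fused = np.fft.ifft2(rhs / (lam - lap_eig))
    return fused.real
```

Rewriting the solution per frequency as D = (lam * L + s * H) / (lam + s), with s = -lap_eig >= 0 growing with frequency, shows why this matches the abstract's observation: low frequencies (global layout) are taken mostly from the low-resolution map, and the mean comes from it exactly, while high frequencies (local detail) are taken mostly from the high-resolution map. A single global `lam` only sets the crossover frequency; it cannot adapt spatially the way a learned, self-supervised composition can.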
Published: 2023-06-26
How to Cite
Dai, Y., Yi, R., Zhu, C., He, H., & Xu, K. (2023). Multi-Resolution Monocular Depth Map Fusion by Self-Supervised Gradient-Based Composition. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1), 488-496. https://doi.org/10.1609/aaai.v37i1.25123
Issue: Vol. 37 No. 1 (2023)
Section: AAAI Technical Track on Computer Vision I