SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization

Authors

  • Zhenlong Yuan, Institute of Computing Technology, Chinese Academy of Sciences
  • Jiakai Cao, Institute of Computing Technology, Chinese Academy of Sciences
  • Zhaoxin Li, Agricultural Information Institute, Chinese Academy of Agricultural Sciences; Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs
  • Hao Jiang, Institute of Computing Technology, Chinese Academy of Sciences
  • Zhaoqi Wang, Institute of Computing Technology, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v38i7.28512

Keywords:

CV: 3D Computer Vision, CV: Segmentation

Abstract

In this paper, we introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS), a method that can effectively tackle challenges in 3D reconstruction of textureless areas. We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes and further leverage these constraints for pixelwise patch deformation on both matching cost and propagation. Concurrently, we propose a unique refinement strategy that combines spherical coordinates and gradient descent on normals with a pixelwise search interval on depths, significantly improving the completeness of the reconstructed 3D model. Furthermore, we adopt the Expectation-Maximization (EM) algorithm to alternately optimize the aggregate matching cost and hyperparameters, effectively mitigating the problem of parameters being excessively dependent on empirical tuning. Evaluations on the ETH3D high-resolution multi-view stereo benchmark and the Tanks and Temples dataset demonstrate that our method can achieve state-of-the-art results with less time consumption.
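
As a rough illustration of the normal refinement idea mentioned in the abstract, the sketch below parameterizes a unit normal by two spherical angles and runs plain gradient descent on those angles. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the finite-difference gradient, the step size, and the toy cost (one minus the dot product with a hypothetical target normal) are all illustrative stand-ins, since the abstract does not specify the actual matching cost or update rule.

```python
import math


def normal_from_spherical(theta, phi):
    """Unit normal from polar angle theta (measured from +z) and azimuth phi."""
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))


def spherical_from_normal(n):
    """Inverse mapping: (theta, phi) of a nonzero 3-vector."""
    x, y, z = n
    length = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(max(-1.0, min(1.0, z / length)))
    phi = math.atan2(y, x)
    return theta, phi


def refine_normal(n, cost, step=0.2, iters=50, eps=1e-4):
    """Gradient descent on (theta, phi) with central finite differences.

    `cost` is a hypothetical callable mapping a unit normal to a scalar;
    it stands in for whatever matching cost a real pipeline would use.
    Working in angles keeps the normal on the unit sphere at every step.
    """
    theta, phi = spherical_from_normal(n)
    for _ in range(iters):
        g_theta = (cost(normal_from_spherical(theta + eps, phi))
                   - cost(normal_from_spherical(theta - eps, phi))) / (2 * eps)
        g_phi = (cost(normal_from_spherical(theta, phi + eps))
                 - cost(normal_from_spherical(theta, phi - eps))) / (2 * eps)
        theta -= step * g_theta
        phi -= step * g_phi
    return normal_from_spherical(theta, phi)


# Toy usage (assumed setup): pull a normal toward a hypothetical target.
target = (0.0, 1.0, 0.0)


def toy_cost(n):
    return 1.0 - sum(a * b for a, b in zip(n, target))


start = normal_from_spherical(math.pi / 4, 0.0)
refined = refine_normal(start, toy_cost)
```

Optimizing two angles rather than three Cartesian components avoids renormalizing after each update, which is one plausible motivation for a spherical parameterization.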

Published

2024-03-24

How to Cite

Yuan, Z., Cao, J., Li, Z., Jiang, H., & Wang, Z. (2024). SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 6871-6880. https://doi.org/10.1609/aaai.v38i7.28512

Issue

Vol. 38 No. 7 (2024)

Section

AAAI Technical Track on Computer Vision VI