TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

Chuanrui Zhang; Yingshuang Zou; Zhuoling Li; Minmin Yi; Haoqian Wang

doi:10.1609/aaai.v39i9.33070

Authors

Chuanrui Zhang Tsinghua University
Yingshuang Zou Tsinghua University
Zhuoling Li University of Hong Kong
Minmin Yi E-surfing Vision Technology Co., Ltd
Haoqian Wang Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v39i9.33070

Abstract

Compared with previous 3D reconstruction methods like NeRF, recent Generalizable 3D Gaussian Splatting (G-3DGS) methods demonstrate impressive efficiency even in the sparse-view setting. However, the promising reconstruction performance of existing G-3DGS methods relies heavily on accurate multi-view feature matching, which is quite challenging. In particular, for scenes that have many non-overlapping areas between views and contain numerous similar regions, the matching performance of existing methods is poor and the reconstruction precision is limited. To address this problem, we develop a strategy that uses a predicted depth confidence map to guide accurate local feature matching. In addition, we propose using the knowledge of existing monocular depth estimation models as a prior to boost depth estimation precision in non-overlapping areas between views. Combining the proposed strategies, we present a novel G-3DGS method named TranSplat, which obtains the best performance on both the RealEstate10K and ACID benchmarks while maintaining competitive speed and presenting strong cross-dataset generalization ability.
