SOVGaussian: Sparse-View 3D Gaussian Splatting for Open-Vocabulary Scene Understanding

Peng Ling; Tiao Tan; Jiaqi Lin; Wenming Yang

doi:10.1609/aaai.v39i5.32568

Authors

Peng Ling, Shenzhen International Graduate School, Tsinghua University
Tiao Tan, Shenzhen International Graduate School, Tsinghua University
Jiaqi Lin, Shenzhen International Graduate School, Tsinghua University
Wenming Yang, Shenzhen International Graduate School, Tsinghua University


Abstract

Modeling 3D open-vocabulary language fields is challenging yet highly desirable. Despite great progress, existing approaches rely heavily on a large number of training views to construct language-embedded 3D scenes, which is impractical in many real-world scenarios. This paper introduces SOVGaussian, the first method for few-shot novel-view open-vocabulary language querying. We introduce a depth-constrained neural language field to mitigate the geometry degradation caused by overfitting to the training views. Rather than directly using dense depth maps, which provide only loosely accurate supervision, we propose Language-Aware Depth Distillation (LAD), which builds on open-vocabulary object masks to ensure intra-object geometric accuracy within the language field. To further refine the language-geometry consistency of the language field, we propose a novel Language-Guided Outlier Pruning (LOP) strategy, which identifies floating 3D Gaussian primitives that overfit the training views based on their language-grouped densities. Comprehensive experiments demonstrate that SOVGaussian reconstructs a superior scene representation from few-shot images, outperforming existing state-of-the-art methods and achieving significantly better performance on novel-view language querying and synthesis.
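The abstract describes LAD and LOP only at a high level. As a rough illustration, and not the authors' implementation, the sketch below assumes (1) that LAD-style supervision compares rendered depth with a monocular depth prior only inside each open-vocabulary object mask, using a scale- and shift-invariant Pearson-correlation loss, and (2) that LOP-style pruning flags Gaussians whose mean distance to their k nearest neighbours in the same language group is abnormally large (that is, whose language-grouped density is low). The function names, the Pearson loss, and the parameters `k` and `ratio` are all hypothetical choices.

```python
import math
import statistics
from collections import defaultdict


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def mask_aware_depth_loss(rendered, prior, mask_ids):
    """Hypothetical LAD-style loss (assumption, not the paper's formula).

    rendered, prior: flat per-pixel depths (rendered vs. monocular prior).
    mask_ids: per-pixel object id from open-vocabulary masks; -1 = ignored.
    Returns 1 - Pearson correlation computed inside each mask, averaged over
    masks, so the prior only constrains relative depth within an object.
    """
    groups = defaultdict(list)
    for i, m in enumerate(mask_ids):
        if m >= 0:
            groups[m].append(i)
    losses = []
    for idx in groups.values():
        if len(idx) < 2:  # correlation is undefined for a single pixel
            continue
        r = [rendered[i] for i in idx]
        p = [prior[i] for i in idx]
        losses.append(1.0 - pearson(r, p))
    return sum(losses) / len(losses) if losses else 0.0


def language_grouped_outliers(positions, labels, k=3, ratio=2.0):
    """Hypothetical LOP-style pruning (assumption, not the paper's rule).

    positions: 3D Gaussian centres; labels: language group per Gaussian.
    Within each group, low density is measured as a large mean distance to the
    k nearest same-group neighbours; Gaussians whose mean distance exceeds
    ratio * the group median are returned as indices to prune.
    """
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    prune = set()
    for idx in groups.values():
        if len(idx) <= k:  # too few members to estimate density
            continue
        scores = {}
        for i in idx:
            d = sorted(math.dist(positions[i], positions[j]) for j in idx if j != i)
            scores[i] = sum(d[:k]) / k
        med = statistics.median(scores.values())
        prune.update(i for i, s in scores.items() if s > ratio * med)
    return prune
```

Computing the correlation per mask means the monocular prior only has to be right about relative depth within an object, not about the depth offsets between objects; a single correlation over the whole image would penalize those inter-object offsets, which is one plausible reading of why dense depth maps give only loosely accurate supervision. Grouping by language label before measuring density keeps a sparse but legitimate object from being judged against a denser neighbouring object.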
