SOVGaussian: Sparse-View 3D Gaussian Splatting for Open-Vocabulary Scene Understanding

Authors

  • Peng Ling, Shenzhen International Graduate School, Tsinghua University
  • Tiao Tan, Shenzhen International Graduate School, Tsinghua University
  • Jiaqi Lin, Shenzhen International Graduate School, Tsinghua University
  • Wenming Yang, Shenzhen International Graduate School, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v39i5.32568

Abstract

Modeling 3D open-vocabulary language fields is challenging yet highly desirable. Despite great progress, existing approaches rely heavily on a large number of training views to construct language-embedded 3D scenes, which is impractical in many real-world scenarios. This paper introduces SOVGaussian, the first method for few-shot, novel-view open-vocabulary language querying. We introduce a depth-constrained neural language field to mitigate the geometry degradation caused by overfitting to the training views. Rather than directly supervising with dense depth maps, which are only loosely accurate, we propose Language-Aware Depth Distillation (LAD), based on open-vocabulary object masks, to ensure intra-object geometric accuracy within the language field. To further improve the language-geometry consistency of the language field, we propose a novel Language-Guided Outlier Pruning (LOP) strategy, which identifies floating 3D Gaussian primitives that overfit the training views based on their language-grouped densities. Comprehensive experiments show that SOVGaussian reconstructs a superior scene representation from few-shot images, outperforming existing state-of-the-art methods and achieving significantly better performance on novel view language querying and synthesis.
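The abstract does not spell out the exact LAD loss or the LOP pruning rule, so the sketch below is only one hedged reading of the two ideas, not the authors' implementation. It assumes LAD can be approximated by a depth loss computed separately inside each open-vocabulary object mask, with a least-squares scale and shift fitted per mask (a common way to use relative monocular depth), and that LOP can be approximated by a k-nearest-neighbour density test run separately within each language group of Gaussians. The function names, the parameters `k` and `z`, and the flat-list data layout are all hypothetical.

```python
import math
from collections import defaultdict


def masked_scale_shift_depth_loss(rendered, mono, mask_ids, background=0):
    """Hypothetical per-object depth loss (an assumption, not the paper's code).

    rendered, mono: flat lists of per-pixel depths of equal length.
    mask_ids: flat list of integer object ids; `background` pixels are ignored.
    Inside each object mask, mono depth is fitted to rendered depth with a
    least-squares scale s and shift t; the loss is the mean absolute residual
    per mask, averaged over masks.
    """
    groups = defaultdict(list)
    for r, m, obj in zip(rendered, mono, mask_ids):
        if obj != background:
            groups[obj].append((r, m))
    total, count = 0.0, 0
    for pairs in groups.values():
        n = len(pairs)
        mean_r = sum(r for r, _ in pairs) / n
        mean_m = sum(m for _, m in pairs) / n
        var_m = sum((m - mean_m) ** 2 for _, m in pairs)
        cov = sum((m - mean_m) * (r - mean_r) for r, m in pairs)
        s = cov / var_m if var_m > 0 else 0.0  # flat mono depth: fit a constant
        t = mean_r - s * mean_m
        total += sum(abs(r - (s * m + t)) for r, m in pairs) / n
        count += 1
    return total / count if count else 0.0


def prune_language_outliers(points, labels, k=3, z=2.0):
    """Hypothetical language-grouped density filter (an assumption).

    points: list of (x, y, z) Gaussian centres; labels: language-group id
    per Gaussian. Within each group, a Gaussian's sparsity is its mean
    distance to its k nearest same-group neighbours; Gaussians whose sparsity
    exceeds mean + z * std of their group are flagged. Returns a list of
    booleans where True means keep and False means prune.
    """
    keep = [True] * len(points)
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    for idx in groups.values():
        if len(idx) <= k:
            continue  # too few members to estimate a density
        sparsity = []
        for i in idx:
            dists = sorted(math.dist(points[i], points[j]) for j in idx if j != i)
            sparsity.append(sum(dists[:k]) / k)
        mu = sum(sparsity) / len(sparsity)
        sd = math.sqrt(sum((s - mu) ** 2 for s in sparsity) / len(sparsity))
        for i, s in zip(idx, sparsity):
            if s > mu + z * sd:
                keep[i] = False
    return keep
```

Fitting scale and shift per mask rather than per image means the monocular estimate is never trusted for depth ordering between different objects, which matches the stated goal of intra-object accuracy. Grouping by language label before the density test keeps a small but legitimate object from being judged against a denser neighbouring object.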


Published

2025-04-11

How to Cite

Ling, P., Tan, T., Lin, J., & Yang, W. (2025). SOVGaussian: Sparse-View 3D Gaussian Splatting for Open-Vocabulary Scene Understanding. Proceedings of the AAAI Conference on Artificial Intelligence, 39(5), 5343–5351. https://doi.org/10.1609/aaai.v39i5.32568

Issue

Vol. 39 No. 5 (2025)

Section

AAAI Technical Track on Computer Vision IV