ID-Splat: Propagating Object Identities for Segmenting 3D Aerial-view Scenes

Authors

  • Yijing Wang, Xidian University
  • Xu Tang, Xidian University
  • Xiangrong Zhang, Xidian University
  • Jingjing Ma, Xidian University

DOI:

https://doi.org/10.1609/aaai.v40i12.37995

Abstract

High-resolution Earth Observation technologies present unprecedented opportunities for geospatial analysis, yet traditional 2D aerial-view semantic segmentation remains limited because it cannot model spatial relationships or handle object occlusions. 3D Aerial-view Segmentation (3DAS) has emerged to address these limitations, but existing methods predominantly rely on 2D discriminative models pre-trained on natural scenes. Because of the significant domain gap between natural and aerial imagery, these models struggle to recognize aerial-view content accurately, which leads to suboptimal performance. This paper introduces ID-Splat, a novel object-centric framework that directly leverages multi-view object identities, without relying on discriminative information, to enhance 3D semantic understanding. ID-Splat implements a two-stage process: first, Mask-object Tracking combines SAM and Point Tracking to establish robust, consistent object identities across multi-view aerial images; second, Object Integration & Propagation assigns these identities to 3D Gaussian Splatting (3DGS) points, enabling complete 3D segmentation through semantic propagation. Experimental results on the 3D-AS dataset demonstrate that ID-Splat significantly outperforms existing methods, particularly under sparse supervision. By effectively leveraging the inherent 3D structure, ID-Splat also achieves state-of-the-art performance while reducing the need for extensive labeled data.
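The abstract describes the Object Integration & Propagation stage only at a high level. As a rough illustration of the general idea, here is a minimal sketch. It assumes the tracking stage already yields per-view identity maps that agree across views, and that each Gaussian's center has been projected to an integer pixel in every view. Each Gaussian then takes the identity it lands on most often, and each identity takes the class carried by most of its few labeled pixels. Majority voting is a simple stand-in chosen here for clarity, not the paper's actual procedure. All names (`assign_identities`, `propagate_labels`, and the input layouts) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical data layouts (illustrative assumptions, not taken from the paper):
#   id_maps[v][y][x]       object identity at pixel (x, y) of view v, -1 = unassigned.
#                          Identities are assumed to already agree across views.
#   projections[g][v]      integer pixel (x, y) of Gaussian g's center in view v,
#                          or None when the Gaussian is not visible there.
#   sparse_labels[v][y][x] class index for a few annotated views, -1 = unlabeled.
IdMap = List[List[int]]
Pixel = Optional[Tuple[int, int]]


def assign_identities(id_maps: List[IdMap], projections: List[List[Pixel]]) -> List[int]:
    """Give each Gaussian the identity it lands on most often across views."""
    identities = []
    for per_view in projections:
        votes: Counter = Counter()
        for v, pix in enumerate(per_view):
            if pix is None:
                continue  # not visible in this view
            x, y = pix
            obj = id_maps[v][y][x]
            if obj >= 0:
                votes[obj] += 1
        # No votes at all -> leave the Gaussian unassigned (-1).
        identities.append(votes.most_common(1)[0][0] if votes else -1)
    return identities


def propagate_labels(identities: List[int], id_maps: List[IdMap],
                     sparse_labels: Dict[int, IdMap]) -> List[int]:
    """Copy sparse 2D class labels to every Gaussian that shares an identity."""
    class_votes: Dict[int, Counter] = {}
    for v, label_map in sparse_labels.items():
        for y, row in enumerate(label_map):
            for x, cls in enumerate(row):
                obj = id_maps[v][y][x]
                if cls >= 0 and obj >= 0:
                    class_votes.setdefault(obj, Counter())[cls] += 1
    # Each identity takes the class that most of its labeled pixels carry.
    obj_to_class = {obj: c.most_common(1)[0][0] for obj, c in class_votes.items()}
    return [obj_to_class.get(obj, -1) for obj in identities]


# Toy scene: two 2x3 views in which the two objects swap sides.
id_maps = [
    [[0, 0, 1], [0, 0, 1]],   # view 0
    [[1, 1, 0], [1, -1, 0]],  # view 1
]
projections = [
    [(0, 0), (2, 0)],  # Gaussian 0: visible in both views
    [(2, 1), (0, 1)],  # Gaussian 1: visible in both views
    [None, (1, 0)],    # Gaussian 2: visible only in view 1
]
sparse_labels = {0: [[3, -1, -1], [-1, -1, 7]]}  # only view 0 is annotated

identities = assign_identities(id_maps, projections)
labels = propagate_labels(identities, id_maps, sparse_labels)
print(identities, labels)  # [0, 1, 1] [3, 7, 7]
```

In the toy scene, Gaussian 2 never appears in the single annotated view, yet it still receives class 7 because it shares an identity with Gaussian 1. This is the kind of sparse-supervision benefit that identity-based propagation aims for.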

Published

2026-03-14

How to Cite

Wang, Y., Tang, X., Zhang, X., & Ma, J. (2026). ID-Splat: Propagating Object Identities for Segmenting 3D Aerial-view Scenes. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 10261-10269. https://doi.org/10.1609/aaai.v40i12.37995

Issue

Vol. 40 No. 12

Section

AAAI Technical Track on Computer Vision IX