UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery Using Gaussian Splatting
DOI:
https://doi.org/10.1609/aaai.v40i5.37333
Abstract
Despite significant advancements in dynamic neural rendering, existing methods fail to address the unique challenges posed by UAV-captured scenarios, particularly those involving monocular camera setups, top-down perspectives, and multiple small, moving humans, which are not adequately represented in existing datasets. In this work, we introduce UAV4D, a framework for photorealistic rendering of dynamic real-world scenes captured by UAVs. Specifically, we address the challenge of reconstructing dynamic scenes with multiple moving pedestrians from monocular video data without the need for additional sensors. We use a combination of a 3D foundation model and a human mesh reconstruction model to reconstruct both the scene background and humans. We propose a novel approach to resolve the scene scale ambiguity and place both humans and the scene in world coordinates by identifying human-scene contact points. Additionally, we exploit the SMPL model and background mesh to initialize Gaussian splats, enabling holistic scene rendering. We evaluate our method on three complex UAV-captured datasets: VisDrone, Manipal-UAV, and Okutama-Action, each with distinct characteristics and 10-50 humans. Our results demonstrate the benefits of our approach over existing methods in novel view synthesis, achieving a 1.5 dB PSNR improvement and superior visual sharpness.
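For readers unfamiliar with the metric quoted in the abstract, the sketch below shows the standard definition of PSNR and what a gain in dB implies for mean squared error. It is not taken from the paper: the function names `psnr` and `mse_ratio_for_gain`, the flat-list pixel representation, and the `max_value=1.0` intensity range are all assumptions made for illustration.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Standard PSNR in dB between two equally sized flat pixel sequences.

    Assumes intensities lie in [0, max_value]; this range is an
    illustrative choice, not a detail stated in the abstract.
    """
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - p) ** 2 for r, p in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    print(psnr([0.0, 0.0], [0.1, 0.1]))
    print(mse_ratio_for_gain(1.5))
```

Because PSNR is logarithmic in the error, a 1.5 dB gain at a fixed intensity range corresponds to a mean squared error about 1.41 times lower, all else being equal.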
Published
2026-03-14
How to Cite
Choi, J., Jung, D., Maxey, C., Eum, S., Lee, Y., Manocha, D., & Kwon, H. (2026). UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery Using Gaussian Splatting. Proceedings of the AAAI Conference on Artificial Intelligence, 40(5), 3372–3380. https://doi.org/10.1609/aaai.v40i5.37333
Section
AAAI Technical Track on Computer Vision II