Human Motion Synthesis in 3D Scenes via Unified Scene Semantic Occupancy
DOI:
https://doi.org/10.1609/aaai.v40i6.42421
Abstract
Human motion synthesis in 3D scenes relies heavily on scene comprehension, yet current methods focus mainly on scene structure and ignore semantic understanding. In this paper, we propose SSOMotion, a human motion synthesis framework that uses a unified Scene Semantic Occupancy (SSO) for scene representation. We design a bi-directional tri-plane decomposition to derive a compact version of the SSO, and scene semantics are mapped to a unified feature space via CLIP encoding and a shared linear dimensionality reduction. This strategy captures fine-grained scene semantic structure while significantly reducing redundant computation. We further use these scene hints, together with the movement direction derived from instructions, to control motion via frame-wise scene queries. Extensive experiments and ablation studies on cluttered scenes built from ShapeNet furniture, as well as scanned scenes from the PROX and Replica datasets, demonstrate cutting-edge performance and validate the method's effectiveness and generalization ability.
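To make the abstract's pipeline concrete, below is a minimal, hypothetical sketch of the two ideas it names: compressing a semantic occupancy volume into three axis-aligned feature planes, and mapping per-class CLIP embeddings into a shared low-dimensional space before filling the volume. Every name here (triplane_decompose, query_point, W_shared) and every design choice (mean pooling along each axis, summation fusion, nearest-neighbor lookup, random stand-ins for CLIP embeddings) is an illustrative assumption, not the paper's actual formulation.

```python
import numpy as np

def triplane_decompose(vol):
    """vol: (D, H, W, C) semantic occupancy features -> three 2D feature planes."""
    xy = vol.mean(axis=0)   # (H, W, C): pool over depth
    xz = vol.mean(axis=1)   # (D, W, C): pool over height
    yz = vol.mean(axis=2)   # (D, H, C): pool over width
    return xy, xz, yz

def query_point(planes, d, h, w):
    """Featurize one voxel index by sampling each plane and fusing by summation."""
    xy, xz, yz = planes
    return xy[h, w] + xz[d, w] + yz[d, h]

rng = np.random.default_rng(0)

# Shared linear dimensionality reduction of per-class semantic embeddings,
# mimicking the "CLIP encoding + shared linear reduction" step; the CLIP
# embeddings are random placeholders here.
clip_dim, reduced_dim, num_classes = 512, 32, 8
W_shared = rng.standard_normal((clip_dim, reduced_dim)) / np.sqrt(clip_dim)
clip_embeddings = rng.standard_normal((num_classes, clip_dim))
semantic_feats = clip_embeddings @ W_shared          # (num_classes, reduced_dim)

# Toy SSO volume: each voxel stores the reduced feature of its semantic class.
D = H = W = 16
labels = rng.integers(0, num_classes, size=(D, H, W))
vol = semantic_feats[labels]                         # (D, H, W, reduced_dim)

planes = triplane_decompose(vol)
feat = query_point(planes, d=3, h=7, w=12)           # one frame-wise scene query
print(feat.shape)                                    # (32,)
```

The point of the tri-plane form is the storage saving: three planes cost O(N^2) per feature channel versus O(N^3) for the full voxel grid, while each query still touches all three axes of the scene.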
Published
2026-03-14
How to Cite
Gong, J., Tong, K., Chen, Z., Yuan, C., Chen, M., Zhang, Z., … Xie, Y. (2026). Human Motion Synthesis in 3D Scenes via Unified Scene Semantic Occupancy. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4248–4256. https://doi.org/10.1609/aaai.v40i6.42421
Section
AAAI Technical Track on Computer Vision III