SceneGenesis: 3D Scene Synthesis via Semantic Structural Priors and Mesh-Guided Video-Geometry Fusion
DOI:
https://doi.org/10.1609/aaai.v40i16.38333Abstract
Generating high-quality, controllable, and structurally consistent 3D scenes in complex multi-object environments remains a fundamental challenge. We present SceneGenesis, a unified framework that synthesizes 3D scenes by combining semantic structural priors with mesh-guided video–geometry fusion. SceneGenesis first employs large language models to convert textual descriptions into category-aware object specifications, which are transformed into structured meshes using procedural approximations and pretrained asset generators, enabling precise layout control and scalable scene construction. To obtain rich and style-controllable appearances, SceneGenesis generates multi-view video representations conditioned on the initialized structure. A mesh-guided video–geometry fusion module then consolidates video evidence with mesh priors through mesh-conditioned fragment initialization, progressive geometric refinement, and structure-aware optimization, substantially improving global geometric fidelity and visual realism. Experiments demonstrate that SceneGenesis supports flexible style variation and object-level editing while achieving strong controllability, scalability, and structural quality.Published
2026-03-14
How to Cite
Zhao, Y., Yang, H., & Huang, D. (2026). SceneGenesis: 3D Scene Synthesis via Semantic Structural Priors and Mesh-Guided Video-Geometry Fusion. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13305–13313. https://doi.org/10.1609/aaai.v40i16.38333
Issue
Section
AAAI Technical Track on Computer Vision XIII