MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation

Authors

  • Zhifei Yang School of Computer Science, Peking University
  • Keyang Lu School of Artificial Intelligence, Beihang University
  • Chao Zhang Beijing Digital Native Digital City Research Center
  • Jiaxing Qi School of Computer Science and Engineering, Beihang University
  • Hanqi Jiang Beijing Digital Native Digital City Research Center
  • Ruifei Ma Beijing Digital Native Digital City Research Center
  • Shenglin Yin School of Computer Science, Peking University
  • Yifan Xu School of Computer Science and Engineering, Beihang University
  • Mingzhe Xing School of Computer Science, Peking University
  • Zhen Xiao School of Computer Science, Peking University
  • Jieyi Long Theta Labs, Inc.
  • Xiangde Liu Beijing Digital Native Digital City Research Center
  • Guangyao Zhai Technical University of Munich

DOI:

https://doi.org/10.1609/aaai.v39i9.33017

Abstract

Controllable 3D scene generation has extensive applications in virtual reality and interior design, where the generated scenes should exhibit high levels of realism and controllability in terms of geometry. Scene graphs provide a suitable data representation that facilitates these applications. However, current graph-based methods for scene generation are constrained to text-based inputs and exhibit insufficient adaptability to flexible user inputs, hindering the ability to precisely control object geometry. To address this issue, we propose MMGDreamer, a dual-branch diffusion model for scene generation that incorporates a novel Mixed-Modality Graph, visual enhancement module, and relation predictor. The mixed-modality graph allows object nodes to integrate textual and visual modalities, with optional relationships between nodes. It enhances adaptability to flexible user inputs and enables meticulous control over the geometry of objects in the generated scenes. The visual enhancement module enriches the visual fidelity of text-only nodes by constructing visual representations using text embeddings. Furthermore, our relation predictor leverages node representations to infer absent relationships between nodes, resulting in more coherent scene layouts. Extensive experimental results demonstrate that MMGDreamer exhibits superior control of object geometry, achieving state-of-the-art scene generation performance.

Published

2025-04-11

How to Cite

Yang, Z., Lu, K., Zhang, C., Qi, J., Jiang, H., Ma, R., … Zhai, G. (2025). MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(9), 9391–9399. https://doi.org/10.1609/aaai.v39i9.33017

Section

AAAI Technical Track on Computer Vision VIII