Orthogonal Spatial-temporal Distributional Transfer for 4D Generation

Authors

  • Wei Liu, School of Management Science and Engineering, Anhui University of Finance and Economics, Bengbu, China
  • Shengqiong Wu, National University of Singapore
  • Bobo Li, National University of Singapore
  • Haoyu Zhao, Wuhan University
  • Hao Fei, National University of Singapore
  • Mong-Li Lee, National University of Singapore
  • Wynne Hsu, National University of Singapore

DOI

https://doi.org/10.1609/aaai.v40i9.37666

Abstract

In the AIGC era, generating high-quality 4D content has attracted increasing research attention. However, current 4D synthesis research is severely constrained by the lack of large-scale 4D datasets. Without such data, models cannot adequately learn the spatial-temporal features that high-quality 4D generation requires, which hinders progress in this domain. To address this, we propose a novel framework that transfers rich spatial priors from existing 3D diffusion models and temporal priors from video diffusion models to enhance 4D synthesis. We develop a spatial-temporal-disentangled 4D (STD-4D) Diffusion model, which synthesizes 4D-aware videos through disentangled spatial and temporal latents. To make this feature transfer as effective as possible, we design a novel Orthogonal Spatial-temporal Distributional Transfer (Orster) mechanism, in which the spatiotemporal feature distributions are carefully modeled and injected into the STD-4D Diffusion model. Further, during 4D construction, we devise a spatial-temporal-aware HexPlane (ST-HexPlane) that integrates the transferred spatiotemporal features for better 4D deformation and 4D Gaussian feature modeling. Experiments demonstrate that our method significantly outperforms existing approaches, achieving superior spatial-temporal consistency and higher-quality 4D synthesis.

Published

2026-03-14

How to Cite

Liu, W., Wu, S., Li, B., Zhao, H., Fei, H., Lee, M.-L., & Hsu, W. (2026). Orthogonal Spatial-temporal Distributional Transfer for 4D Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(9), 7287–7295. https://doi.org/10.1609/aaai.v40i9.37666

Issue

Vol. 40 No. 9 (2026)

Section

AAAI Technical Track on Computer Vision VI