SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model

Tao Wu; Xuewei Li; Zhongang Qi; Di Hu; Xintao Wang; Ying Shan; Xi Li

doi:10.1609/aaai.v38i6.28429

Authors

Tao Wu College of Computer Science and Technology, Zhejiang University
Xuewei Li College of Computer Science and Technology, Zhejiang University
Zhongang Qi ARC Lab, Tencent PCG
Di Hu Gaoling School of Artificial Intelligence, Renmin University of China
Xintao Wang ARC Lab, Tencent PCG
Ying Shan ARC Lab, Tencent PCG
Xi Li College of Computer Science and Technology, Zhejiang University; Zhejiang – Singapore Innovation and AI Joint Research Lab, Hangzhou

DOI:

https://doi.org/10.1609/aaai.v38i6.28429

Keywords:

CV: Applications, CV: Computational Photography, Image & Video Synthesis

Abstract

Controllable spherical panoramic image generation has substantial application potential across a variety of domains. However, it remains challenging because the inherent spherical distortion and spherical geometry of panoramas lead to low-quality generated content. In this paper, we introduce SphereDiffusion, a novel framework that addresses these challenges to generate high-quality, precisely controllable spherical panoramic images. To handle spherical distortion, we embed the semantics of distorted objects through text encoding and then explicitly model their relationship through text-object correspondence, so that the model makes better use of knowledge pre-trained on planar images. In addition, we employ a deformable technique to mitigate the semantic deviation in latent space caused by spherical distortion. To handle spherical geometry, we exploit spherical rotation invariance to improve both the data diversity and the optimization objectives during training, enabling the model to better learn the geometric characteristics of the sphere. Furthermore, we enhance the denoising process of the diffusion model so that it effectively uses the learned geometric characteristics to ensure the boundary continuity of the generated images. With these techniques, experiments on the Structured3D dataset show that SphereDiffusion significantly improves the quality of controllable spherical image generation, with a relative FID reduction of around 35% on average.
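The abstract states that spherical rotation invariance is used to improve data diversity and that boundary continuity matters for the generated panoramas. The sketch below illustrates only the general idea and is not the authors' implementation. It assumes images are stored in equirectangular (ERP) format, where longitude maps linearly to the column index, so a rotation about the vertical (yaw) axis reduces to a circular shift along the width. The function names `yaw_rotate_erp`, `augment_pair`, and `seam_gap` are hypothetical, rotations are limited to yaw, and angles are rounded to whole-pixel shifts. The control condition (e.g. a layout or semantic map) is shifted by the same amount so that it stays aligned with the image. `seam_gap` is a simple illustrative proxy for left/right boundary continuity, not the paper's evaluation metric.

```python
import numpy as np


def yaw_rotate_erp(image: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate an ERP panorama about the vertical axis (assumption: yaw only).

    Longitude is linear in the column index, so a yaw rotation is a circular
    shift along axis 1 (width). The angle is rounded to a whole-pixel shift;
    positive angles move content toward larger column indices.
    """
    width = image.shape[1]
    shift = int(round(degrees / 360.0 * width)) % width
    return np.roll(image, shift, axis=1)


def augment_pair(image: np.ndarray, condition: np.ndarray,
                 rng: np.random.Generator):
    """Apply the same random yaw rotation to an image and its control map.

    Using one shared angle keeps the condition pixel-aligned with the image.
    """
    degrees = rng.uniform(0.0, 360.0)
    return yaw_rotate_erp(image, degrees), yaw_rotate_erp(condition, degrees)


def seam_gap(image: np.ndarray) -> float:
    """Mean absolute difference between the first and last columns.

    In ERP these two columns are neighbours on the sphere, so a small value
    suggests a continuous left/right boundary (illustrative proxy only).
    """
    left = image[:, 0].astype(np.float64)
    right = image[:, -1].astype(np.float64)
    return float(np.mean(np.abs(left - right)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pano = rng.random((64, 128, 3))                 # toy H x W x C panorama
    layout = rng.integers(0, 10, size=(64, 128))    # toy H x W control map
    pano_aug, layout_aug = augment_pair(pano, layout, rng)
    print(pano_aug.shape, layout_aug.shape, seam_gap(pano_aug))
```

Because a yaw rotation of the sphere only permutes ERP columns, this augmentation creates new training views without resampling or interpolation, which is one reason rotation about the vertical axis is a natural choice for panoramas.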
