SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model

Authors

  • Tao Wu College of Computer Science and Technology, Zhejiang University
  • Xuewei Li College of Computer Science and Technology, Zhejiang University
  • Zhongang Qi ARC Lab, Tencent PCG
  • Di Hu Gaoling School of Artificial Intelligence, Renmin University of China
  • Xintao Wang ARC Lab, Tencent PCG
  • Ying Shan ARC Lab, Tencent PCG
  • Xi Li College of Computer Science and Technology, Zhejiang University; Zhejiang – Singapore Innovation and AI Joint Research Lab, Hangzhou

DOI:

https://doi.org/10.1609/aaai.v38i6.28429

Keywords:

CV: Applications, CV: Computational Photography, Image & Video Synthesis

Abstract

Controllable spherical panoramic image generation has substantial application potential across a variety of domains. However, it remains a challenging task because of the inherent spherical distortion and geometry characteristics, which lead to low-quality generated content. In this paper, we introduce a novel framework, SphereDiffusion, to address these unique challenges and to generate high-quality, precisely controllable spherical panoramic images. For the spherical distortion characteristic, we embed the semantics of the distorted object with text encoding, then explicitly construct the relationship with text-object correspondence to better use the pre-trained knowledge of planar images. Meanwhile, we employ a deformable technique to mitigate the semantic deviation in latent space caused by spherical distortion. For the spherical geometry characteristic, by virtue of spherical rotation invariance, we improve the data diversity and optimization objectives in the training process, enabling the model to better learn the spherical geometry characteristic. Furthermore, we enhance the denoising process of the diffusion model, enabling it to effectively use the learned geometric characteristic to ensure the boundary continuity of the generated images. With these specific techniques, experiments on the Structured3D dataset show that SphereDiffusion significantly improves the quality of controllable spherical image generation, with a relative FID reduction of around 35% on average.
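
The abstract relies on spherical rotation invariance to increase data diversity and on boundary continuity of the generated panorama. As a minimal illustration of the simplest case, and not the authors' implementation, the sketch below assumes the panorama is stored in equirectangular format, where the image width spans 360 degrees of longitude. Under that assumption, a rotation about the vertical axis is a circular shift of the image columns, and the left and right edges are neighbours on the sphere. The function names `rotate_panorama_yaw` and `seam_gap` are hypothetical and chosen only for this example.

```python
import numpy as np


def rotate_panorama_yaw(pano: np.ndarray, yaw_degrees: float) -> np.ndarray:
    """Rotate an equirectangular panorama about the vertical axis.

    Assumes pano has shape (height, width, channels) and that the width
    covers the full 360 degrees of longitude, so a yaw rotation becomes a
    circular shift of columns.
    """
    width = pano.shape[1]
    # Convert the angle to a whole number of columns, wrapped into [0, width).
    shift = int(round(yaw_degrees / 360.0 * width)) % width
    return np.roll(pano, shift, axis=1)


def seam_gap(pano: np.ndarray) -> float:
    """Mean absolute difference between the first and last columns.

    These two columns are adjacent on the sphere, so a large value hints
    at a visible seam at the panorama boundary.
    """
    left = pano[:, 0].astype(np.float64)
    right = pano[:, -1].astype(np.float64)
    return float(np.mean(np.abs(left - right)))
```

Because the shift wraps around, the rotated image contains exactly the same pixels as the input, which is why such rotations can serve as label-preserving augmentation. Rotations about other axes require resampling on the sphere and are not covered by this sketch.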

Published

2024-03-24

How to Cite

Wu, T., Li, X., Qi, Z., Hu, D., Wang, X., Shan, Y., & Li, X. (2024). SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 6126-6134. https://doi.org/10.1609/aaai.v38i6.28429

Issue

Section

AAAI Technical Track on Computer Vision V