Omnidirectional Image Super-resolution via Bi-projection Fusion

Authors

  • Jiangang Wang, Shenzhen Campus of Sun Yat-sen University
  • Yuning Cui, Technical University of Munich
  • Yawen Li, Beijing University of Posts and Telecommunications
  • Wenqi Ren, Shenzhen Campus of Sun Yat-sen University
  • Xiaochun Cao, Shenzhen Campus of Sun Yat-sen University

DOI:

https://doi.org/10.1609/aaai.v38i6.28354

Keywords:

CV: Low Level & Physics-based Vision, CV: Applications

Abstract

With the rapid development of virtual reality, omnidirectional images (ODIs) have attracted much attention from both the industrial community and academia. However, due to storage and transmission limitations, the resolution of current ODIs is often insufficient to provide an immersive virtual reality experience. Previous approaches address this issue by applying conventional 2D super-resolution techniques to the equirectangular projection (ERP) without exploiting the unique geometric properties of ODIs. In particular, ERP provides a complete field-of-view but introduces significant distortion, while the cubemap projection (CMP) reduces distortion yet has a limited field-of-view. In this paper, we present a novel Bi-Projection Omnidirectional Image Super-Resolution (BPOSR) network that takes advantage of the geometric properties of these two projections. We design two tailored attention methods for these projections: the Horizontal Striped Transformer Block (HSTB) for ERP and the Perspective Shift Transformer Block (PSTB) for CMP. Furthermore, we propose a fusion module that makes the two projections complement each other. Extensive experiments demonstrate that BPOSR achieves state-of-the-art performance on omnidirectional image super-resolution. The code is available at https://github.com/W-JG/BPOSR.
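
The contrast the abstract draws between the two projections can be illustrated with a minimal sketch. It is not taken from the BPOSR code: the function names, the axis convention (y up, z forward) and the cube face labels are all assumptions made for illustration. It maps an ERP pixel to a viewing direction, reports which cubemap face that direction falls on, and computes the horizontal stretch of an ERP row relative to the equator, which is the latitude-dependent distortion the abstract refers to.

```python
import math


def erp_pixel_to_direction(u, v, width, height):
    """Map the centre of ERP pixel (u, v) to a unit direction (x, y, z).

    Assumed convention (hypothetical): longitude 0 looks along +z,
    latitude +pi/2 is straight up along +y.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi   # in (-pi, pi)
    lat = (0.5 - (v + 0.5) / height) * math.pi        # in (-pi/2, pi/2)
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return x, y, z


def direction_to_cube_face(x, y, z):
    """Return the cube face whose axis dominates the direction.

    Face names are illustrative labels, not identifiers from the paper.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "right" if x > 0 else "left"
    if ay >= az:
        return "up" if y > 0 else "down"
    return "front" if z > 0 else "back"


def erp_horizontal_stretch(v, height):
    """Horizontal oversampling of ERP row v relative to the equator.

    Every ERP row has the same number of pixels, but the circle of
    latitude it covers shrinks by cos(lat), so the stretch is 1 / cos(lat).
    """
    lat = (0.5 - (v + 0.5) / height) * math.pi
    return 1.0 / math.cos(lat)


if __name__ == "__main__":
    width, height = 1024, 512
    for u, v in [(512, 256), (768, 256), (512, 0)]:
        face = direction_to_cube_face(*erp_pixel_to_direction(u, v, width, height))
        stretch = erp_horizontal_stretch(v, height)
        print(f"pixel ({u}, {v}): face={face}, stretch={stretch:.2f}")
```

Under these assumptions, rows near the top and bottom of the ERP image are stretched far more than rows near the equator, while each cube face covers only a 90-degree field of view, which is why a single projection alone is a compromise.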

Published

2024-03-24

How to Cite

Wang, J., Cui, Y., Li, Y., Ren, W., & Cao, X. (2024). Omnidirectional Image Super-resolution via Bi-projection Fusion. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5454-5462. https://doi.org/10.1609/aaai.v38i6.28354

Section

AAAI Technical Track on Computer Vision V