G-IR: Geometric Image Representation for Learning
DOI:
https://doi.org/10.1609/aaai.v40i24.39119
Abstract
Images are generally represented by pixel intensities or color values, which are usually fed directly into learning models. This study proposes a geometric image representation method and reformulates general learning models (e.g., autoencoders) in the diffeomorphic space. Based on the theory of geometric optimal transport and quasiconformal mapping, we equivalently transform the intensity representation into a shape representation. The image space becomes a diffeomorphic space, where any image can be uniquely represented as a Beltrami coefficient function defined on a uniform reference grid, and vice versa. This geometric image representation (G-IR) captures the fine-grained structure inherent in the entire image, unlike traditional feature extraction, which focuses on geometric objects inside the image (such as boundaries and axes). The diffeomorphic property preserves structure during generation, which is essential in applications involving real physical systems. G-IR can be inserted into existing pipelines as a plug-in, providing structure-preserving properties for the entire framework. Experiments on image restoration and interpolation validate the efficiency, efficacy, and applicability of the G-IR method and demonstrate superior performance compared with common pixel-level image appearance representations.
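For readers unfamiliar with the term, the Beltrami coefficient of a planar map f is the ratio mu = f_zbar / f_z of its Wirtinger derivatives, and |mu| < 1 everywhere corresponds to an orientation-preserving, locally injective map. The sketch below is only an illustration of that standard definition, not the paper's pipeline: it does not build a map from an image via optimal transport. The names `beltrami_coefficient` and `affine_map`, the finite-difference step, and the 5x5 grid are all assumptions chosen for the example.

```python
def beltrami_coefficient(f, z, h=1e-5):
    """Estimate mu(z) = f_zbar / f_z for a planar map f: C -> C.

    Uses central differences; `h` is an illustrative step size.
    """
    # Partial derivatives along the real (x) and imaginary (y) axes.
    f_x = (f(z + h) - f(z - h)) / (2 * h)
    f_y = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    # Wirtinger derivatives: f_z = (f_x - i f_y) / 2, f_zbar = (f_x + i f_y) / 2.
    f_z = 0.5 * (f_x - 1j * f_y)
    f_zbar = 0.5 * (f_x + 1j * f_y)
    return f_zbar / f_z


def affine_map(z, a=1.0 + 0.0j, b=0.3 + 0.1j):
    """Hypothetical test map f(z) = a*z + b*conj(z); its exact coefficient is b / a."""
    return a * z + b * z.conjugate()


# Sample the coefficient on a small uniform grid over the unit square.
grid = [complex(i / 4, j / 4) for i in range(5) for j in range(5)]
mu = [beltrami_coefficient(affine_map, z) for z in grid]

# |mu| < 1 at every grid point indicates an orientation-preserving map.
is_diffeomorphic = all(abs(m) < 1 for m in mu)
```

For this affine map the coefficient is constant across the grid, which makes it a convenient sanity check; a map derived from real image data would generally yield a spatially varying coefficient field.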
Published
2026-03-14
How to Cite
Chen, X., Zhao, Q., Zeng, W., & Xu, Z. (2026). G-IR: Geometric Image Representation for Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(24), 20307-20315. https://doi.org/10.1609/aaai.v40i24.39119
Section
AAAI Technical Track on Machine Learning I