Oscillation Inversion: Training-Free Image and Video Enhancement Through Oscillated Latents in Large Flow Models

Authors

  • Yan Zheng University of Texas at Austin
  • Zhenxiao Liang University of Texas at Austin
  • Xiaoyan Cong Brown University
  • Yi Yang The University of Edinburgh
  • Lanqing Guo University of Texas at Austin
  • Yuehao Wang University of Texas at Austin
  • Peihao Wang University of Texas at Austin
  • Zhangyang Wang University of Texas at Austin

DOI:

https://doi.org/10.1609/aaai.v40i16.38352

Abstract

We explore the oscillatory behavior observed in inversion methods applied to large-scale flow models, including text-to-image and text-to-video models. By employing an augmented fixed-point-inspired iterative approach to invert real-world images, we observe that the solution does not converge; instead, it oscillates between distinct clusters. Through experiments on synthetic data as well as on text-to-image and text-to-video models, we demonstrate that these oscillating clusters exhibit notable semantic coherence. We offer theoretical insights showing that this behavior arises from oscillatory dynamics in flow models. Building on this understanding, we introduce a simple and fast distribution-transfer technique that enables training-free image and video editing and enhancement. Furthermore, we provide quantitative results demonstrating the effectiveness of our method on tasks such as image enhancement, editing, and reconstruction. Notably, our approach turns image-only enhancers and editors into lightweight, video-capable tools without any additional training, highlighting its practical versatility and impact.
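The oscillation phenomenon described in the abstract can be illustrated with a toy one-dimensional sketch. The `velocity` field below is a contrived stand-in for a learned flow-model velocity field (it is not the paper's model): inverting a single Euler step by naive fixed-point iteration fails to converge when the inversion map is not a contraction, and the iterates instead alternate between two clusters.

```python
import numpy as np

def velocity(x, t):
    # Hypothetical nonlinear velocity field chosen so that the
    # inversion map has |derivative| > 1 near its fixed point,
    # which induces a period-2 oscillation.
    return 3.0 * np.tanh(2.0 * x)

def fixed_point_invert(x_next, t, dt=1.0, iters=20):
    """Attempt to solve x_next = x + dt * velocity(x, t) for x via
    the naive fixed-point update x <- x_next - dt * velocity(x, t),
    recording every iterate."""
    x = x_next
    trajectory = []
    for _ in range(iters):
        x = x_next - dt * velocity(x, t)
        trajectory.append(x)
    return np.array(trajectory)

traj = fixed_point_invert(x_next=0.5, t=0.0)
# The tail of the trajectory alternates between two well-separated
# values rather than settling on a single fixed point.
print(traj[-4:])
```

In this sketch the oscillation is a simple period-2 cycle; the paper observes the analogous high-dimensional behavior, where the oscillating clusters turn out to be semantically coherent.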

Published

2026-03-14

How to Cite

Zheng, Y., Liang, Z., Cong, X., Yang, Y., Guo, L., Wang, Y., Wang, P., & Wang, Z. (2026). Oscillation Inversion: Training-Free Image and Video Enhancement Through Oscillated Latents in Large Flow Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13476-13484. https://doi.org/10.1609/aaai.v40i16.38352

Section

AAAI Technical Track on Computer Vision XIII