Zero-to-Hero: Empowering Video Appearance Transfer with Zero-Shot Initialization and Holistic Restoration

Authors

  • Tongtong Su, Zhejiang University; Alibaba Cloud Computing
  • Chengyu Wang, Alibaba Cloud Computing
  • Haipeng Liao, NingboTech University
  • Jun Huang, Alibaba Cloud Computing
  • Dongming Lu, Zhejiang University

DOI:

https://doi.org/10.1609/aaai.v40i11.37872

Abstract

Editing appearance according to user needs is a pivotal task in video editing. Existing text-guided methods often leave user intentions ambiguous and restrict fine-grained control over specific aspects of an object. To overcome these limitations, this paper introduces Zero-to-Hero, a reference-based video editing approach that disentangles the editing process into two distinct problems: first, an anchor frame is edited to satisfy the user's requirements and serves as a reference image; second, its appearance is propagated consistently across the other frames of the video. To achieve accurate appearance propagation, the first stage of Zero-to-Hero (the Zero-Stage) leverages correspondences within the original frames to guide the attention mechanism. This is more robust than the optical flow or temporal modules previously proposed for memory-friendly video generative models, especially for objects exhibiting large motions, and it offers a solid zero-shot initialization that ensures both accuracy and temporal consistency. However, intervening in the attention mechanism results in compounded image degradation, with unknown blurring and color loss. Following the Zero-Stage, our Hero-Stage therefore holistically learns a conditional generative model for video restoration. To evaluate appearance consistency accurately, we construct a set of videos with multiple appearances using Blender, enabling a fine-grained and deterministic evaluation. Our method outperforms the best-performing baseline with a PSNR improvement of 2.6 dB.
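
To put the reported PSNR gain in context, the following is a minimal sketch of how PSNR is commonly computed between a ground-truth frame and an edited frame. It is not the authors' evaluation code: the function names `psnr` and `video_psnr`, the flat-list pixel representation, the 8-bit peak value of 255, and the per-frame averaging are all assumptions made for illustration.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """PSNR in dB between two equal-length sequences of pixel values.

    Hypothetical helper: assumes flattened frames and a peak value of
    255 (8-bit images); the paper's exact protocol may differ.
    """
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def video_psnr(ref_frames, out_frames, max_value=255.0):
    """Mean of per-frame PSNR values (averaging scheme is an assumption)."""
    if len(ref_frames) != len(out_frames) or len(ref_frames) == 0:
        raise ValueError("videos must be non-empty and have equal frame counts")
    scores = [psnr(r, o, max_value) for r, o in zip(ref_frames, out_frames)]
    return sum(scores) / len(scores)
```

Because PSNR is logarithmic in the mean squared error, a gain of 2.6 dB on a single frame at a fixed peak value corresponds to an MSE lower by a factor of 10^(0.26), roughly 1.8. For a score averaged over many frames this relation holds only approximately.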

Published

2026-03-14

How to Cite

Su, T., Wang, C., Liao, H., Huang, J., & Lu, D. (2026). Zero-to-Hero: Empowering Video Appearance Transfer with Zero-Shot Initialization and Holistic Restoration. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 9153–9161. https://doi.org/10.1609/aaai.v40i11.37872

Section

AAAI Technical Track on Computer Vision VIII