VTinker: Guided Flow Upsampling and Texture Mapping for High-Resolution Video Frame Interpolation

Authors

  • Chenyang Wu (VCIP, CS, Nankai University)
  • Jiayi Fu (VCIP, CS, Nankai University)
  • Chun-Le Guo (VCIP, CS, Nankai University; NKIARI, Shenzhen Futian)
  • Shuhao Han (VCIP, CS, Nankai University)
  • Chongyi Li (VCIP, CS, Nankai University; NKIARI, Shenzhen Futian)

DOI:

https://doi.org/10.1609/aaai.v40i13.38037

Abstract

Estimating motion between high-resolution frames is challenging because of large pixel displacements and high computational cost. Most flow-based Video Frame Interpolation (VFI) methods therefore predict bidirectional flows at low resolution and then apply high-magnification upsampling (e.g., bilinear) to obtain high-resolution flows. However, this upsampling strategy can cause blur or mosaic artifacts at the flows' edges. Moreover, the motion of fine pixels at high resolution cannot be adequately captured by motion estimation at low resolution, which leads to misaligned task-oriented flows. When input frames are warped with such inaccurate flows and combined pixel by pixel, the interpolated frame exhibits ghosting and discontinuities. In this study, we propose a novel VFI pipeline, VTinker, which consists of two core components: Guided Flow Upsampling (GFU) and Texture Mapping. After motion estimation at low resolution, GFU uses the input frames as guidance to alleviate the blurred details of bilinearly upsampled flows, making the flows' edges clearer. Subsequently, to avoid pixel-level ghosting and discontinuities, Texture Mapping generates an initial interpolated frame, referred to as the intermediate proxy. The proxy serves as a cue for selecting clear texture blocks from the input frames, which are then mapped onto the proxy so that a reconstruction module can produce the final interpolated frame. Extensive experiments demonstrate that VTinker achieves state-of-the-art performance in VFI.
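
To make the baseline criticized above concrete, the following is a minimal sketch of plain bilinear flow upsampling, not of GFU or any other part of VTinker. The function names `bilinear_resize` and `upsample_flow`, the (H, W, 2) flow layout, and the align-corners sampling convention are assumptions made for illustration only. Flow vectors are multiplied by the scale factor because displacements are measured in pixels of the target resolution.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Bilinearly resize an (H, W, C) array (align-corners sampling, assumed)."""
    in_h, in_w = img.shape[:2]
    # Sample positions in input coordinates; first and last pixels are aligned.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    # Fractional offsets, shaped for broadcasting over (out_h, out_w, C).
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def upsample_flow(flow_lr, scale):
    """Upsample a low-resolution flow field of shape (H, W, 2) by an integer factor."""
    h, w = flow_lr.shape[:2]
    flow_hr = bilinear_resize(flow_lr, h * scale, w * scale)
    # Displacements grow with resolution, so rescale the vectors as well.
    return flow_hr * scale

# A sharp motion boundary at low resolution: left half static, right half moving.
flow_lr = np.zeros((4, 4, 2))
flow_lr[:, 2:, 0] = 1.0
flow_hr = upsample_flow(flow_lr, scale=4)

# Pixels whose horizontal motion lies strictly between the two true motions.
blurred = (flow_hr[..., 0] > 0) & (flow_hr[..., 0] < 4)
```

In this toy case the upsampled flow contains pixels whose motion matches neither side of the boundary, which is the kind of edge blur the abstract says guidance from the input frames is meant to reduce.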

Published

2026-03-14

How to Cite

Wu, C., Fu, J., Guo, C.-L., Han, S., & Li, C. (2026). VTinker: Guided Flow Upsampling and Texture Mapping for High-Resolution Video Frame Interpolation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(13), 10638-10645. https://doi.org/10.1609/aaai.v40i13.38037

Section

AAAI Technical Track on Computer Vision X