WaveFormer: Wavelet Transformer for Noise-Robust Video Inpainting

Authors

  • Zhiliang Wu, CCAI, Zhejiang University, China
  • Changchang Sun, Department of Computer Science, Illinois Institute of Technology, USA
  • Hanyu Xuan, School of Big Data and Statistics, Anhui University, China
  • Gaowen Liu, Cisco Research, USA
  • Yan Yan, Department of Computer Science, Illinois Institute of Technology, USA

DOI:

https://doi.org/10.1609/aaai.v38i6.28435

Keywords:

CV: Low Level & Physics-based Vision

Abstract

Video inpainting aims to fill in the missing regions of video frames with plausible content. Benefiting from their outstanding long-range modeling capacity, transformer-based models have achieved unprecedented inpainting quality. Essentially, a patch-wise attention module retrieves coherent content from all frames along both the spatial and temporal dimensions, and the missing content is then generated from the attention-weighted summation. In this way, attention retrieval accuracy has become the main bottleneck for improving video inpainting performance, and the factors affecting attention calculation should be explored to maximize the advantages of the transformer. Towards this end, in this paper, we theoretically verify that noise is the culprit that entangles the attention calculation. Accordingly, we propose a novel noise-robust wavelet transformer network for video inpainting, named WaveFormer. Unlike existing transformer-based methods that use the whole embeddings to calculate the attention, our WaveFormer first separates the noise in the embeddings into high-frequency components by introducing the Discrete Wavelet Transform (DWT), and then adopts the clean low-frequency components to calculate the attention. In this way, the impact of noise on attention computation is greatly mitigated, and the missing content at different frequencies can be generated by sharing the calculated attention. Extensive experiments validate the superior performance of our method over state-of-the-art baselines both qualitatively and quantitatively.
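To make the core idea concrete, below is a minimal PyTorch sketch of frequency-separated attention as described in the abstract; it is not the authors' released implementation. A single-level Haar DWT splits the feature embeddings into one low-frequency band and three high-frequency detail bands, attention weights are computed from the low-frequency band only, and the same weights are shared to aggregate every band. The module name `WaveletAttention` and all shapes and hyperparameters here are hypothetical.

```python
import torch
import torch.nn as nn


def haar_dwt2d(x):
    """Single-level 2D Haar DWT: split x (B, C, H, W) into a low-frequency
    band LL and three high-frequency detail bands, each (B, C, H/2, W/2)."""
    a = x[:, :, 0::2, 0::2]
    b = x[:, :, 0::2, 1::2]
    c = x[:, :, 1::2, 0::2]
    d = x[:, :, 1::2, 1::2]
    ll = (a + b + c + d) / 2  # smooth content; largely free of noise
    lh = (a + b - c - d) / 2  # vertical details
    hl = (a - b + c - d) / 2  # horizontal details
    hh = (a - b - c + d) / 2  # diagonal details (noise concentrates here)
    return ll, (lh, hl, hh)


class WaveletAttention(nn.Module):
    """Hypothetical sketch: queries/keys come from the clean LL band only;
    the resulting attention map is shared to aggregate every band's values."""

    def __init__(self, channels):
        super().__init__()
        self.to_q = nn.Linear(channels, channels)
        self.to_k = nn.Linear(channels, channels)
        self.to_v = nn.Linear(channels, channels)  # reused for each band
        self.scale = channels ** -0.5

    def forward(self, x):
        ll, highs = haar_dwt2d(x)
        b, c, h, w = ll.shape
        tok = lambda t: t.flatten(2).transpose(1, 2)       # (B, H*W, C) tokens
        q, k = self.to_q(tok(ll)), self.to_k(tok(ll))      # LL band only
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        out = [(attn @ self.to_v(tok(band))).transpose(1, 2).reshape(b, c, h, w)
               for band in (ll, *highs)]                   # shared attention
        return out  # aggregated LL, LH, HL, HH; an inverse DWT would recombine


feats = torch.randn(2, 64, 32, 32)      # toy frame embeddings
bands = WaveletAttention(64)(feats)     # four (2, 64, 16, 16) tensors
```

Since the attention map is computed once from the low-frequency tokens and reused for all four bands, the noisy high-frequency components never corrupt the query-key matching, yet their content is still propagated through the shared weights.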

Published

2024-03-24

How to Cite

Wu, Z., Sun, C., Xuan, H., Liu, G., & Yan, Y. (2024). WaveFormer: Wavelet Transformer for Noise-Robust Video Inpainting. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 6180–6188. https://doi.org/10.1609/aaai.v38i6.28435

Section

AAAI Technical Track on Computer Vision V