Semantic Lens: Instance-Centric Semantic Alignment for Video Super-resolution

Authors

  • Qi Tang, Institute of Information Science, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China
  • Yao Zhao, Institute of Information Science, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China
  • Meiqin Liu, Institute of Information Science, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China
  • Jian Jin, Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University, Singapore
  • Chao Yao, School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v38i6.28321

Keywords:

CV: Low Level & Physics-based Vision, CV: Segmentation

Abstract

As a critical cue for video super-resolution (VSR), inter-frame alignment significantly impacts overall performance. However, accurate pixel-level alignment is challenging due to the intricate interweaving of motion in videos. In response, we introduce a novel paradigm for VSR named Semantic Lens, predicated on semantic priors drawn from degraded videos. Specifically, the video is modeled as instances, events, and scenes via a Semantic Extractor. These semantics assist the Pixel Enhancer in understanding the recovered content and generating more realistic visual results. The distilled global semantics embody the scene information of each frame, while the instance-specific semantics assemble the spatio-temporal contexts related to each instance. Furthermore, we devise a Semantics-Powered Attention Cross-Embedding (SPACE) block, composed of a Global Perspective Shifter (GPS) and an Instance-Specific Semantic Embedding Encoder (ISEE), to bridge pixel-level features with semantic knowledge. Concretely, the GPS module generates pairs of affine transformation parameters for pixel-level feature modulation conditioned on the global semantics. The ISEE module then harnesses the attention mechanism to align adjacent frames in the instance-centric semantic space. In addition, we incorporate a simple yet effective pre-alignment module to alleviate the difficulty of model training. Extensive experiments demonstrate the superiority of our model over existing state-of-the-art VSR methods.
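The two mechanisms named in the abstract are standard building blocks and lend themselves to a compact sketch. The PyTorch code below illustrates, under assumptions, what a GPS-style module (feature-wise affine modulation conditioned on a global semantic vector, in the spirit of SFT/FiLM) and an ISEE-style module (cross-attention aligning the current frame's features to an adjacent frame's, steered by instance tokens) could look like. All class names, tensor shapes, and the choice to prepend instance tokens to the keys/values are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GlobalPerspectiveShifter(nn.Module):
    """Sketch of a GPS-style module: predict per-channel affine (scale, shift)
    parameters from a global semantic vector and modulate pixel features."""

    def __init__(self, feat_channels: int, sem_dim: int):
        super().__init__()
        self.to_gamma = nn.Linear(sem_dim, feat_channels)
        self.to_beta = nn.Linear(sem_dim, feat_channels)

    def forward(self, feat: torch.Tensor, global_sem: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); global_sem: (B, sem_dim)  [shapes assumed]
        gamma = self.to_gamma(global_sem)[:, :, None, None]  # (B, C, 1, 1)
        beta = self.to_beta(global_sem)[:, :, None, None]
        return gamma * feat + beta


class InstanceSemanticAligner(nn.Module):
    """Sketch of an ISEE-style module: the current frame's features (queries)
    attend over an adjacent frame's features (keys/values) augmented with
    instance-specific semantic tokens."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, cur: torch.Tensor, adj: torch.Tensor,
                inst_sem: torch.Tensor) -> torch.Tensor:
        # cur, adj: (B, C, H, W) pixel features; inst_sem: (B, N_inst, C)
        B, C, H, W = cur.shape
        q = cur.flatten(2).transpose(1, 2)    # (B, H*W, C)
        kv = adj.flatten(2).transpose(1, 2)   # (B, H*W, C)
        # Prepending instance tokens is an assumed design: it lets attention
        # be steered by instance semantics during alignment.
        kv = torch.cat([inst_sem, kv], dim=1)
        aligned, _ = self.attn(q, kv, kv)
        return aligned.transpose(1, 2).reshape(B, C, H, W)


if __name__ == "__main__":
    gps = GlobalPerspectiveShifter(feat_channels=64, sem_dim=256)
    isee = InstanceSemanticAligner(dim=64)
    feat = torch.randn(2, 64, 32, 32)
    modulated = gps(feat, torch.randn(2, 256))            # global conditioning
    aligned = isee(modulated, torch.randn(2, 64, 32, 32),  # adjacent frame
                   torch.randn(2, 5, 64))                  # 5 instance tokens
    print(aligned.shape)  # torch.Size([2, 64, 32, 32])
```

Note the division of labor this sketch mirrors: the affine modulation injects frame-level scene context uniformly across spatial positions, while the attention step performs the per-position, instance-aware matching between frames that replaces explicit pixel-level motion estimation.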

Published

2024-03-24

How to Cite

Tang, Q., Zhao, Y., Liu, M., Jin, J., & Yao, C. (2024). Semantic Lens: Instance-Centric Semantic Alignment for Video Super-resolution. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5154-5161. https://doi.org/10.1609/aaai.v38i6.28321

Issue

Vol. 38 No. 6 (2024)

Section

AAAI Technical Track on Computer Vision V