Spatio-Temporal Deformable Convolution for Compressed Video Quality Enhancement
Recent years have witnessed remarkable success of deep learning methods in quality enhancement for compressed video. To better explore temporal information, existing methods usually estimate optical flow for temporal motion compensation. However, since compressed video could be seriously distorted by various compression artifacts, the estimated optical flow tends to be inaccurate and unreliable, thereby resulting in ineffective quality enhancement. In addition, optical flow estimation for consecutive frames is generally conducted in a pairwise manner, which is computational expensive and inefficient. In this paper, we propose a fast yet effective method for compressed video quality enhancement by incorporating a novel Spatio-Temporal Deformable Fusion (STDF) scheme to aggregate temporal information. Specifically, the proposed STDF takes a target frame along with its neighboring reference frames as input to jointly predict an offset field to deform the spatio-temporal sampling positions of convolution. As a result, complementary information from both target and reference frames can be fused within a single Spatio-Temporal Deformable Convolution (STDC) operation. Extensive experiments show that our method achieves the state-of-the-art performance of compressed video quality enhancement in terms of both accuracy and efficiency.