Pay Attention to Target: Relation-Aware Temporal Consistency for Domain Adaptive Video Semantic Segmentation

Authors

  • Huayu Mai, Deep Space Exploration Laboratory/School of Information Science and Technology, University of Science and Technology of China
  • Rui Sun, Deep Space Exploration Laboratory/School of Information Science and Technology, University of Science and Technology of China
  • Yuan Wang, Deep Space Exploration Laboratory/School of Information Science and Technology, University of Science and Technology of China
  • Tianzhu Zhang, Deep Space Exploration Laboratory/School of Information Science and Technology, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
  • Feng Wu, Deep Space Exploration Laboratory/School of Information Science and Technology, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

DOI:

https://doi.org/10.1609/aaai.v38i5.28211

Keywords:

CV: Segmentation

Abstract

Video semantic segmentation has made notable progress thanks to advances in deep learning, but it depends on annotated training data that is labor-intensive to collect. To alleviate this data-hunger issue, domain adaptation approaches aim to adapt a model trained on labeled synthetic videos to real videos for which no annotations are available. By analyzing consistency regularization, the dominant paradigm in this domain adaptation task, we find bottlenecks in previous methods from the perspective of pseudo-labels. To take full advantage of the information contained in the pseudo-labels and provide more effective supervision signals, we propose a coherent PAT network comprising a target domain focalizer and relation-aware temporal consistency. The proposed PAT network has several merits. First, the target domain focalizer is responsible for paying attention to the target domain and increasing the accessibility of pseudo-labels in consistency training. Second, the relation-aware temporal consistency models the inter-class consistent relationship across frames to equip the model with effective supervision signals. Extensive experimental results on two challenging benchmarks demonstrate that our method performs favorably against state-of-the-art domain adaptive video semantic segmentation methods.

Published

2024-03-24

How to Cite

Mai, H., Sun, R., Wang, Y., Zhang, T., & Wu, F. (2024). Pay Attention to Target: Relation-Aware Temporal Consistency for Domain Adaptive Video Semantic Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4162-4170. https://doi.org/10.1609/aaai.v38i5.28211

Section

AAAI Technical Track on Computer Vision IV