Unsupervised Representation for Semantic Segmentation by Implicit Cycle-Attention Contrastive Learning

Authors

  • Bo Pang Shanghai Jiao Tong University
  • Yizhuo Li Shanghai Jiao Tong University
  • Yifan Zhang Shanghai Jiao Tong University
  • Gao Peng Shanghai Jiao Tong University
  • Jiajun Tang Shanghai Jiao Tong University
  • Kaiwen Zha Massachusetts Institute of Technology
  • Jiefeng Li Shanghai Jiao Tong University
  • Cewu Lu Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v36i2.20100

Keywords:

Computer Vision (CV), Machine Learning (ML)

Abstract

We study unsupervised representation learning for semantic segmentation. Unlike previous work, which aims to provide unsupervised pre-trained backbones for segmentation models that still require supervised fine-tuning, we focus on representations trained only by unsupervised methods. This means the model must directly produce pixel-level, linearly separable semantic results. We first explore and present two factors that significantly affect segmentation under the contrastive learning framework: 1) the difficulty and diversity of the positive contrastive pairs, and 2) the balance of global and local features. To optimize these factors, we propose cycle-attention contrastive learning (CACL). CACL exploits the semantic continuity of video frames, adopting an unsupervised cycle-consistent attention mechanism to implicitly conduct contrastive learning with difficult, global-local-balanced positive pixel pairs. Compared with the baseline model MoCo-v2 and other unsupervised methods, CACL demonstrates consistently superior performance on the PASCAL VOC (+4.5 mIoU) and Cityscapes (+4.5 mIoU) datasets.
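
The general idea of cycle-consistent attention between two video frames can be illustrated with a short NumPy sketch. This is not the authors' CACL implementation: the function names (`cycle_attention`, `cycle_consistency_loss`), the temperature value, and the explicit identity-target loss are assumptions made for illustration, whereas the paper describes its contrastive learning as implicit. The sketch attends from each pixel of frame A to frame B and back, then rewards pixels whose round trip returns to the starting location.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def cycle_attention(feat_a, feat_b, temperature=0.1):
    """Round-trip attention A -> B -> A (hypothetical helper).

    feat_a: (N, D) pixel features of frame A.
    feat_b: (M, D) pixel features of frame B.
    Returns an (N, N) row-stochastic matrix whose entry (i, j) is the
    probability that pixel i of frame A lands on pixel j after the cycle.
    """
    a = l2_normalize(feat_a)
    b = l2_normalize(feat_b)
    attn_ab = softmax(a @ b.T / temperature, axis=-1)  # (N, M)
    attn_ba = softmax(b @ a.T / temperature, axis=-1)  # (M, N)
    return attn_ab @ attn_ba


def cycle_consistency_loss(cycle, eps=1e-8):
    # Cross-entropy against the identity: each pixel should return to itself.
    # This explicit loss is an assumption, not the objective from the paper.
    return float(-np.log(np.diag(cycle) + eps).mean())


# Toy example: frame B holds the same four pixel features in a different order.
feat_a = np.eye(4)
feat_b = feat_a[[2, 0, 3, 1]]
cycle = cycle_attention(feat_a, feat_b)
loss = cycle_consistency_loss(cycle)
```

In this toy case every pixel in frame A has an exact match in frame B, so the cycle matrix is close to the identity and the loss is close to zero. With real features, the intermediate match in the other frame plays the role of a positive pair that differs in appearance from the starting pixel.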

Published

2022-06-28

How to Cite

Pang, B., Li, Y., Zhang, Y., Peng, G., Tang, J., Zha, K., Li, J., & Lu, C. (2022). Unsupervised Representation for Semantic Segmentation by Implicit Cycle-Attention Contrastive Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 36(2), 2044-2052. https://doi.org/10.1609/aaai.v36i2.20100

Section

AAAI Technical Track on Computer Vision II