Combinatorial CNN-Transformer Learning with Manifold Constraints for Semi-supervised Medical Image Segmentation


  • Huimin Huang, Zhejiang University
  • Yawen Huang, Jarvis Research Center, Tencent YouTu Lab
  • Shiao Xie, Zhejiang University
  • Lanfen Lin, Zhejiang University
  • Ruofeng Tong, Zhejiang University; Zhejiang Lab
  • Yen-Wei Chen, Ritsumeikan University
  • Yuexiang Li, Medical AI Research Group, Guangxi Medical University
  • Yefeng Zheng, Jarvis Research Center, Tencent YouTu Lab



CV: Segmentation, ML: Semi-Supervised Learning


Semi-supervised learning (SSL), one of the dominant approaches to the annotation dilemma of supervised learning, leverages unlabeled data and has attracted much attention in medical image segmentation. Most existing approaches use a single network built on convolutional neural networks (CNNs) and enforce consistency of the predictions under small perturbations applied to the inputs or the model. This learning paradigm has two drawbacks: (1) CNN-based models are severely limited in global learning; (2) rich and diverse class-level distributions are inhibited. In this paper, we present a novel CNN-Transformer learning framework in the manifold space for semi-supervised medical image segmentation. First, at the intra-student level, we propose a novel class-wise consistency loss to facilitate the learning of both discriminative and compact target feature representations. Then, at the inter-student level, we align the CNN and Transformer features using a prototype-based optimal transport method. Extensive experiments show that our method outperforms previous state-of-the-art methods on three public medical image segmentation benchmarks.



How to Cite

Huang, H., Huang, Y., Xie, S., Lin, L., Tong, R., Chen, Y.-W., Li, Y., & Zheng, Y. (2024). Combinatorial CNN-Transformer Learning with Manifold Constraints for Semi-supervised Medical Image Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(3), 2330-2338.



AAAI Technical Track on Computer Vision II