ClassFormer: Exploring Class-Aware Dependency with Transformer for Medical Image Segmentation


  • Huimin Huang Zhejiang University
  • Shiao Xie Zhejiang University
  • Lanfen Lin Zhejiang University
  • Ruofeng Tong Zhejiang University Zhejiang Lab
  • Yen-Wei Chen Ritsumeikan University
  • Hong Wang Tencent Jarvis Lab
  • Yuexiang Li Tencent Jarvis Lab
  • Yawen Huang Tencent Jarvis Lab
  • Yefeng Zheng Tencent Jarvis Lab



CV: Segmentation, CV: Medical and Biological Imaging


Vision Transformers have recently shown impressive performance on medical image segmentation. Despite their strong capability of modeling long-range dependencies, the current methods still give rise to two main concerns from a class-level perspective: (1) intra-class problem: existing methods fail to extract class-specific correspondences among different pixels, which may lead to poor object coverage and/or boundary prediction; (2) inter-class problem: existing methods fail to model explicit category dependencies among various objects, which may result in inaccurate localization. In light of these two issues, we propose a novel transformer, called ClassFormer, powered by two appealing components, i.e., an intra-class dynamic transformer and an inter-class interactive transformer, to fully explore intra-class compactness and inter-class discrepancy. Technically, the intra-class dynamic transformer is first designed to decouple representations of different categories with an adaptive selection mechanism for compact learning, which optimally highlights the informative features to reflect the salient keys/values from multiple scales. We further introduce the inter-class interactive transformer to capture the category dependency among different objects, modeling class tokens as representative class centers to guide global semantic reasoning. As a consequence, feature consistency is ensured through intra-class penalization, while the inter-class constraint strengthens the feature discriminability between different categories. Extensive empirical evidence shows that ClassFormer can be easily plugged into any architecture, and yields improvements over state-of-the-art methods on three public benchmarks.
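The inter-class interactive transformer described above treats learnable class tokens as class centers that attend over pixel features for global semantic reasoning. The following is a minimal illustrative sketch of such class-token cross-attention; all names, shapes, and the single-head formulation are our own assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def class_token_cross_attention(pixel_feats, class_tokens):
    """Hypothetical sketch: K class tokens act as queries that attend over
    N pixel embeddings (keys/values), yielding updated per-class centers."""
    d = pixel_feats.shape[-1]
    # (K, N) attention map: how strongly each class center attends to each pixel
    attn = softmax(class_tokens @ pixel_feats.T / np.sqrt(d), axis=-1)
    # (K, d) attention-weighted class centers
    return attn @ pixel_feats

rng = np.random.default_rng(0)
pixels = rng.standard_normal((64, 32))  # N=64 pixel embeddings, d=32 channels
tokens = rng.standard_normal((4, 32))   # K=4 class tokens (one per category)
centers = class_token_cross_attention(pixels, tokens)
print(centers.shape)  # one refined center per class
```

In the full model, such refined centers would be fed back to sharpen per-pixel class assignments; here the sketch only shows the query/key/value roles played by class tokens and pixel features.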




How to Cite

Huang, H., Xie, S., Lin, L., Tong, R., Chen, Y.-W., Wang, H., Li, Y., Huang, Y., & Zheng, Y. (2023). ClassFormer: Exploring Class-Aware Dependency with Transformer for Medical Image Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1), 917-925.



AAAI Technical Track on Computer Vision I