DDViT: Double-Level Fusion Domain Adapter Vision Transformer (Student Abstract)


  • Linpeng Sun Texas Tech University
  • Victor S. Sheng Texas Tech University




Computer Vision, Machine Learning, Applications Of AI


With the help of Vision Transformers (ViTs), medical image segmentation has achieved outstanding performance. In particular, ViTs overcome the limited local receptive fields of convolutional neural networks (CNNs) by using self-attention to model relationships among all image pixels or patches simultaneously. However, they require large datasets for training and perform poorly at capturing low-level features. To that end, we propose DDViT, a novel ViT model that unites a CNN with two multi-scale feature representations to alleviate data hunger in medical image segmentation. Significantly, our approach equips a ViT with a plug-in domain adapter (DA) and a Double-Level Fusion (DLF) technique, complemented by a mutual knowledge distillation paradigm that facilitates the seamless exchange of knowledge between a universal network and specialized domain-specific network branches. The DLF framework plays a pivotal role in our encoder-decoder architecture, combining the TransFuse module with a robust CNN-based encoder. Extensive experiments across diverse medical image segmentation datasets demonstrate the efficacy of DDViT compared to alternative CNN-based and Transformer-based approaches.
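The mutual knowledge distillation mentioned above can be illustrated with a minimal sketch. Note this is an assumption about the mechanism, not the authors' implementation: mutual distillation is commonly realized as a symmetric KL-divergence term that pushes the universal branch and a domain-specific branch toward each other's softened predictions. All function names below are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher temperature softens the distribution.
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # KL(p || q) for discrete distributions, with eps to avoid log(0).
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def mutual_distillation_loss(logits_universal, logits_domain, temperature=2.0):
    # Symmetric distillation: each branch learns from the other's soft targets.
    # KL(p || q) guides the domain branch; KL(q || p) guides the universal one.
    p = softmax(logits_universal, temperature)
    q = softmax(logits_domain, temperature)
    return kl_div(p, q) + kl_div(q, p)
```

In training, this term would be added to each branch's segmentation loss, so that knowledge flows in both directions rather than from a fixed teacher to a student.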



How to Cite

Sun, L., & Sheng, V. S. (2024). DDViT: Double-Level Fusion Domain Adapter Vision Transformer (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23661-23663. https://doi.org/10.1609/aaai.v38i21.30516