DFMN: A Dual-feet Matching Network with Hybrid Transformer-based Feature Extractor for Unsupervised Deformable Medical Image Registration

Authors

  • Liwen Li, Huazhong University of Science and Technology
  • Xinrui Guo, Huazhong University of Science and Technology
  • Wentao Guo, University of Illinois Urbana-Champaign
  • Shunqi Yang, Huazhong University of Science and Technology
  • Fumin Guo, Huazhong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i8.37559

Abstract

Deformable medical image registration is essential in medical image analysis. Recent transformer-based registration methods have achieved high registration accuracy. However, these methods often rely on patch embedding at the beginning of encoding, which limits their ability to capture detailed anatomical structure and to exploit local semantic relationships within individual patches. Here, we proposed a novel Dual-feet Encoder (DFEnc) to asynchronously model semantic information from the moving and fixed images at various scales through two separate branches in three steps. At each step, features from adjacent resolution levels were processed by a Single Step Hybrid Extractor (SSHExt), which performed patch convolution to preserve local information, followed by several transformer blocks to capture global context. Dense connections were employed to enhance semantic awareness across adjacent feature resolution levels. Additionally, we introduced a Feature Fusion-based Decoder (FFDec) to progressively fuse features from the fixed and moving images and to generate an intermediate deformation field at each stage, enabling accurate image alignment through stepwise warping and refinement. Extensive ablation studies demonstrated the effectiveness of the proposed DFEnc, SSHExt, and FFDec. Compared with the state-of-the-art AutoFuse-Trans method, our approach improved Dice by 1.14%, 1.77%, and 4.47% on the ACDC, OASIS, and Abdomen CT datasets, respectively, while maintaining relatively low computational cost. These results suggest the utility of the proposed approach for broad research and clinical applications.
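The hybrid extractor described in the abstract, a patch convolution that preserves local detail followed by transformer blocks that capture global context, can be sketched in PyTorch as below. This is a minimal illustrative sketch, not the authors' implementation: the class name, channel widths, patch stride, and block depth are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class SingleStepHybridExtractor(nn.Module):
    """Hypothetical sketch of an SSHExt-style stage: a strided
    patch convolution (local semantics) followed by transformer
    blocks (global self-attention over the resulting tokens)."""

    def __init__(self, in_ch, embed_dim=64, num_heads=4, depth=2, patch=2):
        super().__init__()
        # A 3x3 strided convolution mixes neighbouring pixels while
        # downsampling, unlike a plain linear patch projection.
        self.patch_conv = nn.Conv2d(in_ch, embed_dim, kernel_size=3,
                                    stride=patch, padding=1)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim,
                                           nhead=num_heads,
                                           dim_feedforward=embed_dim * 4,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        x = self.patch_conv(x)                 # (B, C, H/p, W/p)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        tokens = self.blocks(tokens)           # global context via attention
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# In a dual-branch (moving/fixed) encoder, one such extractor would run
# per branch; here a single branch is applied to a toy 2D input.
ext = SingleStepHybridExtractor(in_ch=1)
out = ext(torch.randn(1, 1, 32, 32))
print(out.shape)  # torch.Size([1, 64, 16, 16])
```

The convolutional front end is what distinguishes this from a standard patch-embedding transformer encoder: local pixel relationships are encoded before tokens are flattened for attention.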

Published

2026-03-14

How to Cite

Li, L., Guo, X., Guo, W., Yang, S., & Guo, F. (2026). DFMN: A Dual-feet Matching Network with Hybrid Transformer-based Feature Extractor for Unsupervised Deformable Medical Image Registration. Proceedings of the AAAI Conference on Artificial Intelligence, 40(8), 6324–6332. https://doi.org/10.1609/aaai.v40i8.37559

Section

AAAI Technical Track on Computer Vision V