Ultra-High Resolution Segmentation via Boundary-Enhanced Patch-Merging Transformer

Authors

  • Haopeng Sun: Beijing Key Lab. of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
  • Yingwei Zhang: Beijing Key Lab. of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
  • Lumin Xu: The Chinese University of Hong Kong
  • Sheng Jin: The University of Hong Kong; SenseTime Research and Tetras.AI
  • Yiqiang Chen: Beijing Key Lab. of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Peng Cheng Laboratory

DOI:

https://doi.org/10.1609/aaai.v39i7.32761

Abstract

Segmentation of ultra-high resolution (UHR) images is a critical task with numerous applications, yet it poses significant challenges due to high spatial resolution and rich fine details. Recent approaches adopt a dual-branch architecture, where a global branch learns long-range contextual information and a local branch captures fine details. However, these approaches struggle to reconcile global and local information, and the second branch adds significant computational cost. Inspired by the human visual system's ability to rapidly orient attention to important areas with fine details and filter out irrelevant information, we propose a novel UHR segmentation method called Boundary-Enhanced Patch-Merging Transformer (BPT). BPT consists of two key components: (1) a Patch-Merging Transformer (PMT) that dynamically allocates tokens to informative regions to acquire global and local representations, and (2) a Boundary-Enhanced Module (BEM) that leverages boundary information to enrich fine details. Extensive experiments on multiple UHR image segmentation benchmarks demonstrate that BPT outperforms previous state-of-the-art methods without introducing extra computational overhead.
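
The abstract does not say how boundary information is obtained or how "informative regions" are scored, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a boundary pixel is one whose 4-neighbourhood contains a different class label, and it uses the per-tile count of such pixels as a hypothetical proxy for how much detail a region holds. The function names `boundary_map` and `patch_budget` are invented for this example.

```python
def boundary_map(mask):
    """Mark pixels that have at least one 4-neighbour with a different label.

    `mask` is a list of equal-length rows of integer class labels.
    Assumption: this neighbour test stands in for whatever boundary
    signal the paper's BEM actually uses.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            label = mask[y][x]
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and mask[ny][nx] != label:
                    out[y][x] = 1
                    break
    return out


def patch_budget(boundary, patch):
    """Count boundary pixels in each patch x patch tile.

    Hypothetical proxy: tiles with higher counts would be candidates
    for more (finer) tokens, tiles with zero could be merged.
    """
    height = len(boundary)
    width = len(boundary[0]) if height else 0
    counts = []
    for top in range(0, height, patch):
        row = []
        for left in range(0, width, patch):
            total = sum(
                boundary[y][x]
                for y in range(top, min(top + patch, height))
                for x in range(left, min(left + patch, width))
            )
            row.append(total)
        counts.append(row)
    return counts


# Toy 4x4 label mask: a small class-1 region in the bottom-right corner.
toy_mask = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
toy_boundary = boundary_map(toy_mask)
toy_counts = patch_budget(toy_boundary, 2)
print(toy_counts)  # [[0, 1], [0, 4]]
```

In this toy case only the bottom-right tile has a high count, so a scheme in the spirit of the one described would spend most of its tokens there and keep the flat tiles coarse.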

Published

2025-04-11

How to Cite

Sun, H., Zhang, Y., Xu, L., Jin, S., & Chen, Y. (2025). Ultra-High Resolution Segmentation via Boundary-Enhanced Patch-Merging Transformer. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7), 7087–7095. https://doi.org/10.1609/aaai.v39i7.32761

Issue

Vol. 39 No. 7 (2025)

Section

AAAI Technical Track on Computer Vision VI