Empowering DINO Representations for Underwater Instance Segmentation via Aligner and Prompter

Authors

  • Zhiyang Chen, School of Control Science and Engineering, Shandong University, China; Key Laboratory of Machine Intelligence and System Control, Ministry of Education, Jinan 250061, China
  • Chen Zhang, School of Control Science and Engineering, Shandong University, China; Key Laboratory of Machine Intelligence and System Control, Ministry of Education, Jinan 250061, China
  • Hao Fang, School of Control Science and Engineering, Shandong University, China; Key Laboratory of Machine Intelligence and System Control, Ministry of Education, Jinan 250061, China
  • Runmin Cong, School of Control Science and Engineering, Shandong University, China; Key Laboratory of Machine Intelligence and System Control, Ministry of Education, Jinan 250061, China

DOI:

https://doi.org/10.1609/aaai.v40i5.37314

Abstract

Underwater Instance Segmentation (UIS), integrating pixel-level understanding and instance-level discrimination, is a pivotal technology in marine resource exploration and ecological protection. In recent years, large-scale pretrained visual foundation models, exemplified by DINO, have advanced rapidly and demonstrated remarkable performance on complex downstream tasks. In this paper, we demonstrate that DINO can serve as an effective feature learner for UIS, and we introduce DiveSeg, a novel framework built upon two insightful components: (1) the AquaStyle Aligner, which embeds underwater color style features into the DINO fine-tuning process to facilitate better adaptation to the underwater domain; and (2) the ObjectPrior Prompter, which incorporates binary segmentation-based prompts to deliver object-level priors, providing essential guidance for the instance segmentation task, which requires both object- and instance-level reasoning. We conduct thorough experiments on the popular UIIS and USIS10K datasets, and the results show that DiveSeg achieves state-of-the-art performance.

Published

2026-03-14

How to Cite

Chen, Z., Zhang, C., Fang, H., & Cong, R. (2026). Empowering DINO Representations for Underwater Instance Segmentation via Aligner and Prompter. Proceedings of the AAAI Conference on Artificial Intelligence, 40(5), 3201-3209. https://doi.org/10.1609/aaai.v40i5.37314

Issue

Vol. 40 No. 5 (2026)

Section

AAAI Technical Track on Computer Vision II