SeViL: Semi-supervised Vision-Language Learning with Text Prompt Guiding for Moving Infrared Small Target Detection

Authors

  • Weiwei Duan, University of Electronic Science and Technology of China
  • Luping Ji, University of Electronic Science and Technology of China
  • Jianghong Huang, University of Electronic Science and Technology of China
  • Sicheng Zhu, University of Electronic Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v40i5.37372

Abstract

Unlike traditional object detection, moving infrared small target detection is highly challenging because targets are tiny and labeled samples are scarce. Most existing methods focus on pure-vision features, usually learned with full supervision, and therefore rely heavily on extensive, costly manual annotations. Moreover, they have barely explored the potential of multi-modal (e.g., vision and text) learning. To address these issues, and inspired by prevalent vision-language models, we propose the first semi-supervised vision-language (SeViL) framework with adaptive text prompt guiding. Going beyond the traditional pure-vision modality, it uses text prompts as prior knowledge to adaptively enhance target regions and then to filter out low-quality pseudo-labels generated on unlabeled data. Meanwhile, we employ an adaptive cross-modal masking strategy to align text and vision features, promoting deep cross-modal interactions. Extensive experiments on three public datasets (DAUB, ITSDT-15K and IRDST) show that, with only 10% labeled training samples, our scheme outperforms other semi-supervised methods and even achieves performance comparable to fully-supervised state-of-the-art (SOTA) methods.
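The abstract states that text prompts act as prior knowledge for filtering low-quality pseudo-labels on unlabeled data, but it does not specify how. As a rough illustration only, here is one generic way such filtering could work: a candidate detection is kept only if its detector confidence and the cosine similarity between its region feature and a text-prompt embedding both clear a threshold. The `PseudoLabel` structure, the function names, the threshold values and the toy 3-D embeddings are all hypothetical assumptions; this is a sketch, not the SeViL implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PseudoLabel:
    """Hypothetical candidate detection produced by a teacher model on an unlabeled frame."""
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
    score: float                    # detector confidence in [0, 1]
    feature: List[float]            # region feature in an assumed shared text-vision space


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pseudo_labels(
    candidates: List[PseudoLabel],
    text_embedding: Sequence[float],
    min_score: float = 0.5,       # illustrative default, not a value from the paper
    min_similarity: float = 0.3,  # illustrative default, not a value from the paper
) -> List[PseudoLabel]:
    """Keep candidates that are both confident and semantically close to the text prompt."""
    kept = []
    for cand in candidates:
        sim = cosine_similarity(cand.feature, text_embedding)
        if cand.score >= min_score and sim >= min_similarity:
            kept.append(cand)
    return kept


if __name__ == "__main__":
    # Toy embeddings; a real system would obtain these from a text encoder and a vision backbone.
    prompt = [1.0, 0.0, 0.0]
    candidates = [
        PseudoLabel((10, 10, 14, 14), 0.9, [0.9, 0.1, 0.0]),  # confident and aligned: kept
        PseudoLabel((50, 50, 53, 53), 0.8, [0.0, 1.0, 0.0]),  # confident but off-prompt: dropped
        PseudoLabel((80, 20, 83, 23), 0.2, [1.0, 0.0, 0.0]),  # aligned but low confidence: dropped
    ]
    print([c.box for c in filter_pseudo_labels(candidates, prompt)])  # prints [(10, 10, 14, 14)]
```

Requiring both conditions lets the text prior reject confident but semantically implausible detections (for example, bright background clutter), while the confidence check still rejects weak detections that merely resemble the prompt.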


Published

2026-03-14

How to Cite

Duan, W., Ji, L., Huang, J., & Zhu, S. (2026). SeViL: Semi-supervised Vision-Language Learning with Text Prompt Guiding for Moving Infrared Small Target Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(5), 3723-3731. https://doi.org/10.1609/aaai.v40i5.37372

Issue

Vol. 40 No. 5 (2026)

Section

AAAI Technical Track on Computer Vision II