Alignment-Free RGB-T Salient Object Detection: A Large-Scale Dataset and Progressive Correlation Network

Authors

  • Kunpeng Wang, Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, China
  • Keke Chen, Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, China
  • Chenglong Li, Anhui Provincial Key Laboratory of Security Artificial Intelligence, School of Artificial Intelligence, Anhui University, China
  • Zhengzheng Tu, Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, China
  • Bin Luo, Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, China

DOI:

https://doi.org/10.1609/aaai.v39i7.32838

Abstract

Alignment-free RGB-Thermal (RGB-T) salient object detection (SOD) aims to achieve robust performance in complex scenes by directly leveraging the complementary information from unaligned visible-thermal image pairs, without requiring manual alignment. However, the labor-intensive process of collecting and annotating image pairs limits the scale of existing benchmarks, hindering the advancement of alignment-free RGB-T SOD. In this paper, we construct a large-scale and high-diversity unaligned RGB-T SOD dataset named UVT20K, comprising 20,000 image pairs, 407 scenes, and 1,256 object categories. All samples are collected from real-world scenarios with various challenges, such as low illumination, image clutter, and complex salient objects. To support further research, each sample in UVT20K is annotated with a comprehensive set of ground truths, including saliency masks, scribbles, boundaries, and challenge attributes. In addition, we propose a Progressive Correlation Network (PCNet), which models inter- and intra-modal correlations on the basis of explicit alignment to achieve accurate predictions in unaligned image pairs. Extensive experiments conducted on two unaligned, three weakly aligned, and three aligned datasets demonstrate the effectiveness of our method.

Published

2025-04-11

How to Cite

Wang, K., Chen, K., Li, C., Tu, Z., & Luo, B. (2025). Alignment-Free RGB-T Salient Object Detection: A Large-Scale Dataset and Progressive Correlation Network. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7), 7780–7788. https://doi.org/10.1609/aaai.v39i7.32838

Section

AAAI Technical Track on Computer Vision VI