DiCaP: Distribution-Calibrated Pseudo-labeling for Semi-Supervised Multi-Label Learning

Authors

  • Bo Han School of Computer Science and Engineering, Southeast University, Nanjing, China
  • Zhuoming Li School of Computer Science and Engineering, Southeast University, Nanjing, China
  • Xiaoyu Wang School of Software Engineering, Southeast University, Nanjing, China
  • Yaxin Hou School of Computer Science and Engineering, Southeast University, Nanjing, China
  • Hui Liu School of Computing and Information Sciences, Saint Francis University, Hong Kong, China
  • Junhui Hou Department of Computer Science, City University of Hong Kong, Hong Kong, China
  • Yuheng Jia School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China; School of Computing and Information Sciences, Saint Francis University, Hong Kong, China

DOI:

https://doi.org/10.1609/aaai.v40i26.39302

Abstract

Semi-supervised multi-label learning (SSMLL) aims to address the challenge of limited labeled data in multi-label learning (MLL) by leveraging unlabeled data to improve the model’s performance. While pseudo-labeling has become a dominant strategy in SSMLL, most existing methods assign equal weights to all pseudo-labels regardless of their quality, which can amplify the impact of noisy or uncertain predictions and degrade the overall performance. In this paper, we theoretically verify that the optimal weight for a pseudo-label should reflect its correctness likelihood. Empirically, we observe that on the same dataset, the correctness likelihood distribution of unlabeled data remains stable, even as the number of labeled training samples varies. Building on this insight, we propose Distribution-Calibrated Pseudo-labeling (DiCaP), a correctness-aware framework that estimates posterior precision to calibrate pseudo-label weights. We further introduce a dual-thresholding mechanism to separate confident and ambiguous regions: confident samples are pseudo-labeled and weighted accordingly, while ambiguous ones are explored by unsupervised contrastive learning. Experiments conducted on multiple benchmark datasets verify that our method achieves consistent improvements, surpassing state-of-the-art methods by up to 4.27%.
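The dual-thresholding and correctness-aware weighting described in the abstract might be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the threshold values and the use of prediction confidence as a proxy for correctness likelihood are placeholders for the paper's calibrated posterior-precision estimate.

```python
import numpy as np

def dual_threshold_weights(probs, tau_lo=0.3, tau_hi=0.7):
    """Split per-label predictions into confident / ambiguous regions and
    weight confident pseudo-labels by an estimated correctness likelihood.

    probs: (N, C) array of predicted label probabilities for unlabeled data.
    Returns (pseudo_labels, weights, ambiguous_mask). Thresholds and the
    weight estimate here are illustrative assumptions, not the paper's.
    """
    probs = np.asarray(probs, dtype=float)
    pos = probs >= tau_hi          # confident positive region
    neg = probs <= tau_lo          # confident negative region
    ambiguous = ~(pos | neg)       # left to unsupervised contrastive learning

    pseudo = pos.astype(float)     # 1 for confident positives, 0 otherwise
    # Proxy for correctness likelihood: the model's confidence in the
    # confident region; ambiguous entries get zero pseudo-label weight.
    weights = np.where(pos, probs, np.where(neg, 1.0 - probs, 0.0))
    return pseudo, weights, ambiguous
```

In the paper's framework, the weight assigned to each confident pseudo-label would instead come from the calibrated correctness-likelihood distribution, which the abstract notes remains stable across different amounts of labeled data.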

Published

2026-03-14

How to Cite

Han, B., Li, Z., Wang, X., Hou, Y., Liu, H., Hou, J., & Jia, Y. (2026). DiCaP: Distribution-Calibrated Pseudo-labeling for Semi-Supervised Multi-Label Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(26), 21540–21548. https://doi.org/10.1609/aaai.v40i26.39302

Section

AAAI Technical Track on Machine Learning III