Semi-Supervised AUC Optimization Without Guessing Labels of Unlabeled Data

Authors

  • Zheng Xie, Nanjing University
  • Ming Li, Nanjing University

DOI:

https://doi.org/10.1609/aaai.v32i1.11812

Keywords:

Semi-Supervised Learning, AUC Optimization

Abstract

Semi-supervised learning, which aims to construct learners that automatically exploit the large amount of unlabeled data in addition to the limited labeled data, has been widely applied in real-world applications. AUC is a well-known performance measure for a learner, and directly optimizing AUC may result in better prediction performance. Thus, semi-supervised AUC optimization has drawn much attention. Existing semi-supervised AUC optimization methods exploit unlabeled data by explicitly or implicitly estimating the possible labels of the unlabeled data based on various distributional assumptions. However, these assumptions may be violated in many real-world applications, and estimating labels based on a violated assumption may lead to poor performance. In this paper, we argue that, in semi-supervised AUC optimization, it is unnecessary to guess the possible labels of the unlabeled data or to estimate the class-prior probability based on any distributional assumption. We analytically show that the AUC risk can be estimated without bias by simply treating the unlabeled data as both positive and negative. Based on this finding, two semi-supervised AUC optimization methods, named Samult and Sampura, are proposed. Experimental results indicate that the proposed methods outperform the existing methods.
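The abstract's central claim, that the AUC risk can be estimated without guessing labels or the class prior, can be illustrated with a short sketch. The code below is not the authors' released implementation; it is a minimal illustration under the assumption of the 0-1 ranking loss with ties counted as half an error, a symmetric pairwise loss for which the "self-pair" terms reduce to a constant 1/2 and the identity R_PN = R_PU + R_UN - 1/2 holds in expectation when the unlabeled data follow the marginal distribution. All function names are illustrative.

import numpy as np

def pairwise_risk(scores_pos, scores_neg):
    """Empirical ranking risk: fraction of (positive, negative) score pairs
    that are mis-ranked, counting ties as half an error (0-1 ranking loss)."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean((diff < 0) + 0.5 * (diff == 0))

def semi_supervised_auc_risk(scores_p, scores_n, scores_u):
    """Sketch of the abstract's idea: estimate the supervised AUC risk from
    labeled positives (scores_p), labeled negatives (scores_n), and unlabeled
    points (scores_u) by treating the unlabeled data once as negative and once
    as positive, with no label guessing and no class-prior estimate.
    Under the symmetric-loss assumption above: R_PN = R_PU + R_UN - 1/2."""
    r_pu = pairwise_risk(scores_p, scores_u)   # unlabeled treated as negative
    r_un = pairwise_risk(scores_u, scores_n)   # unlabeled treated as positive
    return r_pu + r_un - 0.5

# Hypothetical usage on synthetic scores from a fixed scorer.
rng = np.random.default_rng(0)
scores_p = rng.normal(1.0, 1.0, 20)     # scores of labeled positives
scores_n = rng.normal(-1.0, 1.0, 20)    # scores of labeled negatives
scores_u = rng.normal(0.0, 1.2, 500)    # scores of unlabeled points
print(semi_supervised_auc_risk(scores_p, scores_n, scores_u))

Comparing this estimate against pairwise_risk computed on fully labeled data gives closely matching values as the unlabeled sample grows, which is the unbiasedness the abstract refers to, under the stated loss assumption.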

Published

2018-04-29

How to Cite

Xie, Z., & Li, M. (2018). Semi-Supervised AUC Optimization Without Guessing Labels of Unlabeled Data. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11812