Learning from Positive and Unlabeled Data without Explicit Estimation of Class Prior
Learning a classifier from positive and unlabeled data arises in various applications. It differs from standard classification problems by the absence of labeled negative examples in the training set. So far, two main strategies have typically been used to address this problem: the likely-negative-examples-based strategy and the class-prior-based strategy, in which the likely negative examples or the class prior must be obtained in a preprocessing step. In this paper, a new strategy based on the Bhattacharyya coefficient is put forward, which formalizes this learning problem as an optimization problem and requires no preprocessing step. We first show that, given the positive class-conditional probability density function (PDF) and the mixture PDF of the positive and negative classes, the class prior can be estimated by minimizing the Bhattacharyya coefficient of the positive class with respect to the negative class. We then show how to use this result in an implicit mixture model of restricted Boltzmann machines to estimate the positive and negative class-conditional PDFs directly, yielding a classifier without explicit estimation of the class prior. Extensive experiments on real and synthetic datasets demonstrate the superiority of the proposed approach.
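The prior-estimation idea can be sketched numerically. In this minimal illustration (a synthetic one-dimensional setup chosen here for clarity, not the paper's experiments), the positive class-conditional PDF is N(0,1), the negative component is N(3,1), and the true class prior is 0.4. Each candidate prior theta induces a candidate negative PDF (f_mix - theta*f_p)/(1-theta), and the Bhattacharyya coefficient BC(p,q) = ∫ sqrt(p(x)q(x)) dx between the positive PDF and that candidate is smallest near the true prior:

```python
import numpy as np

def gauss(x, mu, sigma):
    """Univariate Gaussian PDF."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Illustrative setup (assumed, not from the paper): positive class N(0,1),
# negative class N(3,1), true class prior 0.4.
x = np.linspace(-8.0, 11.0, 4000)
dx = x[1] - x[0]
f_p = gauss(x, 0.0, 1.0)                       # positive class-conditional PDF (known)
f_mix = 0.4 * f_p + 0.6 * gauss(x, 3.0, 1.0)   # mixture PDF (known)

best_theta, best_bc = None, np.inf
for theta in np.arange(0.01, 1.0, 0.01):
    # Candidate negative class-conditional PDF implied by this prior.
    residual = (f_mix - theta * f_p) / (1.0 - theta)
    if residual.min() < -1e-6:
        # Not a valid density: theta exceeds the feasible prior range.
        continue
    # Bhattacharyya coefficient between f_p and the candidate negative PDF,
    # integrated with a simple Riemann sum on the uniform grid.
    bc = np.sum(np.sqrt(f_p * np.clip(residual, 0.0, None))) * dx
    if bc < best_bc:
        best_theta, best_bc = theta, bc

print(best_theta)  # close to the true prior 0.4
```

The feasibility check reflects that for theta above the true prior the residual is no longer a nonnegative density, so the minimizing theta sits at (or just below) the true prior in this well-separated example.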