Learning from Weak-Label Data: A Deep Forest Expedition
Weak-label learning deals with the problem where each training example is associated with multiple ground-truth labels simultaneously but only partially provided. This circumstance is frequently encountered when the number of classes is very large or when there exists a large ambiguity between class labels, and significantly influences the performance of multi-label learning. In this paper, we propose LCForest, which is the first tree ensemble based deep learning method for weak-label learning. Rather than formulating the problem as a regularized framework, we employ the recently proposed cascade forest structure, which processes information layer-by-layer, and endow it with the ability of exploiting from weak-label data by a concise and highly efficient label complement structure. Specifically, in each layer, the label vector of each instance from testing-fold is modified with the predictions of random forests trained with the corresponding training-fold. Since the ground-truth label matrix is inaccessible, we can not estimate the performance via cross-validation directly. In order to control the growth of cascade forest, we adopt label frequency estimation and the complement flag mechanism. Experiments show that the proposed LCForest method compares favorably against the existing state-of-the-art multi-label and weak-label learning methods.