Semi-Supervised Streaming Learning with Emerging New Labels
In many real-world applications, the modeling environment is usually dynamic and evolutionary, especially in a data stream where emerging new class often happens. Great efforts have been devoted to learning with novel concepts recently, which are typically in a supervised setting with completely supervised initialization. However, the data collected in the stream are often in a semi-supervised manner actually, which means only a few of them are labeled while the great majority miss ground-truth labels. Besides, new classes hidden in unlabeled instances bring more challenges for the learning task. In this paper, we tackle these issues by a new approach called SEEN which consists of three major components: an effective novel class detector based on clustering random trees, a robust classifier for predictions on the known classes, and an efficient updating process that ensures the whole framework adapts to the changing environment automatically. The classifier produces known labels via label propagation that utilizes all labeled and part unlabeled data in the past which naturally describe the entire stream seen so far. Empirical studies on several datasets validate that the algorithm can accurately classify points on a dynamic stream with a small number of labeled examples and emerging new classes.