Online Semi-supervised Learning with Mix-Typed Streaming Features


  • Di Wu College of Computer and Information Science, Southwest University, Chongqing 400715, China
  • Shengda Zhuo Institute of Artificial Intelligence and Blockchain, Guangzhou University, Guangzhou 510006, China
  • Yu Wang Institute of Artificial Intelligence and Blockchain, Guangzhou University, Guangzhou 510006, China
  • Zhong Chen Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
  • Yi He Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA



DMKM: Data Stream Mining, ML: Online Learning & Bandits, ML: Semi-Supervised Learning, ML: Time-Series/Data Streams


Online learning with feature spaces that are not fixed but can vary over time offers a seemingly flexible learning paradigm and has thus drawn much attention. Unfortunately, two restrictions prohibit a ubiquitous application of this learning paradigm in practice. First, whereas prior studies mainly assume a homogeneous feature type, data streams generated from real applications can be heterogeneous, with Boolean, ordinal, and continuous features co-existing. Existing methods that prescribe parametric distributions such as Gaussians do not suffice to model the correlation among such mix-typed features. Second, while full supervision seems to be the default setup, providing labels for all arriving data instances over a long time span is onerous, laborious, and economically unsustainable. Alas, a semi-supervised online learner that can deal with mix-typed, varying feature spaces is still missing. To fill the gap, this paper explores a novel problem, named Online Semi-supervised Learning with Mix-typed streaming Features (OSLMF), which strives to relax the restrictions on the feature type and the supervision information. Our key idea for solving the new problem is to leverage a copula model to align data instances with different feature spaces so that the distances between them become measurable. A geometric structure underlying the data instances is then established in an online fashion based on these distances, through which the limited labeling information is propagated from the scarce labeled instances to their close neighbors. Experimental results are documented to evidence the viability and effectiveness of our proposed approach. Code is released in
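The abstract describes the pipeline (copula alignment, distances, label propagation) only at a high level. Below is a minimal Python/NumPy sketch of that general idea, built on our own assumptions rather than the authors' method. It uses a Gaussian copula with rank-based (mid-rank empirical CDF) marginals, imputes missing latent coordinates with the Gaussian conditional mean, and runs label spreading in the style of Zhou et al. over a symmetric kNN graph. It also recomputes everything on a small buffer instead of maintaining the structure online. The function names (`to_latent`, `estimate_corr`, `impute_latent`, `propagate_labels`), the parameters (`shrink`, `k`, `alpha`, `iters`), and the toy data are hypothetical and not taken from the paper's released code.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def to_latent(X):
    """Assumed Gaussian-copula marginal transform: z = Phi^{-1}(mid-rank ECDF(x)).
    NaN marks a feature that is absent for that instance."""
    X = np.asarray(X, dtype=float)
    Z = np.full_like(X, np.nan)
    for j in range(X.shape[1]):
        obs = ~np.isnan(X[:, j])
        vals = X[obs, j]
        n = vals.size
        if n == 0:
            continue
        less = (vals[:, None] > vals[None, :]).sum(axis=1)
        ties = (vals[:, None] == vals[None, :]).sum(axis=1)
        # Mid-ranks keep u strictly inside (0, 1) and handle Boolean/ordinal ties.
        u = (less + 0.5 * ties) / n
        Z[obs, j] = [_STD_NORMAL.inv_cdf(p) for p in u]
    return Z


def estimate_corr(Z, shrink=0.1):
    """Latent correlation from fully observed rows, shrunk toward the identity."""
    d = Z.shape[1]
    complete = Z[~np.isnan(Z).any(axis=1)]
    if complete.shape[0] < 2:
        return np.eye(d)
    with np.errstate(divide="ignore", invalid="ignore"):
        C = np.corrcoef(complete, rowvar=False)
    C = np.nan_to_num(C)  # constant columns yield NaN; treat as uncorrelated
    np.fill_diagonal(C, 1.0)
    return (1 - shrink) * C + shrink * np.eye(d)


def impute_latent(Z, corr):
    """Fill missing latent coordinates with E[z_m | z_o] = S_mo S_oo^{-1} z_o,
    so every instance lives in the same (full) latent space."""
    Zf = Z.copy()
    for i in range(Z.shape[0]):
        m = np.isnan(Z[i])
        o = ~m
        if not m.any():
            continue
        if o.any():
            Zf[i, m] = corr[np.ix_(m, o)] @ np.linalg.solve(corr[np.ix_(o, o)], Z[i, o])
        else:
            Zf[i, m] = 0.0  # nothing observed: fall back to the marginal mean
    return Zf


def propagate_labels(Zf, y, k=3, alpha=0.9, iters=50):
    """Label spreading on a symmetric kNN graph; y uses -1 for unlabeled."""
    n = Zf.shape[0]
    D2 = ((Zf[:, None, :] - Zf[None, :, :]) ** 2).sum(axis=-1)
    sigma2 = np.median(D2[D2 > 0]) if np.any(D2 > 0) else 1.0
    W = np.exp(-D2 / sigma2)
    np.fill_diagonal(W, 0.0)
    # Keep only each node's k nearest neighbours (excluding itself), then symmetrise.
    nn = np.argsort(D2 + np.diag(np.full(n, np.inf)), axis=1)[:, :k]
    keep = np.zeros((n, n), dtype=bool)
    keep[np.arange(n)[:, None], nn] = True
    W = np.where(keep | keep.T, W, 0.0)
    deg = W.sum(axis=1)
    deg[deg == 0] = 1.0
    S = W / np.sqrt(np.outer(deg, deg))
    classes = np.unique(y[y >= 0])
    Y = (y[:, None] == classes[None, :]).astype(float)
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y
    return classes[F.argmax(axis=1)]


# Hypothetical toy buffer: columns are continuous, ordinal (1-5), Boolean;
# NaN means the feature was not present when that instance arrived.
nan = np.nan
X = [[0.1, 1, 0], [0.3, 2, 0], [0.2, nan, 0], [nan, 1, 0], [0.25, 2, nan],
     [5.0, 5, 1], [5.2, 4, 1], [nan, 5, 1], [4.9, nan, 1], [5.1, 4, nan]]
y = np.array([0, -1, -1, -1, -1, 1, -1, -1, -1, -1])
Z = to_latent(X)
pred = propagate_labels(impute_latent(Z, estimate_corr(Z)), y)
print(pred)
```

Because all instances end up in one shared Gaussian latent space, two instances that never observed the same features can still be compared by Euclidean distance. In a genuinely streaming setting, the ECDFs, the correlation matrix, and the graph would instead be updated incrementally as instances arrive.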




How to Cite

Wu, D., Zhuo, S., Wang, Y., Chen, Z., & He, Y. (2023). Online Semi-supervised Learning with Mix-Typed Streaming Features. Proceedings of the AAAI Conference on Artificial Intelligence, 37(4), 4720-4728.



AAAI Technical Track on Data Mining and Knowledge Management