Online Semi-supervised Learning with Mix-Typed Streaming Features
Keywords: DMKM: Data Stream Mining, ML: Online Learning & Bandits, ML: Semi-Supervised Learning, ML: Time-Series/Data Streams
Abstract
Online learning with feature spaces that are not fixed but can vary over time offers a flexible learning paradigm and has therefore drawn much attention. Unfortunately, two restrictions prevent this paradigm from being applied widely in practice. First, whereas prior studies mainly assume a homogeneous feature type, data streams generated by real applications can be heterogeneous, with Boolean, ordinal, and continuous features co-existing. Existing methods that prescribe parametric distributions such as Gaussians do not suffice to model the correlation among such mix-typed features. Second, while full supervision is the default setup, providing labels for all arriving data instances over a long time span is onerous, laborious, and economically unsustainable. A semi-supervised online learner that can deal with mix-typed, varying feature spaces is still missing. To fill the gap, this paper explores a novel problem, named Online Semi-supervised Learning with Mix-typed Streaming Features (OSLMF), which relaxes the restrictions on both the feature type and the supervision information. Our key idea for solving the new problem is to leverage a copula model to align data instances from different feature spaces so that the distances between them become measurable. A geometric structure underlying the data instances is then established in an online fashion based on these distances, and the limited labeling information is propagated through it, from the scarce labeled instances to their close neighbors. Experimental results demonstrate the viability and effectiveness of the proposed approach. Code is released at https://github.com/wudi1989/OSLMF.
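The two steps named in the abstract, aligning mix-typed features in a common latent space and propagating labels to close neighbors, can be illustrated with a minimal sketch. This is not the authors' released implementation. It assumes a simple rank-based normal-score transform as a stand-in for the copula step, assumes a one-nearest-neighbor rule as a stand-in for the propagation step, and ignores missing features and online updates. The function names `normal_scores` and `propagate_labels` are hypothetical.

```python
from statistics import NormalDist
import math


def normal_scores(values):
    """Map one feature column to latent Gaussian scores via its empirical CDF.

    Assumption: a rank-based transform, used here only to put Boolean and
    continuous columns on a comparable scale.
    """
    n = len(values)
    std_normal = NormalDist()
    scores = []
    for v in values:
        # Count of observations <= v, divided by n + 1 to stay inside (0, 1).
        rank = sum(1 for u in values if u <= v)
        scores.append(std_normal.inv_cdf(rank / (n + 1)))
    return scores


def propagate_labels(points, labels):
    """Give each unlabeled point (label None) the label of its nearest labeled point."""
    labeled = [(p, y) for p, y in zip(points, labels) if y is not None]
    result = []
    for p, y in zip(points, labels):
        if y is None:
            _, y = min(labeled, key=lambda item: math.dist(p, item[0]))
        result.append(y)
    return result


# Toy data: one continuous column and one Boolean column, four instances.
continuous = [0.1, 0.2, 5.0, 6.0]
boolean = [0, 0, 1, 1]
points = list(zip(normal_scores(continuous), normal_scores(boolean)))

# Only the first and last instances carry labels.
propagated = propagate_labels(points, [0, None, None, 1])
```

On this toy data the two unlabeled instances inherit the labels of the labeled instances closest to them in the latent space, so `propagated` is `[0, 0, 1, 1]`.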
How to Cite
Wu, D., Zhuo, S., Wang, Y., Chen, Z., & He, Y. (2023). Online Semi-supervised Learning with Mix-Typed Streaming Features. Proceedings of the AAAI Conference on Artificial Intelligence, 37(4), 4720-4728. https://doi.org/10.1609/aaai.v37i4.25596
AAAI Technical Track on Data Mining and Knowledge Management