Finding One's Best Crowd: Online Learning By Exploiting Source Similarity

Yang Liu; Mingyan Liu

doi:10.1609/aaai.v30i1.10273

Authors

Yang Liu University of Michigan, Ann Arbor
Mingyan Liu University of Michigan, Ann Arbor


Keywords:

online learning, disparate data source, similarity, prediction, crowdsourcing

Abstract

We consider an online learning problem (classification or prediction) involving disparate sources of sequentially arriving data, in which a user learns over time the best set of data sources to use in constructing a classifier by exploiting the similarity among those sources. We first show that when (1) the similarity information among data sources is known and (2) data from different sources can be acquired without cost, a judicious selection of data from different sources can effectively enlarge the training sample size compared with using a single data source, thereby improving both the rate and the performance of learning; we establish this by bounding the classification error of the resulting classifier. We then relax assumption (1) and characterize the loss in learning performance when the similarity information must also be acquired through repeated sampling. We further relax both (1) and (2) and present a cost-efficient algorithm that identifies a best crowd from a potentially large set of data sources in terms of both classifier performance and data acquisition cost. This problem has various applications, including online prediction systems with time series data of various forms, such as financial markets, advertising, and network measurement.
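The trade-off behind assumption (1) can be illustrated with a toy model. Pooling data from other sources increases the number of training samples N, but each added source also brings some bias proportional to how dissimilar it is from the user's own source. The sketch below is a minimal illustration under a hypothetical error proxy of the form "largest included dissimilarity + c/sqrt(N)". It is not the authors' algorithm or error bound. The names `error_proxy` and `select_crowd`, the constant `c`, and the proxy itself are assumptions chosen only to show why a well-chosen crowd can outperform a single source.

```python
import math
from typing import List, Tuple


def error_proxy(max_dissim: float, total_samples: int, c: float = 1.0) -> float:
    """Hypothetical error proxy (an assumption, not the paper's bound):
    a bias term from the most dissimilar included source plus a sampling
    term that shrinks as 1/sqrt(N) with the pooled sample size N."""
    return max_dissim + c / math.sqrt(total_samples)


def select_crowd(dissim: List[float], samples: List[int],
                 c: float = 1.0) -> Tuple[List[int], float]:
    """Choose the subset of sources that minimizes error_proxy.

    dissim[j]  -- known dissimilarity of source j to the user's own source
                  (the user's own source has dissimilarity 0).
    samples[j] -- number of samples available from source j.

    Under this proxy, the best subset is always a prefix of the sources
    sorted by dissimilarity: adding a more similar source never raises the
    bias term and always grows N. A single sorted scan therefore suffices.
    """
    order = sorted(range(len(dissim)), key=lambda j: dissim[j])
    best_k, best_val = 0, math.inf
    total = 0
    for k, j in enumerate(order, start=1):
        total += samples[j]
        if total == 0:  # no data pooled yet; proxy undefined
            continue
        val = error_proxy(dissim[j], total, c)
        if val < best_val:
            best_k, best_val = k, val
    return sorted(order[:best_k]), best_val


if __name__ == "__main__":
    # Source 0 is the user's own source; sources 1-2 are close, 3 is far.
    crowd, value = select_crowd([0.0, 0.02, 0.05, 0.5], [10, 40, 50, 100])
    print(crowd, round(value, 4))
```

With these illustrative numbers the scan picks sources 0, 1 and 2 (proxy value 0.15). It rejects source 3, whose large dissimilarity outweighs its extra samples, and it also beats using source 0 alone (about 0.316). In the setting where assumption (1) is relaxed, `dissim` would have to be replaced by estimates obtained from repeated sampling, which adds an estimation error on top of this trade-off.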
