Finding One's Best Crowd: Online Learning By Exploiting Source Similarity

Authors

  • Yang Liu, University of Michigan, Ann Arbor
  • Mingyan Liu, University of Michigan, Ann Arbor

DOI:

https://doi.org/10.1609/aaai.v30i1.10273

Keywords:

online learning, disparate data source, similarity, prediction, crowdsourcing

Abstract

We consider an online learning problem (classification or prediction) involving disparate sources of sequentially arriving data, in which a user learns over time the best set of data sources to use in constructing the classifier by exploiting their similarity. We first show that when (1) the similarity information among data sources is known and (2) data from different sources can be acquired without cost, a judicious selection of data from different sources effectively enlarges the training sample size compared to using a single data source, thereby improving the rate and performance of learning; we establish this by bounding the classification error of the resulting classifier. We then relax assumption (1) and characterize the loss in learning performance when the similarity information must also be acquired through repeated sampling. Finally, we relax both (1) and (2) and present a cost-efficient algorithm that identifies a best crowd from a potentially large set of data sources in terms of both classifier performance and data acquisition cost. This problem has various applications, including online prediction systems with time series data of various forms, such as financial markets, advertising, and network measurement.
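
The trade-off behind the first result (pooling similar sources enlarges the sample but introduces bias) can be illustrated with a small sketch. This is not the algorithm from the paper: it assumes a simplified setting in which each source emits values in [0, 1], the user wants the mean of source 0, and a hypothetical `dissimilarity[k]` gives a known upper bound on how far the mean of source k lies from that of source 0. The function names and the Hoeffding-style bound are illustrative choices.

```python
import math


def hoeffding_width(n, delta=0.05):
    """Two-sided Hoeffding half-width for the mean of n samples in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def select_crowd(dissimilarity, samples_per_source, delta=0.05):
    """Choose which sources to pool when estimating the mean of source 0.

    dissimilarity[k] is an ASSUMED known upper bound on |mean_k - mean_0|.
    Sources are considered in order of increasing dissimilarity, and the
    prefix minimizing (average bias bound + Hoeffding width) is returned.
    With equal sample counts per source, the pooled mean is biased by at
    most the average dissimilarity of the pooled sources.
    """
    order = sorted(range(len(dissimilarity)), key=lambda k: dissimilarity[k])
    best_bound, best_crowd = float("inf"), []
    bias_sum = 0.0
    for size, k in enumerate(order, start=1):
        bias_sum += dissimilarity[k]
        n = size * samples_per_source  # pooled sample size
        bound = bias_sum / size + hoeffding_width(n, delta)
        if bound < best_bound:
            best_bound, best_crowd = bound, order[:size]
    return best_crowd, best_bound


# Hypothetical example: three similar sources and one dissimilar source.
crowd, bound = select_crowd([0.0, 0.01, 0.02, 0.5], samples_per_source=50)
print("pooled sources:", crowd)
print("error bound with pooling:", round(bound, 4))
print("error bound using source 0 alone:", round(hoeffding_width(50), 4))
```

In this example the three similar sources are pooled and the dissimilar one is left out, because its bias bound outweighs the gain from the extra samples. When the similarity information is not known in advance, the `dissimilarity` values would themselves have to be estimated from samples, which is the setting the abstract addresses next.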

Published

2016-02-21

How to Cite

Liu, Y., & Liu, M. (2016). Finding One’s Best Crowd: Online Learning By Exploiting Source Similarity. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10273

Section

Technical Papers: Machine Learning Methods