Collective Noise Contrastive Estimation for Policy Transfer Learning

Authors

  • Weinan Zhang, University College London
  • Ulrich Paquet, Microsoft Research
  • Katja Hofmann, Microsoft Research

DOI

https://doi.org/10.1609/aaai.v30i1.10153

Keywords

Transfer Learning, Policy Learning, Noise Contrastive Estimation, Recommender Systems

Abstract

We address the problem of learning behaviour policies to optimise online metrics from heterogeneous usage data. While online metrics, e.g., click-through rate, can be optimised effectively using exploration data, such data is costly to collect in practice, as it temporarily degrades the user experience. Leveraging related data sources to improve online performance would be extremely valuable, but is not possible using current approaches. We formulate this task as a policy transfer learning problem, and propose a first solution, called collective noise contrastive estimation (collective NCE). NCE is an efficient method for approximating the gradient of a log-softmax objective. Our approach jointly optimises embeddings of heterogeneous data to transfer knowledge from the source domain to the target domain. We demonstrate the effectiveness of our approach by learning an effective policy for an online radio station jointly from user-generated playlists and usage data collected in an exploration bucket.
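The claim that NCE approximates the gradient of a log-softmax objective can be checked numerically. The sketch below is a generic NCE estimator, not the paper's collective NCE: it ignores the embeddings and the source-to-target transfer, works directly on a vector of item scores, and assumes a self-normalised model (the normaliser is fixed to 1). The helper names `log_softmax_grad`, `sigmoid` and `nce_grad`, the toy probabilities and the uniform noise distribution are all illustrative assumptions.

```python
import math
import random


def log_softmax_grad(scores, target):
    """Exact gradient of log p(target) = s_target - log(sum_j exp(s_j)) w.r.t. each s_j.

    Costs O(N): every item's score enters the normaliser.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [(1.0 if j == target else 0.0) - e / z for j, e in enumerate(exps)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ez = math.exp(x)
    return ez / (1.0 + ez)


def nce_grad(scores, target, noise_probs, k, rng):
    """Sampled NCE gradient w.r.t. the scores (hypothetical helper).

    Treats exp(s_j) as an unnormalised model whose normaliser is fixed to 1,
    and trains a logistic classifier to tell the observed item apart from
    k items drawn from the noise distribution q = noise_probs.
    Costs O(k) instead of O(N).
    """
    grad = [0.0] * len(scores)
    # Data term: d/ds log sigmoid(delta) = 1 - sigmoid(delta),
    # where delta = s - log(k * q) is the classifier's logit.
    delta = scores[target] - math.log(k * noise_probs[target])
    grad[target] += 1.0 - sigmoid(delta)
    # Noise terms: d/ds log(1 - sigmoid(delta)) = -sigmoid(delta) for each draw.
    for x in rng.choices(range(len(scores)), weights=noise_probs, k=k):
        delta = scores[x] - math.log(k * noise_probs[x])
        grad[x] -= sigmoid(delta)
    return grad


if __name__ == "__main__":
    # Toy self-normalised model: scores are log-probabilities, so exp(s) sums to 1.
    probs = [0.4, 0.3, 0.15, 0.1, 0.05]
    scores = [math.log(p) for p in probs]
    uniform_noise = [1.0 / len(probs)] * len(probs)
    exact = log_softmax_grad(scores, target=0)
    approx = nce_grad(scores, target=0, noise_probs=uniform_noise,
                      k=100_000, rng=random.Random(0))
    for j, (e, a) in enumerate(zip(exact, approx)):
        print(f"item {j}: exact {e:+.4f}  nce {a:+.4f}")
```

With a large number of noise samples the two gradients should agree closely, because the expected noise term for item j, -k q_j sigmoid(s_j - log(k q_j)), tends to -exp(s_j) as k grows, which equals the softmax probability when the model is self-normalised. In practice k would be kept small, accepting some bias and variance in exchange for never summing over the full item catalogue.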

Published

2016-02-21

How to Cite

Zhang, W., Paquet, U., & Hofmann, K. (2016). Collective Noise Contrastive Estimation for Policy Transfer Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10153

Section

Technical Papers: Machine Learning Applications