Incentivizing High Quality User Contributions: New Arm Generation in Bandit Learning
Keywords: incentive, game theory, user generated content, multi-armed bandit
We study the problem of incentivizing high-quality contributions on user-generated content platforms, where users arrive sequentially and the quality of their contributions is unknown. We are interested in designing a content display strategy that decides which content to show to users, with the goal of maximizing user experience (i.e., the likelihood that users like the content). This goal naturally leads to a joint problem of incentivizing high-quality contributions and learning the unknown content quality. To address the incentive issue, we consider a model in which users are strategic in deciding whether to contribute and are motivated by exposure, i.e., they aim to maximize the number of times their contributions are viewed. From the learning perspective, we model content quality as the probability of obtaining positive feedback (e.g., a like or an upvote) from a random user. Naturally, the platform needs to resolve the classical trade-off between exploration (collecting feedback for all content) and exploitation (displaying the best content). We formulate this problem as a multi-armed bandit problem in which the number of arms (i.e., contributions) increases over time and depends on the strategic choices of arriving users. We first show that applying standard bandit algorithms incentivizes a flood of low-cost contributions, which in turn leads to linear regret. We then propose Rand_UCB, which adds a layer of randomization on top of the UCB algorithm to address the issue of flooding contributions. We show that Rand_UCB helps eliminate the incentives for low-quality contributions, provides incentives for high-quality contributions (because low-quality ones receive only a bounded number of explorations), and achieves sublinear regret with respect to displaying the current best arms.
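The flooding effect can be illustrated with a minimal sketch. It is not the Rand_UCB algorithm itself, whose details are not given here. It assumes a standard UCB1 index, one new contribution arriving per round, and a hypothetical admission step in which each arrival enters the candidate pool only with probability admit_prob; the names ucb_index, simulate, and admit_prob are illustrative assumptions.

```python
import math
import random


def ucb_index(likes, views, t):
    """UCB1 index: empirical like rate plus an exploration bonus.

    An arm that has never been displayed gets an infinite index, so a
    standard UCB rule always shows it at least once.
    """
    if views == 0:
        return float("inf")
    return likes / views + math.sqrt(2.0 * math.log(t) / views)


def simulate(qualities, admit_prob, rng):
    """Display one arm per round while a new contribution arrives each round.

    qualities[i] is the like probability of the contribution arriving at
    round i + 1. admit_prob is a hypothetical randomization layer (an
    assumption of this sketch): each arrival joins the candidate pool only
    with that probability. Returns the number of views each arm received.
    """
    likes = [0] * len(qualities)
    views = [0] * len(qualities)
    pool = []
    for t in range(1, len(qualities) + 1):
        arm = t - 1
        if rng.random() < admit_prob:
            pool.append(arm)
        if not pool:
            continue  # nothing admitted yet, so nothing is displayed
        chosen = max(pool, key=lambda a: ucb_index(likes[a], views[a], t))
        views[chosen] += 1
        if rng.random() < qualities[chosen]:
            likes[chosen] += 1
    return views


# One good contribution followed by 99 low-quality ones.
qualities = [0.9] + [0.1] * 99
flooded = simulate(qualities, admit_prob=1.0, rng=random.Random(0))
filtered = simulate(qualities, admit_prob=0.1, rng=random.Random(0))
```

With admit_prob=1.0 every arrival is guaranteed one view, so the good arm is shown only once in 100 rounds. This is the exposure incentive that invites low-cost contributions, and it prevents the platform from exploiting the best arm. Lowering admit_prob removes the guaranteed view, which is the intuition behind adding randomization on top of UCB.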