Ranking and Rating Rankings and Ratings
DOI:
https://doi.org/10.1609/aaai.v34i09.7126
Abstract
Cardinal scores collected from people are well known to suffer from miscalibrations. A popular approach to address this issue is to assume simplistic models of miscalibration (such as linear biases) to de-bias the scores. This approach, however, often fares poorly because people's miscalibrations are typically far more complex and not well understood. It is widely believed that, in the absence of simplifying assumptions on the miscalibration, the only useful information in practice from the cardinal scores is the induced ranking. In this paper, we address the fundamental question of whether this widespread folklore belief is actually true. We consider cardinal scores with arbitrary (or even adversarially chosen) miscalibrations that are only required to be consistent with the induced ranking. We design rating-based estimators and prove that, despite making no assumptions on the ratings, they strictly and uniformly outperform all possible estimators that rely only on the ranking. These estimators can be used as a plug-in to show the superiority of cardinal scores over ordinal rankings for a variety of applications, including A/B testing and ranking. This work thus provides novel fundamental insights into the eternal debate between cardinal and ordinal data: it ranks the approach of using ratings higher than that of using rankings, and rates both approaches in terms of their estimation errors.