Estimating the True Distribution of Data Collected with Randomized Response

Authors

  • Carlos Antonio Pinzón INRIA
  • Ehab ElSalamouny INRIA
  • Lucas Massot École Polytechnique
  • Alexis Miller École Normale Supérieure de Lyon
  • Héber Hwang Arcolezi INRIA
  • Catuscia Palamidessi INRIA

DOI:

https://doi.org/10.1609/aaai.v40i42.40888

Abstract

Randomized Response (RR) is a protocol designed to collect and analyze categorical data with local differential privacy guarantees. It has been used as a building block of mechanisms deployed by big tech companies to collect data from app and web users. Each user reports an automatically randomized alteration of their true value to the analytics server, which then estimates the histogram of the true, unseen values of all users using a debiasing rule that compensates for the added randomness. A known issue is that the standard debiasing rule can yield a vector with negative entries (which cannot be interpreted as a histogram), and there is no consensus on the best fix. An elegant but slow solution is the Iterative Bayesian Update algorithm (IBU), which converges to the Maximum Likelihood Estimate (MLE) as the number of iterations goes to infinity. This paper bypasses IBU by providing a simple formula for the exact MLE of RR, and compares it experimentally with other estimation methods to help practitioners decide which one to use.
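
The protocol and the negative-value issue described above can be illustrated with a minimal sketch. It assumes the common k-ary variant of RR, in which a user keeps their true value with probability p = e^ε / (e^ε + k − 1) and otherwise reports one of the other k − 1 values uniformly at random; the abstract does not state which variant or parameters the paper uses. The function names (rr_probabilities, randomize, debias) and the sample data are hypothetical, and the sketch covers only the standard debiasing rule, not the paper's MLE formula or IBU.

```python
import math
import random
from collections import Counter


def rr_probabilities(k, epsilon):
    """Return (p, q) for k-ary RR (assumed variant): p is the probability of
    reporting the true value, q the probability of reporting any one other value."""
    e = math.exp(epsilon)
    return e / (e + k - 1), 1.0 / (e + k - 1)


def randomize(value, k, epsilon, rng):
    """Client side: keep the true value with probability p, otherwise
    report one of the other k - 1 values chosen uniformly."""
    p, _ = rr_probabilities(k, epsilon)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)
    # Skip over the true value so that only *other* values are reported.
    return other if other < value else other + 1


def debias(reports, k, epsilon):
    """Server side: standard debiasing rule, (observed frequency - q) / (p - q).
    The result sums to 1 but individual entries may be negative."""
    p, q = rr_probabilities(k, epsilon)
    n = len(reports)
    counts = Counter(reports)
    return [(counts.get(v, 0) / n - q) / (p - q) for v in range(k)]


# Hypothetical data: 5 categories, but only categories 0 and 1 actually occur.
rng = random.Random(0)
k, epsilon = 5, 0.5
true_values = [0] * 600 + [1] * 400
reports = [randomize(v, k, epsilon, rng) for v in true_values]
estimate = debias(reports, k, epsilon)
print(estimate)
```

Because p + (k − 1)q = 1, the debiased vector always sums to 1, yet nothing keeps its entries non-negative. A small deterministic case shows this: with k = 2, ε = 1 and four reports that all equal 0, debias([0, 0, 0, 0], 2, 1.0) assigns a negative value to category 1, which is the kind of output that cannot be read as a histogram.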

Published

2026-03-14

How to Cite

Pinzón, C. A., ElSalamouny, E., Massot, L., Miller, A., Hwang Arcolezi, H., & Palamidessi, C. (2026). Estimating the True Distribution of Data Collected with Randomized Response. Proceedings of the AAAI Conference on Artificial Intelligence, 40(42), 35751–35758. https://doi.org/10.1609/aaai.v40i42.40888

Section

AAAI Technical Track on Philosophy and Ethics of AI