Advanced Artificial Agents Intervene in the Provision of Reward


  • Michael K. Cohen, University of Oxford
  • Marcus Hutter, Australian National University
  • Michael A. Osborne, University of Oxford



To analyze the expected behavior of advanced artificial agents, we consider a formal idealized agent that makes observations that inform it about its goal, and we find that it can never disambiguate the message from the referent. When we provide, for example, a large reward to indicate that something about the world is satisfactory to us, and leave the agent to determine what that is, it may conclude that what satisfied us was the sending of the reward itself; no observation can refute that. This conclusion incentivizes the agent to intervene in the provision of its own reward (sometimes called wireheading), decoupling the reward from its intended referent. We discuss recent approaches to avoiding this problem—myopia, imitation learning, quantilization, risk aversion, and inverse reinforcement learning—and our biggest concerns with them.
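The indistinguishability claim in the abstract can be illustrated with a toy Bayesian sketch (this is an informal illustration, not the paper's formal model; all names and the two-hypothesis setup are hypothetical). One hypothesis says reward refers to the intended world state; the other says reward is simply whatever the reward channel emits. Because the agent only ever observes the channel's output, both hypotheses assign identical likelihood to every observation, so no amount of data moves the posterior:

```python
import random

def channel_output(world_state):
    # The reward signal the agent actually observes.
    return 1.0 if world_state == "satisfactory" else 0.0

# Hypothesis A: reward refers to the world state (the intended referent).
def predict_A(world_state):
    return 1.0 if world_state == "satisfactory" else 0.0

# Hypothesis B: reward is just whatever number the channel emits.
def predict_B(world_state):
    return channel_output(world_state)

random.seed(0)
posterior = {"A": 0.5, "B": 0.5}
for _ in range(1000):
    state = random.choice(["satisfactory", "unsatisfactory"])
    obs = channel_output(state)
    # Both hypotheses predict the same observation every time,
    # so Bayes' rule never shifts belief between them.
    likelihood = {"A": 1.0 if predict_A(state) == obs else 0.0,
                  "B": 1.0 if predict_B(state) == obs else 0.0}
    z = sum(posterior[h] * likelihood[h] for h in posterior)
    posterior = {h: posterior[h] * likelihood[h] / z for h in posterior}

print(posterior)  # posterior remains at {'A': 0.5, 'B': 0.5}
```

Under this sketch, an agent that acts to maximize expected reward under hypothesis B is rewarded for controlling the channel itself, which is the wireheading incentive the abstract describes.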




How to Cite

Cohen, M., Hutter, M., & Osborne, M. (2022). Advanced Artificial Agents Intervene in the Provision of Reward. AI Magazine, 43(3), 282-293.