Advanced Artificial Agents Intervene in the Provision of Reward
To analyze the expected behavior of advanced artificial agents, we consider a formal idealized agent that makes observations that inform it about its goal, and we find that it can never disambiguate the message from the referent. When we provide, for example, a large reward to indicate that something about the world is satisfactory to us, and leave the agent to determine what that is, it may conclude that what satisfied us was the sending of the reward itself; no observation can refute that. This conclusion incentivizes the agent to intervene in the provision of its own reward (sometimes called wireheading), decoupling the reward from its intended referent. We discuss recent approaches to avoiding this problem—myopia, imitation learning, quantilization, risk aversion, and inverse reinforcement learning—and our biggest concerns with them.
Copyright (c) 2022 Michael K. Cohen, Marcus Hutter, Michael A. Osborne
This work is licensed under a Creative Commons Attribution 4.0 International License.