Comparing Human Behavior to an Optimal Policy for Innovation


  • Bonan Zhao, Princeton University
  • Natalia Vélez, Princeton University
  • Thomas L. Griffiths, Princeton University



Innovation, Discovery, Explore-exploit, Decision Making, Optimal Stopping


Human learning does not stop at solving a single problem. Instead, we seek new challenges, define new goals, and come up with new ideas. Unlike the classic explore-exploit trade-off between known and unknown options, making new tools or generating new ideas is not about collecting data from existing unknown options, but rather about creating new options out of what is currently available. We introduce a discovery game designed to study how rational agents make decisions about pursuing innovations, where discovering new ideas is a process of combining existing ideas in an open-ended compositional space. We derive optimal policies for this decision problem, formalized as a Markov decision process, and compare people's behavior to the model predictions in an online behavioral experiment. We found evidence that people both innovate rationally, guided by potential returns in this discovery game, and systematically under- and over-explore in different settings.






Symposium on Human-Like Learning