The Undergraduate Games Corpus: A Dataset for Machine Perception of Interactive Media


  • Barrett R. Anderson, UC Santa Cruz
  • Adam M. Smith, UC Santa Cruz




Machine perception research primarily focuses on processing static inputs (e.g. images and texts). We are interested in machine perception of interactive media (such as games, apps, and complex web applications) where interactive audience choices have long-term implications for the audience experience. While there is ample research on AI methods for the task of playing games (often just one game at a time), this work is difficult to apply to new and in-development games or to use for non-playing tasks such as similarity-based retrieval or authoring assistance. In response, we contribute a corpus of 755 games and structured metadata, spread across several platforms (Twine, Bitsy, Construct, and Godot), with full source and assets available and appropriately licensed for use and redistribution in research. Because these games were sourced from student projects in an undergraduate game development program, they reference timely themes in their content and represent a variety of levels of design polish rather than only representing past commercial successes. This corpus could accelerate research in understanding interactive media while anchoring that work in freshly-developed games intended as legitimate human experiences (rather than lab-created AI testbeds). We validate the utility of this corpus by setting up the novel task of predicting tags relevant to the player experience from the game source code, showing that representations that better exploit the structure of the media outperform a text-only baseline.
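To make the validation task concrete, the following is a minimal sketch of what a text-only baseline for tag prediction might look like: each game's source is reduced to a bag of tokens, and one binary classifier per tag decides whether that tag applies. This is not the authors' method. The naive Bayes classifier, the `TagBaseline` class, the `tokenize` helper, and the toy texts and tags are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(source: str) -> list[str]:
    """Lowercase raw source text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9_]+", source.lower())


class TagBaseline:
    """Hypothetical text-only baseline: one binary naive Bayes model per tag."""

    def fit(self, sources, tag_sets):
        docs = [Counter(tokenize(s)) for s in sources]
        self.vocab = set().union(*docs)
        self.tags = sorted(set().union(*tag_sets))
        self.models = {}
        for tag in self.tags:
            # Token counts and document counts for games with / without the tag.
            counts = {True: Counter(), False: Counter()}
            n_docs = {True: 0, False: 0}
            for doc, tags in zip(docs, tag_sets):
                label = tag in tags
                counts[label].update(doc)
                n_docs[label] += 1
            self.models[tag] = (counts, n_docs)
        return self

    def _log_score(self, doc, counts, n_docs, label):
        # Add-one smoothing on both the class prior and the token likelihoods.
        total = sum(counts[label].values()) + len(self.vocab)
        score = math.log((n_docs[label] + 1) / (n_docs[True] + n_docs[False] + 2))
        for token, n in doc.items():
            if token in self.vocab:  # tokens unseen in training are ignored
                score += n * math.log((counts[label][token] + 1) / total)
        return score

    def predict(self, source):
        doc = Counter(tokenize(source))
        predicted = set()
        for tag in self.tags:
            counts, n_docs = self.models[tag]
            with_tag = self._log_score(doc, counts, n_docs, True)
            without_tag = self._log_score(doc, counts, n_docs, False)
            if with_tag > without_tag:
                predicted.add(tag)
        return predicted


# Toy stand-ins for game source text and player-experience tags (invented).
train_sources = [
    "you enter the haunted house and hear a ghost scream",
    "the ghost follows you through the dark haunted hallway",
    "solve the puzzle by sliding blocks onto the switch",
    "push blocks to solve each puzzle room",
]
train_tags = [{"horror"}, {"horror"}, {"puzzle"}, {"puzzle"}]

model = TagBaseline().fit(train_sources, train_tags)
print(model.predict("a ghost waits in the haunted attic"))
```

A baseline of this kind sees only flattened text. The abstract's claim is that representations exploiting the structure of the media (for example, how passages or rooms link to each other) outperform this kind of input.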




How to Cite

Anderson, B. R., & Smith, A. M. (2021). The Undergraduate Games Corpus: A Dataset for Machine Perception of Interactive Media. Proceedings of the AAAI Conference on Artificial Intelligence, 35(1), 3-11.



AAAI Technical Track on Application Domains