An Empirical Evaluation of Evaluation Metrics of Procedurally Generated Mario Levels

Authors

  • Julian Mariño, Universidade Federal de Viçosa
  • Willian Reis, Universidade Federal de Viçosa
  • Levi Lelis, Universidade Federal de Viçosa

DOI:

https://doi.org/10.1609/aiide.v11i1.12785

Keywords:

Procedural Content Generation, Computational Metrics, Empirical Study, User Study, Infinite Mario Bros, Platform Games, Level Generation, Content Evaluation

Abstract

There are several approaches in the literature for automatically generating Infinite Mario Bros levels. The evaluation of such approaches is often performed solely with computational metrics such as leniency and linearity. While these metrics are important for an initial exploratory evaluation of the generated content, it is not clear whether they capture the player's perception of that content. In this paper we evaluate several commonly used computational metrics. Specifically, we perform a systematic user study with procedural content generation systems and compare the insights gained from the user study with those gained from analyzing the computational metric values. The results of our experiment suggest that current computational metrics should not be used in lieu of user studies for evaluating content generated by computer programs.

Published

2021-06-24

How to Cite

Mariño, J., Reis, W., & Lelis, L. (2021). An Empirical Evaluation of Evaluation Metrics of Procedurally Generated Mario Levels. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 11(1), 44-50. https://doi.org/10.1609/aiide.v11i1.12785