LLM Game Rule Understanding Through Out-of-Distribution Fine-Tuning

Authors

  • Bahar Bateni (University of California, Santa Cruz)
  • Benjamin Pratt (University of California, Santa Cruz)
  • Jim Whitehead (University of California, Santa Cruz)

DOI: https://doi.org/10.1609/aiide.v21i1.36804

Abstract

Large Language Models (LLMs) have shown that a model pre-trained on general knowledge can perform well on specific tasks. However, LLMs natively perform poorly when it comes to demonstrating an understanding of rules, such as applying them, interacting with them, generating or modifying them, or evaluating them. Fine-tuning LLMs on a specific set of rules can significantly improve this performance. Yet, doing so undermines one of the main advantages of using a pre-trained model, which is its ability to generalize to rulesets outside its training distribution. This ability is critical for using LLMs as a tool in the game development process to give feedback or suggest rule modifications. In this paper, we introduce a framework for generating datasets to benchmark and train LLMs on their understanding of rules. We use Solitaire card games as our testbed for generating these datasets, as they have simple rules but offer a large space of possible variants, each of which plays very differently. We define a set of these variants using our custom Game Description Language (GDL) and use the framework to generate game progression questions, along with a textual explanation for each answer. Using these datasets, we conduct experiments to evaluate multiple LLMs on their understanding of rules, both with and without fine-tuning. Furthermore, we perform out-of-distribution evaluations in which the model is tested on rulesets it has not been trained on. Our results show that fine-tuning can improve the model's performance on both in-distribution and out-of-distribution rulesets, suggesting that training on rule-based datasets can improve the general rule understanding of LLMs.
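The abstract does not specify the authors' GDL syntax or question format. As a rough illustration of the pipeline it describes (a ruleset, a generated question with an answer and a textual explanation, and a split that holds entire rulesets out for testing), here is a minimal sketch. It assumes a hypothetical `Ruleset` record that covers only tableau-building constraints; `can_place`, `make_question`, `split_by_ruleset`, and the variant names are invented for illustration and are not taken from the paper.

```python
import random
from dataclasses import dataclass

# Suit -> color; ranks run 1 (Ace) to 13 (King).
SUITS = {"H": "red", "D": "red", "C": "black", "S": "black"}
RANKS = range(1, 14)


@dataclass(frozen=True)
class Ruleset:
    """Hypothetical stand-in for one GDL-defined Solitaire variant (tableau rules only)."""
    name: str
    direction: int     # -1 = build down, +1 = build up
    color_rule: str    # "alternate", "same_suit", or "any"
    empty_column: str  # "king_only" or "any"


def can_place(rules, card, target):
    """Return (legal, explanation) for moving `card` onto `target`; None means an empty column."""
    rank, suit = card
    if target is None:
        if rules.empty_column == "any":
            return True, "Any card may be placed on an empty column."
        if rank == 13:
            return True, "A King may be placed on an empty column."
        return False, "Only a King may be placed on an empty column."
    t_rank, t_suit = target
    if rank != t_rank + rules.direction:
        need = "one rank lower" if rules.direction < 0 else "one rank higher"
        return False, f"The moved card must be {need} than the card it covers."
    if rules.color_rule == "alternate" and SUITS[suit] == SUITS[t_suit]:
        return False, "Cards must alternate in color."
    if rules.color_rule == "same_suit" and suit != t_suit:
        return False, "Cards must share the same suit."
    return True, "The rank and suit constraints are both satisfied."


def make_question(rules, rng):
    """Sample one yes/no move-legality question paired with a textual explanation."""
    deck = [(r, s) for r in RANKS for s in SUITS]
    card, target = rng.sample(deck, 2)  # two distinct cards
    if rng.random() < 0.2:
        target = None  # occasionally ask about an empty column instead
    legal, why = can_place(rules, card, target)
    dest = "an empty column" if target is None else f"{target[0]}{target[1]}"
    return {
        "ruleset": rules.name,
        "question": f"Under the {rules.name} rules, can {card[0]}{card[1]} be moved onto {dest}?",
        "answer": "yes" if legal else "no",
        "explanation": why,
    }


def split_by_ruleset(rulesets, held_out, n_per_ruleset, seed=0):
    """Out-of-distribution split: held-out rulesets never appear in the training set."""
    rng = random.Random(seed)
    train, test = [], []
    for rules in rulesets:
        questions = [make_question(rules, rng) for _ in range(n_per_ruleset)]
        (test if rules.name in held_out else train).extend(questions)
    return train, test


# Hypothetical variants: a Klondike-style tableau plus two invented alternatives.
VARIANTS = [
    Ruleset("klondike_style", -1, "alternate", "king_only"),
    Ruleset("variant_b", -1, "any", "any"),
    Ruleset("variant_c", +1, "same_suit", "any"),
]
train, test = split_by_ruleset(VARIANTS, held_out={"variant_c"}, n_per_ruleset=100)
```

The split is made by ruleset rather than by question, so a model trained on `train` is evaluated only on a variant whose rules it has never seen. This mirrors the out-of-distribution setting the abstract describes, though the paper's actual splits and question types may differ.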

Published

2025-11-07

How to Cite

Bateni, B., Pratt, B., & Whitehead, J. (2025). LLM Game Rule Understanding Through Out-of-Distribution Fine-Tuning. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 21(1), 2–11. https://doi.org/10.1609/aiide.v21i1.36804