Are Language Models Any Good at Density Modeling?
DOI:
https://doi.org/10.1609/aaai.v40i39.40558
Abstract
Large Language Models (LLMs) surprised the world with their ability to mimic humans in writing and are starting to be used as simulations of human writers for various kinds of linguistic analyses. However, these analyses rest on the belief that LLMs are good density models that accurately capture the underlying probability distribution of the language. In this paper, we question this basic assumption and evaluate language models on their density modeling capabilities. Since no ground truth exists for the probability distribution of any natural language, we construct a synthetic language made up of decimal numbers written out in English words. We train language models from scratch on various probability distributions over this synthetic language and compare the distributions learned by the models with the original distributions. Experiments show that language models can learn underlying probability distributions across a wide range of cases, but they fail when those distributions depend on deep semantic properties of numbers that cannot be inferred from syntactic patterns. Additionally, we observed a strong bias in the models towards numbers that frequently occur as substrings within other numbers. This suggests that such a bias may exist in real-world natural language models as well, where it could negatively impact downstream tasks and analyses that rely on model-generated probabilities.
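The evaluation setup described in the abstract can be illustrated with a small sketch. The helper names (`spell`, `kl_divergence`), the digit-by-digit spelling scheme, and the Zipf-like ground-truth distribution below are all illustrative assumptions, not the paper's actual construction; the model distribution here is a uniform stand-in for probabilities that would in practice come from a trained language model.

```python
import math

# Hypothetical sketch: spell a number digit by digit ("42" -> "four two"),
# define a known ground-truth distribution over spelled-out numbers, and
# measure how far a model's learned distribution is from it via KL divergence.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def spell(n):
    """Render a non-negative integer as space-separated English digit words."""
    return " ".join(DIGITS[int(d)] for d in str(n))

def kl_divergence(p, q):
    """KL(p || q) over a shared support; both dicts should sum to 1."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)

# Ground truth: a small Zipf-like distribution over the numbers 1..5.
support = [1, 2, 3, 4, 5]
weights = [1 / n for n in support]
z = sum(weights)
p_true = {spell(n): w / z for n, w in zip(support, weights)}

# Stand-in for a trained model's distribution (in the paper, these would be
# the probabilities the language model assigns to each spelled-out number).
q_model = {s: 1 / len(support) for s in p_true}  # uniform baseline

print(spell(42))                                  # "four two"
print(kl_divergence(p_true, q_model))             # > 0: model misses the skew
```

A divergence of zero would mean the model perfectly recovered the ground-truth density; the comparison across many such constructed distributions is what lets the authors quantify density modeling quality without needing a ground truth for natural language itself.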
Published
2026-03-14
How to Cite
Ranga, S., Bedampeta, S. S., Mao, R., & Chattopadhyay, A. (2026). Are Language Models Any Good at Density Modeling?. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 32791–32798. https://doi.org/10.1609/aaai.v40i39.40558
Section
AAAI Technical Track on Natural Language Processing IV