Are Language Models Any Good at Density Modeling?

Authors

  • Sriram Ranga, Nanyang Technological University
  • Sai Shashank Bedampeta, Vellore Institute of Technology
  • Rui Mao, Nanyang Technological University
  • Anupam Chattopadhyay, Nanyang Technological University

DOI:

https://doi.org/10.1609/aaai.v40i39.40558

Abstract

Large Language Models (LLMs) surprised the world with their ability to mimic humans in writing and are starting to be used as simulations of human writers for various kinds of linguistic analyses. However, these analyses rest on the belief that LLMs are good density models that accurately capture the underlying probability distribution of the language. In this paper, we question this basic assumption and evaluate language models on their density modeling capabilities. Since no ground-truth probability distribution exists for any natural language, we construct a synthetic language consisting of decimal numbers written out as English words. We train language models from scratch on various probability distributions over this synthetic language and compare the distributions learned by the models with the original distributions. Experiments show that language models can learn the underlying probability distributions across a wide range of cases, but they fail when those distributions depend on deep semantic properties of numbers that cannot be inferred from syntactic patterns. Additionally, we observe a strong bias in the models towards numbers that frequently occur as substrings within other numbers. This suggests that a similar bias may exist in real-world natural language models as well and may negatively impact downstream tasks and analyses that rely on model-generated probabilities.
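The experimental setup the abstract describes can be sketched in a few lines: sample numbers from a known target distribution, render each as an English word sequence, and use the resulting corpus for training, so the learned distribution can later be compared against the target. This is a minimal illustration only; the digit-by-digit word encoding and the Zipf-like target distribution here are assumptions, not the paper's exact choices.

```python
import random
from collections import Counter

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def number_to_words(n: int) -> str:
    """Render an integer digit-by-digit, e.g. 42 -> 'four two'.
    (Assumed encoding; the paper's exact scheme may differ.)"""
    return " ".join(DIGIT_WORDS[int(d)] for d in str(n))

def sample_corpus(probs, num_samples, seed=0):
    """Draw numbers i.i.d. from a distribution over 0..len(probs)-1
    and render each as an English word sequence."""
    rng = random.Random(seed)
    numbers = rng.choices(range(len(probs)), weights=probs, k=num_samples)
    return [number_to_words(n) for n in numbers]

# Example target: a Zipf-like distribution over 0..99 (an assumption).
support = range(100)
weights = [1.0 / (n + 1) for n in support]
total = sum(weights)
probs = [w / total for w in weights]

corpus = sample_corpus(probs, num_samples=10_000)

# The empirical frequencies of the rendered strings approximate the
# target distribution; a model trained from scratch on `corpus` can
# then be compared back against `probs`.
empirical = Counter(corpus)
```

A comparison between the model's learned distribution and `probs` (e.g. via KL divergence over the support) would then quantify how well the model serves as a density model.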

Published

2026-03-14

How to Cite

Ranga, S., Bedampeta, S. S., Mao, R., & Chattopadhyay, A. (2026). Are Language Models Any Good at Density Modeling?. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 32791–32798. https://doi.org/10.1609/aaai.v40i39.40558

Section

AAAI Technical Track on Natural Language Processing IV