Liu, Le, et al. “Do Large Language Models Reason About Uncertainty Like Humans? A Benchmark on Hurricane Forecast Visualization Comprehension.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 21, Mar. 2026, pp. 17571-79, doi:10.1609/aaai.v40i21.38812.