Liu, L., Wang, Y., Shen, B., Zeng, W., Zhang, S., Xu, D., & Wang, P. (2026). Do Large Language Models Reason About Uncertainty Like Humans? A Benchmark on Hurricane Forecast Visualization Comprehension. Proceedings of the AAAI Conference on Artificial Intelligence, 40(21), 17571–17579. https://doi.org/10.1609/aaai.v40i21.38812