CyPortQA: Benchmarking Multimodal Large Language Models for Cyclone Preparedness in Port Operation

Authors

  • Chenchen Kuai Texas A&M University - College Station
  • Chenhao Wu University of California, Los Angeles
  • Yang Zhou Texas A&M University - College Station
  • Bruce Wang Texas A&M University - College Station
  • Tianbao Yang Texas A&M University - College Station
  • Zhengzhong Tu Texas A&M University - College Station
  • Zihao Li Texas A&M University - College Station
  • Yunlong Zhang Texas A&M University - College Station

DOI:

https://doi.org/10.1609/aaai.v40i45.41222

Abstract

As tropical cyclones intensify and track forecasts become increasingly uncertain, U.S. ports face heightened supply-chain risk under extreme weather conditions. Port operators need to rapidly synthesize diverse multimodal forecast products, such as probabilistic wind maps, track cones, and official advisories, into clear, actionable guidance as cyclones approach. Multimodal large language models (MLLMs) offer a powerful means to integrate these heterogeneous data sources alongside broader contextual knowledge, yet their accuracy and reliability in the specific context of port cyclone preparedness have not been rigorously evaluated. To fill this gap, we introduce CyPortQA, the first multimodal benchmark tailored to port operations under cyclone threat. CyPortQA assembles 2,917 real-world disruption scenarios from 2015 through 2023, spanning 145 U.S. principal ports and 90 named storms. Each scenario fuses multi-source data (i.e., tropical cyclone products, port operational impact records, and port condition bulletins) and is expanded through an automated pipeline into 117,178 structured question–answer pairs. Using this benchmark, we conduct extensive experiments on diverse MLLMs, including both open-source and proprietary models. MLLMs demonstrate great potential in situation understanding but still face considerable challenges in reasoning tasks, including potential impact estimation and decision reasoning.

Published

2026-03-14

How to Cite

Kuai, C., Wu, C., Zhou, Y., Wang, B., Yang, T., Tu, Z., … Zhang, Y. (2026). CyPortQA: Benchmarking Multimodal Large Language Models for Cyclone Preparedness in Port Operation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(45), 38781–38789. https://doi.org/10.1609/aaai.v40i45.41222

Section

AAAI Special Track on AI for Social Impact I