CyPortQA: Benchmarking Multimodal Large Language Models for Cyclone Preparedness in Port Operation

Chenchen Kuai; Chenhao Wu; Yang Zhou; Bruce Wang; Tianbao Yang; Zhengzhong Tu; Zihao Li; Yunlong Zhang

doi:10.1609/aaai.v40i45.41222

Authors

Chenchen Kuai Texas A&M University - College Station
Chenhao Wu University of California, Los Angeles
Yang Zhou Texas A&M University - College Station
Bruce Wang Texas A&M University - College Station
Tianbao Yang Texas A&M University - College Station
Zhengzhong Tu Texas A&M University - College Station
Zihao Li Texas A&M University - College Station
Yunlong Zhang Texas A&M University - College Station

Abstract

As tropical cyclones intensify and track forecasts become increasingly uncertain, U.S. ports face heightened supply-chain risk under extreme weather conditions. Port operators need to rapidly synthesize diverse multimodal forecast products, such as probabilistic wind maps, track cones, and official advisories, into clear, actionable guidance as cyclones approach. Multimodal large language models (MLLMs) offer a powerful means to integrate these heterogeneous data sources alongside broader contextual knowledge, yet their accuracy and reliability in the specific context of port cyclone preparedness have not been rigorously evaluated. To fill this gap, we introduce CyPortQA, the first multimodal benchmark tailored to port operations under cyclone threat. CyPortQA assembles 2,917 real-world disruption scenarios from 2015 through 2023, spanning 145 U.S. principal ports and 90 named storms. Each scenario fuses multi-source data (i.e., tropical cyclone products, port operational impact records, and port condition bulletins) and is expanded through an automated pipeline into 117,178 structured question–answer pairs. Using this benchmark, we conduct extensive experiments on diverse MLLMs, including both open-source and proprietary models. The results show that MLLMs have great potential in situation understanding but still face considerable challenges in reasoning tasks, including potential impact estimation and decision reasoning.
