DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes

Authors

  • Aisha Al-Mohannadi, Qatar Computing Research Institute, Hamad Bin Khalifa University; College of Science & Engineering, Hamad Bin Khalifa University
  • Ayisha Firoz, Department of Computer Science & Engineering, Qatar University
  • Yin Yang, College of Science & Engineering, Hamad Bin Khalifa University
  • Muhammad Imran, Qatar Computing Research Institute, Hamad Bin Khalifa University
  • Ferda Ofli, Qatar Computing Research Institute, Hamad Bin Khalifa University

DOI:

https://doi.org/10.1609/icwsm.v20i1.42776

Abstract

Social media imagery provides a low-latency source of situational information during natural and human-induced disasters, enabling rapid damage assessment and response. While Visual Question Answering (VQA) has shown strong performance in general-purpose domains, its suitability for the complex and safety-critical reasoning required in disaster response remains unclear. We introduce DisasterVQA, a benchmark dataset designed for perception and reasoning in crisis contexts. DisasterVQA consists of 1,395 real-world images and 4,405 expert-curated question–answer pairs spanning diverse events such as floods, wildfires, and earthquakes. Grounded in humanitarian frameworks including FEMA ESF and OCHA MIRA, the dataset includes binary, multiple-choice, and open-ended questions covering situational awareness and operational decision-making tasks. We benchmark seven state-of-the-art vision–language models and find substantial performance variability across question types, disaster categories, regions, and humanitarian tasks. Although models achieve high accuracy on binary questions, they struggle with fine-grained quantitative reasoning, object counting, and context-sensitive interpretation, particularly for underrepresented disaster scenarios. DisasterVQA provides a challenging and practical benchmark to guide the development of more robust and operationally meaningful vision–language models for disaster response.

Published

2026-05-25

How to Cite

Al-Mohannadi, A., Firoz, A., Yang, Y., Imran, M., & Ofli, F. (2026). DisasterVQA: A Visual Question Answering Benchmark Dataset for Disaster Scenes. Proceedings of the International AAAI Conference on Web and Social Media, 20(1), 2711–2722. https://doi.org/10.1609/icwsm.v20i1.42776