Automated Unified Reasoning with Vision-Language Models for Multi-modal Burn Assessment
DOI: https://doi.org/10.1609/aaai.v40i47.41482
Abstract
In emerging clinical applications such as ultrasound-based burn assessment, the lack of domain-specific data presents a significant challenge for developing robust AI systems. Vision-language models (VLMs) have shown strong performance in general computer vision tasks, yet their application to medical imaging remains limited, particularly due to insufficient reasoning capabilities and the scarcity of high-quality training data. We introduce AURA (Automated Unified Reasoning for Burn Assessment), a multi-modal approach that integrates pre-trained VLMs with symbolic first-order logic (FOL) reasoning to improve diagnostic accuracy and interpretability in this data-limited setting. For this study, we collected real-patient data over a one-year period at a U.S. burn center, performing all experiments in a real clinical setting to ensure practical relevance. The dataset includes both conventional B-Mode ultrasound and Tissue Doppler Imaging (TDI), with TDI introduced here for the first time in burn assessment, underscoring the emerging nature of this work. Beyond burn severity classification, we assess the system’s ability to produce expert-level surgical insight directly from imaging data. On the retrospective dataset, it achieves up to 93% accuracy in surgical classification and 87% in fine-grained burn depth prediction, comparable to expert-informed predictions and substantially exceeding the 70% accuracy of traditional visual inspection by human experts. These results, obtained from a novel multi-modal dataset collected in a real clinical burn center setting, highlight the potential of this approach to improve decision-making in burn care. To further support future deployment, we demonstrate a prototype integration with an Electronic Medical Record (EMR) system that aligns with clinical workflows and supports scalable, real-world implementation.
Published
2026-03-14
How to Cite
Rahman, M. M., Masry, M. E., Gordillo, G., & Wachs, J. (2026). Automated Unified Reasoning with Vision-Language Models for Multi-modal Burn Assessment. Proceedings of the AAAI Conference on Artificial Intelligence, 40(47), 40402–40408. https://doi.org/10.1609/aaai.v40i47.41482
Section
IAAI Technical Track on Emerging Applications of AI