On the Potential of Large Language Models in ECG-based AFib and Sinus Rhythm Detection and Justification

Authors

  • Maria Slim, Maroun Semaan Faculty of Engineering and Architecture, American University of Beirut
  • Chaymaa Abbas, Maroun Semaan Faculty of Engineering and Architecture, American University of Beirut
  • Jad Assi, Medical School, American University of Beirut
  • Hussein El Jebbawi, Medical School, American University of Beirut
  • Alaaeddine El Ghazawi, Medical School, American University of Beirut
  • Mariette Awad, Maroun Semaan Faculty of Engineering and Architecture, American University of Beirut
  • Fatme Charafeddine, Medical School, American University of Beirut
  • Marwan Refaat, Medical School, American University of Beirut
  • Fouad Zouein, Medical School, American University of Beirut

DOI:

https://doi.org/10.1609/aaaiss.v6i1.36071

Abstract

Atrial fibrillation (AFib) is a common arrhythmia associated with increased risk of stroke and mortality, and early, accurate detection is essential for improving patient care. This study explores the application of vision-enabled large language models (LLMs), specifically Llama-3.2-11B-Vision-Instruct and Qwen2-VL-7B-Instruct, for AFib and sinus rhythm detection using ECG images. We designed structured prompts to simulate clinical reasoning, evaluate rhythm features, and elicit model confidence. Models were tested on a curated PTB-XL subset under both full 12-lead and dual-lead (Lead II + V1) configurations. Results show that while Llama achieves higher diagnostic accuracy, especially with Chain-of-Thought prompting (up to 97% for AFib), both models struggle with consistent feature-level interpretation, particularly for sinus rhythm. Our findings underscore both the promise and current limitations of LLMs in ECG-based diagnosis. Bridging the gap between AI-generated outputs and clinical standards will require fine-tuning on ECG-specific data, robust prompting strategies, and hybrid approaches that integrate signal-level reasoning for improved interpretability and reliability in real-world settings.

Published

2025-08-01

How to Cite

Slim, M., Abbas, C., Assi, J., El Jebbawi, H., El Ghazawi, A., Awad, M., … Zouein, F. (2025). On the Potential of Large Language Models in ECG-based AFib and Sinus Rhythm Detection and Justification. Proceedings of the AAAI Symposium Series, 6(1), 341–349. https://doi.org/10.1609/aaaiss.v6i1.36071

Section

Human-AI Collaboration: Exploring Diversity of Human Cognitive Abilities and Varied AI Models for Hybrid Intelligent Systems