On the Potential of Large Language Models in ECG-based AFib and Sinus Rhythm Detection and Justification

Maria Slim; Chaymaa Abbas; Jad Assi; Hussein El Jebbawi; Alaaeddine El Ghazawi; Mariette Awad; Fatme Charafeddine; Marwan Refaat; Fouad Zouein

doi:10.1609/aaaiss.v6i1.36071

Authors

Maria Slim Maroun Semaan Faculty of Engineering and Architecture, American University of Beirut
Chaymaa Abbas Maroun Semaan Faculty of Engineering and Architecture, American University of Beirut
Jad Assi Medical School, American University of Beirut
Hussein El Jebbawi Medical School, American University of Beirut
Alaaeddine El Ghazawi Medical School, American University of Beirut
Mariette Awad Maroun Semaan Faculty of Engineering and Architecture, American University of Beirut
Fatme Charafeddine Medical School, American University of Beirut
Marwan Refaat Medical School, American University of Beirut
Fouad Zouein Medical School, American University of Beirut


Abstract

Atrial fibrillation (AFib) is a common arrhythmia associated with increased risk of stroke and mortality, and early, accurate detection is essential for timely patient care. This study explores the application of vision-enabled large language models (LLMs), specifically Llama-3.2-11B-Vision-Instruct and Qwen2-VL-7B-Instruct, to AFib and sinus rhythm detection from ECG images. We designed structured prompts that simulate clinical reasoning, ask the model to evaluate rhythm features, and elicit its confidence. Models were tested on a curated PTB-XL subset under both full 12-lead and dual-lead (Lead II + V1) configurations. Results show that although Llama achieves higher diagnostic accuracy than Qwen, especially with Chain-of-Thought prompting (up to 97% for AFib), both models struggle with consistent feature-level interpretation, particularly for sinus rhythm. Our findings underscore both the promise and the current limitations of LLMs in ECG-based diagnosis. Bridging the gap between AI-generated outputs and clinical standards will require fine-tuning on ECG-specific data, robust prompting strategies, and hybrid approaches that integrate signal-level reasoning to improve interpretability and reliability in real-world settings.
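The structured prompting, confidence elicitation, and per-rhythm scoring described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' prompt or evaluation code. The feature questions, the `Diagnosis:`/`Confidence:` answer format, and the helper names (`build_prompt`, `parse_response`, `per_class_accuracy`) are all hypothetical. The call that passes the ECG image and prompt to Llama or Qwen is not shown. "Accuracy for AFib" is assumed here to mean the fraction of AFib images classified correctly, with unparseable answers counted as errors.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

# Hypothetical rhythm-feature questions, asked in order to mimic clinical reasoning.
FEATURES = (
    "Are the R-R intervals regular or irregularly irregular?",
    "Is a distinct P wave present before every QRS complex?",
    "Are fibrillatory (f) waves visible instead of P waves?",
)

# Canonical spellings for the two rhythm labels.
LABELS = {"afib": "AFib", "sinus": "Sinus"}


def build_prompt(leads: str) -> str:
    """Return a chain-of-thought style text prompt for one ECG image.

    `leads` describes the lead configuration shown in the image,
    e.g. "all 12 leads" or "Lead II and V1" (illustrative wording only).
    """
    steps = "\n".join(f"{i}. {q}" for i, q in enumerate(FEATURES, start=1))
    return (
        f"You are a cardiologist reviewing an ECG image showing {leads}.\n"
        f"Reason step by step through these features:\n{steps}\n"
        "Finish with exactly two lines:\n"
        "Diagnosis: <AFib or Sinus>\n"
        "Confidence: <integer from 0 to 100>"
    )


@dataclass
class Answer:
    diagnosis: Optional[str]   # "AFib", "Sinus", or None if not found
    confidence: Optional[int]  # 0..100, or None if missing or out of range


def parse_response(text: str) -> Answer:
    """Extract the last stated diagnosis and confidence from free-text output."""
    diagnoses = re.findall(r"Diagnosis:\s*(AFib|Sinus)\b", text, flags=re.IGNORECASE)
    confidences = re.findall(r"Confidence:\s*(\d{1,3})\b", text)
    diagnosis = LABELS[diagnoses[-1].lower()] if diagnoses else None
    confidence = int(confidences[-1]) if confidences else None
    if confidence is not None and confidence > 100:
        confidence = None  # reject values outside the requested 0-100 scale
    return Answer(diagnosis, confidence)


def per_class_accuracy(
    labels: Sequence[str], predictions: Sequence[Optional[str]]
) -> Dict[str, float]:
    """Fraction of images of each true rhythm that were classified correctly.

    Unparseable answers (None) never equal a label, so they count as errors.
    """
    totals: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for truth, pred in zip(labels, predictions):
        totals[truth] = totals.get(truth, 0) + 1
        correct[truth] = correct.get(truth, 0) + int(pred == truth)
    return {label: correct[label] / totals[label] for label in totals}


if __name__ == "__main__":
    # Same template for both configurations; only the lead description changes.
    print(build_prompt("Lead II and V1"))
    reply = "1. Irregularly irregular.\n2. No P waves.\n3. Yes.\nDiagnosis: AFib\nConfidence: 85"
    print(parse_response(reply))
```

Taking the last `Diagnosis:` and `Confidence:` matches, rather than the first, is a guard against a model stating a tentative answer mid-reasoning and revising it; whether the study scored free-text outputs this way is an assumption.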
