PandemIQ Llama: A Domain-Adapted Foundation Model for Enhanced Pandemic Intelligence

Jingmei Yang; Mahtab Talaei; Britta Lassmann; Nahid Bhadelia; Ioannis Ch. Paschalidis

doi:10.1609/aaai.v40i46.41301

Authors

Jingmei Yang, Boston University
Mahtab Talaei, Boston University
Britta Lassmann, Boston University
Nahid Bhadelia, Boston University
Ioannis Ch. Paschalidis, Boston University


Abstract

We introduce PandemIQ Llama, a domain-adapted large language model (LLM) designed specifically for pandemic intelligence applications. Building on the pre-trained Llama-3.1-8B model, we conducted continuous training on our curated Pandemic Corpus. This dataset was assembled from authoritative public health sources, scientific literature, and specialized knowledge repositories. It comprises 508,924 documents totaling 5.8 billion tokens, making it the largest pandemic-domain-specific data cohort for LLM training. Benefiting from this large curated corpus and from continuous training with extensive computational resources, PandemIQ Llama can extract critical pandemic domain knowledge that is typically underrepresented in general-purpose language models. To evaluate its performance, we conducted a comprehensive comparison of PandemIQ Llama with both prompt-engineered and task-specific fine-tuned baseline models on two tasks: the Biomedical Alert News Question Answering task (1,508 reports with 30 expert-generated questions each) and the Disease Event Type Classification benchmark (4,500 news snippets across eight disease categories). PandemIQ Llama demonstrated substantial improvements over strong baseline models, achieving performance gains ranging from 3.8% to 10.97%. These results suggest that PandemIQ Llama could significantly enhance public health surveillance and analysis capabilities. Our results also suggest that LLMs can perform better on domain-specific tasks with continuous training than with fine-tuning.

Social Impact: The BEACON platform, powered by our model, has launched and now serves over 100 government and multilateral public health organizations and users across 154 countries. Analytics from the platform are being integrated into the Epidemic Intelligence from Open Sources system run by the World Health Organization. This integration will provide public health decision-makers with a powerful LLM-based tool for pandemic surveillance.
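
To put the corpus and benchmark sizes quoted in the abstract in perspective, the following minimal sketch derives two summary figures from them. It is illustrative arithmetic only, not part of the authors' pipeline. The function names are hypothetical, the token count is the rounded "5.8 billion" from the abstract, and the pair count assumes every report is paired with all 30 questions.

```python
# Figures quoted in the abstract (the token count is rounded in the source).
NUM_DOCUMENTS = 508_924
NUM_TOKENS = 5.8e9
NUM_REPORTS = 1_508
QUESTIONS_PER_REPORT = 30


def mean_tokens_per_document(tokens: float, documents: int) -> float:
    """Average document length in tokens (hypothetical helper)."""
    return tokens / documents


def total_question_report_pairs(reports: int, questions: int) -> int:
    """Number of (report, question) pairs, assuming every report
    is paired with every question (hypothetical helper)."""
    return reports * questions


if __name__ == "__main__":
    avg = mean_tokens_per_document(NUM_TOKENS, NUM_DOCUMENTS)
    pairs = total_question_report_pairs(NUM_REPORTS, QUESTIONS_PER_REPORT)
    print(f"Mean tokens per document: {avg:,.0f}")
    print(f"Question-report pairs: {pairs:,}")
```

Under these assumptions the average document runs to roughly eleven thousand tokens, and the question answering task covers 45,240 question-report pairs.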
