BAND: Biomedical Alert News Dataset


  • Zihao Fu University of Cambridge
  • Meiru Zhang University of Cambridge
  • Zaiqiao Meng University of Glasgow University of Cambridge
  • Yannan Shen McGill University
  • David Buckeridge McGill University
  • Nigel Collier University of Cambridge



NLP: Other, NLP: Applications, NLP: Information Extraction


Infectious disease outbreaks continue to pose a significant threat to human health and well-being. To improve disease surveillance and understanding of disease spread, several surveillance systems have been developed to monitor daily news alerts and social media. However, existing systems lack thorough epidemiological analysis of the corresponding alerts or news, largely because well-annotated report data are scarce. To address this gap, we introduce the Biomedical Alert News Dataset (BAND), which includes 1,508 samples drawn from existing reported news articles, open emails, and alerts, together with 30 epidemiology-related questions. These questions require expert reasoning from models and thereby offer valuable insights into disease outbreaks. BAND poses new challenges for NLP, requiring stronger inference over the content and the ability to deduce important information. We provide several benchmark tasks, including Named Entity Recognition (NER), Question Answering (QA), and Event Extraction (EE), to demonstrate the capabilities and limitations of existing models on epidemiology-specific tasks. Notably, some models may lack the human-like inference capability needed to fully exploit the corpus. To the best of our knowledge, BAND is the largest corpus of well-annotated biomedical outbreak alert news with elaborately designed questions, making it a valuable resource for epidemiologists and NLP researchers alike.



How to Cite

Fu, Z., Zhang, M., Meng, Z., Shen, Y., Buckeridge, D., & Collier, N. (2024). BAND: Biomedical Alert News Dataset. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 18012-18020.



AAAI Technical Track on Natural Language Processing I