HealthE: Recognizing Health Advice & Entities in Online Health Communities

Joseph Gatto; Parker Seegmiller; Garrett M Johnston; Madhusudan Basak; Sarah Masud Preum

doi:10.1609/icwsm.v17i1.22210

Authors

Joseph Gatto, Dartmouth College
Parker Seegmiller, Dartmouth College
Garrett M Johnston, Dartmouth College
Madhusudan Basak, Dartmouth College
Sarah Masud Preum, Dartmouth College

DOI:

https://doi.org/10.1609/icwsm.v17i1.22210

Keywords:

Web and Social Media, Text categorization; topic recognition; demographic/gender/age identification

Abstract

The task of extracting and classifying entities is at the core of important Health-NLP systems such as misinformation detection, medical dialogue modeling, and patient-centric information tools. Granular knowledge of textual entities allows these systems to utilize knowledge bases, retrieve relevant information, and build graphical representations of texts. Unfortunately, most existing works on health entity recognition are trained on clinical notes, which are both lexically and semantically different from public health information found in online health resources or social media. In other words, existing health entity recognizers vastly under-represent the entities relevant to public health data, such as those provided by sites like WebMD. It is crucial that future Health-NLP systems be able to model such information, as people rely on online health advice for personal health management and clinically relevant decision making. In this work, we release a new annotated dataset, HealthE, which facilitates the large-scale analysis of online textual health advice. HealthE consists of 3,400 health advice statements with token-level entity annotations. Additionally, we release 2,256 health statements which are not health advice to facilitate health advice mining. HealthE is the first dataset with an entity-recognition label space designed for the modeling of online health advice. We motivate the need for HealthE by demonstrating the limitations of five widely used health entity recognizers, such as those offered by Google and Amazon, when applied to HealthE. We additionally benchmark three pre-trained language models on our dataset as reference for future research. All data is made publicly available.
