CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison

Jeremy Irvin; Pranav Rajpurkar; Michael Ko; Yifan Yu; Silviana Ciurea-Ilcus; Chris Chute; Henrik Marklund; Behzad Haghgoo; Robyn Ball; Katie Shpanskaya; Jayne Seekins; David A. Mong; Safwan S. Halabi; Jesse K. Sandberg; Ricky Jones; David B. Larson; Curtis P. Langlotz; Bhavik N. Patel; Matthew P. Lungren; Andrew Y. Ng

doi:10.1609/aaai.v33i01.3301590

Authors

Jeremy Irvin Stanford University
Pranav Rajpurkar Stanford University
Michael Ko Stanford University
Yifan Yu Stanford University
Silviana Ciurea-Ilcus Stanford University
Chris Chute Stanford University
Henrik Marklund Stanford University
Behzad Haghgoo Stanford University
Robyn Ball Stanford University
Katie Shpanskaya Stanford University
Jayne Seekins Stanford University
David A. Mong University of Colorado
Safwan S. Halabi Stanford University
Jesse K. Sandberg Stanford University
Ricky Jones Stanford University
David B. Larson Stanford University
Curtis P. Langlotz Stanford University
Bhavik N. Patel Stanford University
Matthew P. Lungren Stanford University
Andrew Y. Ng Stanford University

DOI:

https://doi.org/10.1609/aaai.v33i01.3301590

Abstract

Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies that were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate the performance of chest radiograph interpretation models.
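The abstract mentions several ways of using the uncertainty labels during training but does not spell them out. The following is a minimal sketch of three simple policies one might compare: masking uncertain labels out of the loss, treating them as negative, or treating them as positive. The label encoding (1 positive, 0 negative, -1 uncertain), the policy names, and the helper functions `resolve_uncertain` and `masked_bce` are assumptions made for illustration, not the authors' implementation.

```python
import math

UNCERTAIN = -1  # assumed encoding for an uncertain mention in a report


def resolve_uncertain(labels, policy):
    """Map raw labels for one observation to (targets, mask).

    policy: "ignore" masks uncertain labels out of the loss,
            "zeros" treats them as negative,
            "ones" treats them as positive.
    """
    if policy not in ("ignore", "zeros", "ones"):
        raise ValueError(f"unknown policy: {policy}")
    targets, mask = [], []
    for y in labels:
        if y == UNCERTAIN:
            if policy == "ignore":
                # Target value is a placeholder; the mask removes it.
                targets.append(0.0)
                mask.append(0.0)
            else:
                targets.append(1.0 if policy == "ones" else 0.0)
                mask.append(1.0)
        else:
            targets.append(float(y))
            mask.append(1.0)
    return targets, mask


def masked_bce(probs, targets, mask, eps=1e-7):
    """Binary cross-entropy averaged over unmasked entries only."""
    total = 0.0
    for p, t, m in zip(probs, targets, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= m * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    count = sum(mask)
    return total / count if count else 0.0


if __name__ == "__main__":
    raw = [1, 0, UNCERTAIN]    # labels for one observation, three studies
    preds = [0.9, 0.2, 0.6]    # hypothetical model probabilities
    for policy in ("ignore", "zeros", "ones"):
        t, m = resolve_uncertain(raw, policy)
        print(policy, round(masked_bce(preds, t, m), 4))
```

Because the policy is applied per observation, a different policy could be chosen for each pathology, which is consistent with the abstract's finding that different uncertainty approaches are useful for different pathologies.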
