Auditing and Robustifying COVID-19 Misinformation Datasets via Anticontent Sampling

Clay H. Yoo; Ashiqur R. KhudaBukhsh

doi:10.1609/aaai.v37i12.26780

Authors

Clay H. Yoo, Carnegie Mellon University
Ashiqur R. KhudaBukhsh, Rochester Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v37i12.26780

Abstract

This paper makes two key contributions. First, it argues that highly specialized rare content classifiers trained on small data typically have limited exposure to the richness and topical diversity of the negative class (dubbed anticontent) as observed in the wild. As a result, these classifiers' strong performance observed on the test set may not translate into real-world settings. In the context of COVID-19 misinformation detection, we conduct an in-the-wild audit of multiple datasets and demonstrate that models trained with several prominently cited recent datasets are vulnerable to anticontent when evaluated in the wild. Second, we present a novel active learning pipeline that requires zero manual annotation and iteratively augments the training data with challenging anticontent, robustifying these classifiers.
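The abstract does not spell out the pipeline, so the following is only a minimal sketch of what an annotation-free anticontent loop could look like. It assumes the unlabeled pool comes from a source that contains no positives by construction, so anything the classifier flags can be added back as a negative without a human label. The tiny Naive Bayes model, the function names, the toy texts, and the `rounds` and `k` parameters are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train_nb(examples):
    """Fit a tiny multinomial Naive Bayes model on (text, label) pairs.

    Labels are 1 (misinformation) or 0 (anticontent / negative class).
    Assumes both classes appear at least once in `examples`.
    """
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for text, label in examples:
        counts[label].update(tokenize(text))
        docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab


def positive_score(model, text):
    """Log-odds that `text` is positive; tokens outside the vocabulary are skipped."""
    counts, docs, vocab = model
    n1 = sum(counts[1].values()) + len(vocab)
    n0 = sum(counts[0].values()) + len(vocab)
    score = math.log(docs[1] / docs[0])
    for tok in tokenize(text):
        if tok in vocab:
            # Laplace-smoothed per-class token likelihood ratio.
            score += math.log(((counts[1][tok] + 1) / n1) / ((counts[0][tok] + 1) / n0))
    return score


def augment_with_anticontent(train, pool, rounds=2, k=2):
    """Repeatedly add the pool texts the current model most confidently flags.

    Assumption: every text in `pool` is a true negative, so each flagged
    text is a false positive and receives label 0 with no manual annotation.
    """
    train, pool = list(train), list(pool)
    for _ in range(rounds):
        model = train_nb(train)
        flagged = [t for t in pool if positive_score(model, t) > 0]
        flagged.sort(key=lambda t: positive_score(model, t), reverse=True)
        flagged = flagged[:k]
        if not flagged:
            break  # the model no longer mistakes any pool text for a positive
        train.extend((t, 0) for t in flagged)
        pool = [t for t in pool if t not in flagged]
    return train


# Toy data, invented for illustration only.
train = [
    ("vaccine causes microchip tracking", 1),
    ("5g spreads the virus", 1),
    ("wash hands to stay safe", 0),
    ("masks reduce spread", 0),
]
pool = [
    "5g network rollout delayed",
    "new phone has tracking chip",
    "football match tonight",
]
augmented = augment_with_anticontent(train, pool)
print(augmented[len(train):])
```

The stopping rule here (no pool text scores above zero) is one simple choice; a fixed annotation-free budget per round or a held-out in-the-wild audit set would be equally plausible under the same assumption.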

Published

2023-06-26

How to Cite

Yoo, C. H., & KhudaBukhsh, A. R. (2023). Auditing and Robustifying COVID-19 Misinformation Datasets via Anticontent Sampling. Proceedings of the AAAI Conference on Artificial Intelligence, 37(12), 15260-15268. https://doi.org/10.1609/aaai.v37i12.26780

Issue

Vol. 37 No. 12: AAAI-23 Special Tracks

Section

AAAI Special Track on Safe and Robust AI
