Privacy in Image Datasets: A Case Study on Pregnancy Ultrasounds

Authors

  • Rawisara Lohanimit, Massachusetts Institute of Technology
  • Yankun Wu, The University of Osaka
  • Amelia Katirai, University of Tsukuba
  • Yuta Nakashima, The University of Osaka
  • Noa Garcia, The University of Osaka

DOI:

https://doi.org/10.1609/aies.v8i2.36661

Abstract

The rise of generative models has led to increased use of large-scale datasets collected from the internet, often with minimal or no data curation. This raises concerns about the inclusion of sensitive or private information. In this work, we explore the presence of pregnancy ultrasound images, which contain sensitive personal information and are often shared online. Through a systematic examination of the LAION-400M dataset using CLIP embedding similarity, we retrieve pregnancy ultrasound images and detect thousands of instances of private information, such as names and locations. Our findings reveal that multiple images contain high-risk information that could enable re-identification or impersonation. We conclude with recommended practices for dataset curation, data privacy, and the ethical use of public image datasets.
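The retrieval step described in the abstract, ranking dataset images by CLIP embedding similarity to a query, can be sketched as a cosine-similarity search over precomputed embeddings. This is a minimal illustration, not the authors' exact pipeline: the function name, the threshold value, and the use of precomputed embedding matrices are assumptions for the example.

```python
import numpy as np

def retrieve_by_similarity(query_emb, image_embs, top_k=5, threshold=0.3):
    """Rank images by cosine similarity between a query embedding and a
    matrix of image embeddings (one row per image); return (index, score)
    pairs for the top_k matches at or above the threshold.

    The threshold of 0.3 is illustrative, not a value from the paper.
    """
    # L2-normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    m = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = m @ q
    # Sort in descending similarity and keep the best top_k hits.
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order if sims[i] >= threshold]
```

In practice the query embedding would come from encoding a text prompt (e.g. "pregnancy ultrasound") or a seed image with a CLIP model, and `image_embs` from the dataset's published embedding files, so the search never requires re-encoding all 400M images.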

Published

2025-10-15

How to Cite

Lohanimit, R., Wu, Y., Katirai, A., Nakashima, Y., & Garcia, N. (2025). Privacy in Image Datasets: A Case Study on Pregnancy Ultrasounds. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 8(2), 1623-1636. https://doi.org/10.1609/aies.v8i2.36661