Privacy-Preserving Data Synthesis via Differentially Private Normalizing Flows with Application to Electronic Health Records Data
DOI:
https://doi.org/10.1609/aaaiss.v1i1.27495
Keywords:
Data Privacy, Differential Privacy, Synthetic Data, Data Synthesis, Normalizing Flow, Variational Inference, Generative Model, Noisy Stochastic Gradient Descents, Privacy Loss Accounting (composition), Electronic Health Records
Abstract
Medical data often contain sensitive personal information about individuals, which significantly limits their sharing or release for downstream learning and inferential tasks. We use normalizing flows (NF), a family of deep generative models, to estimate the probability density of a dataset with differential privacy (DP) guarantees, from which privacy-preserving synthetic data are generated and released. We apply the technique to an electronic health records dataset containing patients with pulmonary hypertension. We assess the learning and inferential utility of synthetic data by comparing the accuracy of hypertension predictions and the variational posterior distribution of the parameters in a physics-based model. The results suggest that synthetic data generated via NF with DP can yield good utility at a reasonable privacy cost. Our study provides evidence for, and adds to the growing literature on, the feasibility of generating synthetic medical data for sharing or obtaining inferences from medical data using deep generative models with formal privacy guarantees.
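The keywords list noisy stochastic gradient descent, but the abstract does not spell out the training procedure. As a generic illustration only, and not the authors' implementation, the sketch below shows one common form of a noisy SGD step: each per-example gradient is clipped to a fixed L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping bound is added, and the average is used for the update. The function name `noisy_sgd_step`, its parameters, and the toy gradients are all hypothetical. Privacy loss accounting across steps is not shown.

```python
import math
import random


def noisy_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One noisy SGD step (illustrative sketch, not the paper's code)."""
    n = len(per_example_grads)
    summed = [0.0] * len(params)
    for g in per_example_grads:
        # L2 norm of this example's gradient.
        norm = math.sqrt(sum(x * x for x in g))
        # Shrink the gradient only if its norm exceeds clip_norm.
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for j, x in enumerate(g):
            summed[j] += x * scale
    # Gaussian noise with standard deviation tied to the clipping bound,
    # which bounds how much one example can change the sum.
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / n for s in summed]
    return [p - lr * m for p, m in zip(params, noisy_mean)]


# Toy usage with hypothetical per-example gradients for a 3-parameter model.
rng = random.Random(0)
params = [0.0, 0.0, 0.0]
grads = [[rng.gauss(0.0, 5.0) for _ in range(3)] for _ in range(8)]
new_params = noisy_sgd_step(params, grads, lr=0.1, clip_norm=1.0,
                            noise_multiplier=1.1, rng=rng)
```

In a flow-based setting, `per_example_grads` would hold the gradient of each record's negative log-likelihood with respect to the flow parameters. A larger `noise_multiplier` gives a stronger privacy guarantee per step at the cost of noisier updates.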
Published
2023-10-03
How to Cite
Su, B., Wang, Y., Schiavazzi, D., & Liu, F. (2023). Privacy-Preserving Data Synthesis via Differentially Private Normalizing Flows with Application to Electronic Health Records Data. Proceedings of the AAAI Symposium Series, 1(1), 161–167. https://doi.org/10.1609/aaaiss.v1i1.27495
Section
Second Symposium on Human Partnership with Medical AI: Design, Operationalization, and Ethics