Privacy-preserving generation of synthetic patient data

Principal Investigator

Name
Marcel Klamárik

Degrees
Bc.

Institution
Siemens Healthcare s.r.o.

Position Title
Software Engineer

Email
marcel.klamarik@siemens-healthineers.com

About this CDAS Project

Study
PLCO

Project ID
PLCO-977

Initial CDAS Request Approval
May 9, 2022

Title
Privacy-preserving generation of synthetic patient data

Summary
This project examines privacy-preserving generation of patient data. Real patient data is scarce in medicine because of its sensitive nature and because of regulations such as GDPR and HIPAA, and research in machine learning and deep learning is severely hindered as a result. The hypothesis of this work is that synthetic data generation can address this problem: the generated data should remain usable and meaningful while preventing re-identification of the patients whose records were used for synthesis.

In the project, we analyze privacy and anonymity and ways of preserving them. Next, we analyze machine learning and deep learning methods capable of generating new data samples, and finally existing solutions for privacy-preserving generation of synthetic patient data. Following this analysis, we propose our own solution. The method is then evaluated in terms of privacy and usability in real situations and research, and the results are compared with those of the existing solutions analyzed. In the conclusion, we discuss the results of the proposed method, its pros and cons, and possible improvements.

Aims

- Create a method for generating patient data from real data
- Preserve underlying correlations and characteristics of the data
- Preserve privacy and anonymity of the patients and prevent re-identification
- Use newly generated data for machine and deep learning research
- Combat patient data scarcity in academic and commercial areas

Collaborators

Ing. Milan Unger, PhD. - Siemens Healthcare