Principal Investigator
Name
Marcel Klamárik
Degrees
Bc.
Institution
Siemens Healthcare s.r.o.
Position Title
Software Engineer
Email
About this CDAS Project
Study
PLCO
Project ID
PLCO-977
Initial CDAS Request Approval
May 9, 2022
Title
Privacy-preserving generation of synthetic patient data
Summary
This project examines privacy-preserving generation of patient data. Medical research suffers from a lack of real patient data, owing to its sensitive nature and to laws such as GDPR and HIPAA. Research in machine and deep learning is severely hindered as a result. The hypothesis of this work is that synthetic data generation can address this problem: the new data can remain usable and meaningful while preventing re-identification of the patients whose records were used for synthesis.

In the project, we analyze privacy and anonymity and ways of preserving them. Next, we analyze machine and deep learning methods capable of generating new data samples, and finally existing solutions for privacy-preserving generation of synthetic patient data. After the analysis, we propose our own solution. This method is then evaluated in terms of privacy and usability in real situations and research. The results are compared with those of the analyzed existing solutions. In the conclusion, we discuss the results of the proposed method, its pros and cons, and possible improvements.
Aims

- Create a method of patient data generation from real data
- Preserve underlying correlations and characteristics of the data
- Preserve privacy and anonymity of the patients and prevent re-identification
- Use newly generated data for machine and deep learning research
- Combat patient data scarcity in academic and commercial areas
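
The second and third aims can be illustrated with two simple checks. The sketch below is not the project's method, which this page does not describe. It assumes toy records of the form (age, systolic blood pressure) and uses naive Gaussian noise as a stand-in generator, which offers no formal privacy guarantee. All function names, field choices, and parameters are hypothetical. It measures how far pairwise correlations drift between real and synthetic data, and how close any synthetic record lies to a real one.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_gap(real, synth):
    """Largest absolute difference in pairwise correlation between datasets."""
    cols_r, cols_s = list(zip(*real)), list(zip(*synth))
    gap = 0.0
    for i in range(len(cols_r)):
        for j in range(i + 1, len(cols_r)):
            diff = pearson(cols_r[i], cols_r[j]) - pearson(cols_s[i], cols_s[j])
            gap = max(gap, abs(diff))
    return gap


def min_distance_to_real(real, synth):
    """Smallest Euclidean distance from any synthetic record to any real one.

    A value near zero suggests a synthetic record nearly copies a real
    patient, which is a crude warning sign for re-identification risk.
    """
    return min(math.dist(s, r) for s in synth for r in real)


# Hypothetical toy data: (age, systolic blood pressure), loosely correlated.
rng = random.Random(0)
real = []
for _ in range(50):
    age = rng.uniform(40, 80)
    real.append((age, 90 + 0.6 * age + rng.gauss(0, 5)))

# Stand-in generator (an assumption): perturb each real record with noise.
synthetic = [(a + rng.gauss(0, 2), bp + rng.gauss(0, 2)) for a, bp in real]

print("correlation gap:", correlation_gap(real, synthetic))
print("closest synthetic-to-real distance:", min_distance_to_real(real, synthetic))
```

A small correlation gap speaks to usability, while a larger minimum distance speaks to privacy. The two pull against each other, so any real evaluation would need to report both.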

Collaborators

Ing. Milan Unger, PhD. - Siemens Healthcare