Data Integration and Imputation Across Multiple Medical Datasets
Principal Investigator
Name
Vivatchai Kaveeta
Degrees
M.Eng.
Institution
Chiang Mai University
Position Title
Ph.D. candidate
About this CDAS Project
Study
PLCO
Project ID
PLCO-1120
Initial CDAS Request Approval
Dec 5, 2022
Title
Data Integration and Imputation Across Multiple Medical Datasets
Summary
Every medical dataset has unique characteristics. These differences may stem from the diseases studied, the study objectives, the observation methods, and other factors. In effect, each dataset samples only a subset of all possible outcomes. As a result, the performance of a prediction model depends heavily on the dataset it was trained on, and a model trained on one dataset is unlikely to perform well on samples from another source. This work therefore explores methods for combining multiple medical datasets to increase overall data variation. Because integrated datasets tend to be sparse, imputation methods will also be studied. The approach will be evaluated with various machine learning prediction models to demonstrate its effectiveness.
Aims
- Explore the characteristics of medical datasets
- Use imputation methods to fill in the missing values
- Integrate multiple datasets to improve the prediction performance
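As an illustration only, the integration and imputation aims above might be sketched as follows. This is not the project's method: the column names, the toy records, and the helper functions `integrate` and `impute_mean` are all hypothetical, and simple column-mean imputation stands in for whichever imputation methods the study actually examines.

```python
from statistics import mean


def integrate(*datasets):
    """Stack records from several datasets over the union of their columns."""
    columns = []
    for records in datasets:
        for record in records:
            for name in record:
                if name not in columns:
                    columns.append(name)
    # A column that a source never recorded becomes None (missing) for its rows.
    return [
        {name: record.get(name) for name in columns}
        for records in datasets
        for record in records
    ]


def impute_mean(records):
    """Fill each missing value with the mean of the observed values in its column."""
    if not records:
        return []
    fills = {}
    for name in records[0]:
        observed = [r[name] for r in records if r[name] is not None]
        fills[name] = mean(observed) if observed else None
    return [
        {name: (fills[name] if value is None else value) for name, value in r.items()}
        for r in records
    ]


# Hypothetical toy data: two sources that share "age" but differ elsewhere.
dataset_a = [{"age": 60, "bmi": 24.0}, {"age": 70, "bmi": 30.0}]
dataset_b = [{"age": 50, "marker": 1.5}, {"age": 64, "marker": 2.5}]

merged = integrate(dataset_a, dataset_b)  # sparse: each row lacks one column
completed = impute_mean(merged)           # gaps filled with column means
```

In this sketch the merged table is sparse because each source lacks one column, which is the situation imputation is meant to address. Mean imputation is the simplest baseline and ignores relationships between variables, so it serves only as a reference point for comparing more capable methods.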
Collaborators
Prompong Sugunnasil, Department of Software Engineering, College of Arts, Media and Technology, Chiang Mai University
Juggapong Natwichai, Department of Computer Engineering, Faculty of Engineering, Chiang Mai University