Principal Investigator
Name
Mario Lovric
Degrees
Ph.D.
Institution
Institute for Anthropological Research
Position Title
Head of laboratory
About this CDAS Project
Study
PLCO
Project ID
PLCO-1853
Initial CDAS Request Approval
Mar 12, 2025
Title
Machine learning research for endometrial cancer
Summary
Endometrial cancer is one of the most prevalent gynecological malignancies, with early detection playing a crucial role in improving patient outcomes. This project proposes the development of a machine learning (ML)-based predictive model to assess endometrial cancer risk using structured clinical, demographic, and medical history data. The dataset comprises variables related to patient demographics, cancer characteristics, family history, and mortality records, enabling a comprehensive approach to risk prediction.
The proposed methodology includes data preprocessing steps such as cleaning, feature selection, and normalization to optimize model performance. Several supervised learning models, including logistic regression, random forests, gradient boosting, and neural networks, will be used to develop a robust predictive framework. Techniques such as SHapley Additive exPlanations (SHAP) and Recursive Feature Elimination (RFE) will be used to identify key predictors. The models will be evaluated using performance metrics such as the area under the receiver operating characteristic curve (AUC-ROC) and the Matthews correlation coefficient (MCC). Hyperparameter tuning techniques, including grid search and Bayesian optimization, will be applied to improve predictive accuracy. Expected outcomes include an optimized ML model for endometrial cancer risk prediction, identification of high-risk patient groups, and insights into significant risk factors influencing disease onset. The findings aim to support clinical decision-making by integrating ML-derived risk assessment tools into healthcare systems.
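As an illustration of the two evaluation metrics named in the summary, the following minimal sketch computes AUC-ROC and MCC in plain Python. The function names `auc_roc` and `mcc`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for this example; they are not part of the project or of the PLCO data.

```python
from math import sqrt


def auc_roc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient from the binary confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any margin of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: 1 = case, 0 = non-case, with model risk scores.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.05, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("AUC-ROC:", auc_roc(y_true, scores))
print("MCC:", mcc(y_true, y_pred))
```

AUC-ROC is computed from the raw scores and does not depend on a threshold, whereas MCC is computed from thresholded predictions. MCC stays informative when cases are rare, which is often the situation in cancer risk data.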
Aims

- Develop a machine learning model for risk prediction
- Compare statistical observation (hypothesis testing) with data-driven techniques
- Explain the machine learning model using post-hoc explainability techniques such as permutation importance and SHAP
- Improve understanding of mechanisms at the population level
- Write a scientific paper
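Permutation importance, one of the post-hoc techniques listed in the aims, can be sketched as follows. This is a simplified illustration under stated assumptions: `toy_model`, the synthetic two-feature data, and the use of accuracy as the score are hypothetical stand-ins, not the project's model or data.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(1 for row, t in zip(X, y) if model(row) == t) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled,
    which breaks that feature's association with the outcome."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a fitted classifier: it uses feature 0 only.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0


# Synthetic data: the label depends on feature 0; feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

In this toy setup the noise feature gets an importance of zero because the model never reads it, while shuffling feature 0 sharply lowers accuracy. On real data the importances should be computed on held-out samples, and correlated features can share or mask each other's importance.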

Collaborators

Mario Lovric, Nina Karlovic, Jelena Sarac, Dubravka Havas (all from the Institute for Anthropological Research)
Authors from NIH who created this data set, if requested