Machine learning research for endometrial cancer
The proposed methodology includes data preprocessing steps (cleaning, feature selection, and normalization) to optimize model performance. Several supervised learning models, including logistic regression, random forests, gradient boosting, and neural networks, will be trained and compared to develop a robust predictive framework. Key predictors will be identified using SHapley Additive exPlanations (SHAP) and Recursive Feature Elimination (RFE). The models will be evaluated using performance metrics such as the area under the receiver operating characteristic curve (AUC-ROC) and the Matthews correlation coefficient (MCC). Hyperparameter tuning techniques, including grid search and Bayesian optimization, will be applied to improve predictive accuracy. Expected outcomes include an optimized ML model for endometrial cancer risk prediction, identification of high-risk patient groups, and insights into the risk factors that most influence disease onset. The findings aim to support clinical decision-making by integrating ML-derived risk assessment tools into healthcare systems.
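To make the two named evaluation metrics concrete, the following is a minimal sketch of AUC-ROC and MCC for a binary outcome, written with the Python standard library only. The function names, the example labels and scores, and the 0.5 decision threshold are illustrative assumptions, not part of the proposed pipeline or the study data; in practice an established library implementation would likely be used instead.

```python
import math


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Made-up labels and predicted risks, for illustration only.
example_labels = [0, 0, 1, 1, 0, 1, 0, 0]
example_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.50, 0.05]
# Assumed threshold of 0.5 to turn risks into class predictions.
example_preds = [1 if s >= 0.5 else 0 for s in example_scores]

print("AUC-ROC:", auc_roc(example_labels, example_scores))
print("MCC:", mcc(example_labels, example_preds))
```

AUC-ROC is computed from the continuous risk scores, while MCC depends on the chosen classification threshold, so the two metrics answer different questions about the same model.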
- Develop a machine learning model for risk prediction
- Compare statistical observations (hypothesis testing) with data-driven techniques
- Explain the machine learning model using post-hoc explainability techniques such as permutation importance and SHAP
- Improve understanding of mechanisms at the population level
- Write a scientific paper
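The post-hoc explainability objective above can be illustrated with permutation importance: shuffle one feature at a time and measure how much the model's AUC-ROC drops. The sketch below uses only the Python standard library. The toy risk function, the synthetic two-feature data, and all names are hypothetical stand-ins chosen for illustration; they are not the project's model or dataset.

```python
import random


def auc_roc(y_true, scores):
    """Pairwise-ranking AUC-ROC; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in AUC-ROC when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = auc_roc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = auc_roc(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_risk_model(row):
    """Hypothetical fitted model: the risk score depends on feature 0 only."""
    return row[0]


# Synthetic data: feature 0 drives the outcome, feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [1 if row[0] + data_rng.gauss(0, 0.5) > 0 else 0 for row in X]

importances = permutation_importance(toy_risk_model, X, y)
for name, value in zip(["informative_feature", "noise_feature"], importances):
    print(name, round(value, 3))
```

Because the toy model ignores the second feature, shuffling it leaves every prediction unchanged and its importance is zero, whereas shuffling the informative feature pushes the AUC-ROC toward chance. With a real model, correlated predictors can share or mask importance, which is one reason to read permutation importance alongside SHAP values rather than in isolation.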
Mario Lovric, Nina Karlovic, Jelena Sarac, Dubravka Havas -- all from our Institute
Authors from NIH who created this dataset, if requested