Interpretability of machine learning-based prediction models for ovarian cancer classification using the PLCO dataset
Numerous studies have sought strategies to improve ovarian cancer detection and survival. Several of these studies demonstrated how machine learning models can classify patients into cancer versus non-cancer categories. It is therefore worth exploring and analyzing the factors that drive this disease stratification.
1. What differentiating factors separate cancer cases from non-cancer cases?
2. Do multiple machine learning models stratify the cases in a similar way?
3. Which features influence predictions consistently across models?
4. When a particular record is classified as cancer, which features drive the prediction for that data point?
To address these questions, we need to evaluate multiple interpretable models for ovarian cancer diagnosis. Thus, the purpose of this research is to use the comprehensive PLCO Ovarian Cancer dataset to examine different methods for explaining machine learning prediction models.
Generally, there is a trade-off between a model's performance and its interpretability: the better a model performs, the more complex it tends to be and the harder it is to interpret.
Various models such as Logistic Regression, XGBoost, and Random Forest will be evaluated on performance metrics such as accuracy, area under the ROC curve (AUC), and F1 score; however, the main focus of this research is to analyse techniques and frameworks for interpreting the best-performing ML models, as sketched in the example below.
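As an illustration, the snippet below shows how this model comparison could be set up with scikit-learn and xgboost. It is a minimal sketch rather than the final pipeline: the make_classification call is only a stand-in for the preprocessed PLCO feature matrix and binary cancer label, and the hyperparameters are illustrative defaults.

```python
# Minimal sketch of the model-comparison step.
# Assumption: in the actual study, X and y come from the cleaned PLCO cohort
# (1 = ovarian cancer, 0 = non-cancer); synthetic data is used here as a stand-in.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "xgboost": XGBClassifier(eval_metric="logloss", random_state=42),
}

# Fit each candidate model and report the three headline metrics.
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]
    print(name,
          "accuracy=%.3f" % accuracy_score(y_test, y_pred),
          "auc=%.3f" % roc_auc_score(y_test, y_prob),
          "f1=%.3f" % f1_score(y_test, y_pred))
```

The best-performing models from this comparison would then be carried forward to the interpretability analysis described next.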
Two aspects of model interpretability will be explored and examined for ovarian cancer stratification:
1. Global interpretation – determine the features that most influence cancer prediction overall. The aim is to identify a core subset of features that discriminates cancer from non-cancer cases using techniques such as Recursive Feature Elimination (RFE) and model attributes such as feature_importances_ (see the sketch after this list).
2. Local interpretation – examine interpretability for individual predictions through model-agnostic methods such as LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations).
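The sketch below illustrates, under stated assumptions, how both views could be obtained for the models trained above. It reuses the X_train/X_test arrays and the models dictionary from the previous snippet, assumes the shap and lime packages are installed, and uses feature indices as placeholders for the actual PLCO variable names.

```python
# Sketch of the two interpretability views, building on the previous snippet.
import numpy as np
import shap
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from lime.lime_tabular import LimeTabularExplainer

# --- Global interpretation --------------------------------------------------
# 1. Recursive Feature Elimination on a linear model: keep a core feature subset.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
rfe.fit(X_train, y_train)
print("RFE-selected feature indices:", np.where(rfe.support_)[0])

# 2. Impurity-based importances from the fitted random forest.
rf = models["random_forest"]
top = np.argsort(rf.feature_importances_)[::-1][:10]
print("Top random-forest feature indices:", top)

# --- Local interpretation ---------------------------------------------------
xgb = models["xgboost"]

# SHAP values for a single test record: per-feature contributions to its score.
explainer = shap.TreeExplainer(xgb)
shap_values = explainer.shap_values(X_test[:1])
print("SHAP contributions for record 0:", shap_values[0])

# LIME explanation for the same record, using the model's probability output.
lime_explainer = LimeTabularExplainer(X_train, mode="classification")
lime_exp = lime_explainer.explain_instance(X_test[0], xgb.predict_proba,
                                           num_features=5)
print(lime_exp.as_list())
```

In the actual study, the global outputs would be compared across models to check whether they agree on the discriminating features, while the local explanations would be inspected for individual records classified as cancer.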
1. Explore different frameworks for ML model explainability by:
i. Building multiple machine learning prediction models (e.g., Logistic Regression, Random Forest, XGBoost)
ii. Examining two aspects of interpretability: global interpretation and local interpretation
2. Identify the differentiating factors that separate ovarian cancer cases from non-cancer cases
3. Reap the benefits of interpretability, such as directing future data collection, informing human decision-making, and nurturing enduring trust in machine learning.