Using Interpretable and Explainable Ensemble Methods to Unpack Disease Predictions: An Empirical Study of the Cancer Prediction Problem
Specifically, our study will focus on the prostate, lung, colorectal, and ovarian cancer data in the PLCO dataset. We plan to first fit a Cox Proportional Hazards model to establish a baseline inference on the data. We will then develop Survival Random Forest (SRF) and Survival Gradient Boosting (SGB) models in an attempt to reduce prediction error, and we will compare the results. However, any improvement in prediction error comes at the cost of reduced interpretability, since these ensemble methods typically provide only method-specific feature-importance scores. To address this issue, we will apply explainability methods such as LIME, SHAP, and SAGE to gain insight into the predictions. Through this exploration, we seek to balance interpretability with predictive performance by comparing and contrasting inherently interpretable methods with post-hoc explainability methods, with the goal of improving model transparency in critical decision-making scenarios.
1. Compare the value of XAI applied to ensemble methods versus interpretable methods in generating insights while maintaining predictive performance using the PLCO data.
2. Provide a foundational example of using interpretable and ensemble models along with XAI methods to perform survival analysis on the PLCO data.
3. Demonstrate the value of interpretable methods versus ensemble methods paired with XAI for balancing interpretability and predictive performance in a high-stakes domain, healthcare, and in particular cancer prediction.
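Comparing the Cox baseline against the SRF and SGB models requires a survival-specific accuracy measure; Harrell's concordance index (C-index) is the standard choice, scoring how often a model's risk ranking agrees with observed event ordering under censoring. The following is a minimal illustrative sketch in pure Python, not the proposed implementation (in practice a library such as scikit-survival would supply both the models and the metric); the toy inputs are hypothetical, not PLCO data.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed follow-up times
    events : 1 if the event (e.g., cancer death) occurred, 0 if censored
    risks  : model-predicted risk scores (higher = earlier expected event)
    """
    concordant = 0.0
    permissible = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # subject i failed before j's observed time
                permissible += 1
                if risks[i] > risks[j]:
                    concordant += 1.0      # ranking agrees with outcome
                elif risks[i] == risks[j]:
                    concordant += 0.5      # tied risks count as half
    return concordant / permissible


# Toy example: higher risk scores correspond to earlier events.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]          # third subject is censored
perfect = [4, 3, 2, 1]
print(concordance_index(times, events, perfect))  # 1.0: perfectly concordant
```

Each candidate model (Cox, SRF, SGB) would be scored this way on held-out data, so that any interpretability sacrificed by the ensembles can be weighed against a concrete gain in C-index.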
Di Hu, PhD student, UC Irvine