Using Interpretable and (Explainable) Ensemble Methods to Unpack Disease Predictions: An Empirical Study of the Cancer Prediction Problem
Principal Investigator
Name
Gorkem Turgut Ozer
Degrees
Ph.D.
Institution
University of New Hampshire
Position Title
Assistant Professor
Email
About this CDAS Project
Study
PLCO
Project ID
PLCO-1530
Initial CDAS Request Approval
Apr 12, 2024
Title
Using Interpretable and (Explainable) Ensemble Methods to Unpack Disease Predictions: An Empirical Study of the Cancer Prediction Problem
Summary
Explainable Artificial Intelligence (XAI) has become an important field of study as machine learning models are increasingly deployed in high-stakes domains such as healthcare. XAI methods aim to unpack the predictions of black-box ensemble models. For some problems, however, interpretable models perform as well as, if not better than, ensemble methods. Our project aims to demonstrate the value of interpretable methods for cancer prediction and to compare it with the value of ensemble methods paired with XAI in generating insights. We will compare interpretable models against ensemble methods in terms of predictive performance and prediction error using data from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. The PLCO cancer datasets have been widely studied and therefore provide an opportunity to build on, extend, and enrich prior work.
Specifically, our study will focus on the prostate, lung, colorectal, and ovarian cancer data. Using the PLCO dataset, we plan to first fit a Cox proportional hazards model to establish an interpretable baseline. We will then develop Survival Random Forest (SRF) and Survival Gradient Boosting (SGB) models in an attempt to reduce prediction error, and we will compare the results. Any such improvement, however, typically comes at the cost of reduced interpretability, as these methods provide only method-specific feature importance scores. To address this issue, we will explore explainability methods such as LIME, SHAP, and SAGE to gain insight into the predictions. Through this exploration, we seek to balance interpretability with predictive performance by comparing and contrasting interpretable methods with explainable methods, improving model transparency in critical decision-making scenarios.
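The following is a minimal sketch of the planned model comparison, assuming the scikit-survival library as the implementation of the Cox baseline and the SRF/SGB ensembles (the project text does not name a library). Because the PLCO files require approved CDAS access, the publicly available GBSG2 cohort bundled with scikit-survival stands in here purely to make the pipeline runnable end to end; column names and model hyperparameters are illustrative, not the study's actual configuration.

```python
# Hedged sketch: Cox PH baseline vs. survival ensembles, scored by
# Harrell's concordance index. GBSG2 is a stand-in for the PLCO data.
from sklearn.model_selection import train_test_split
from sksurv.datasets import load_gbsg2
from sksurv.preprocessing import OneHotEncoder
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.ensemble import RandomSurvivalForest, GradientBoostingSurvivalAnalysis

X, y = load_gbsg2()                    # y is a structured array: (event, time)
X = OneHotEncoder().fit_transform(X)   # encode categorical covariates (M-1 dummies)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Cox PH (interpretable baseline)": CoxPHSurvivalAnalysis(),
    "Survival Random Forest": RandomSurvivalForest(n_estimators=200, random_state=0),
    "Survival Gradient Boosting": GradientBoostingSurvivalAnalysis(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # .score() reports Harrell's concordance index (higher is better)
    print(f"{name}: c-index = {model.score(X_te, y_te):.3f}")
```

The Cox model's fitted coefficients are directly interpretable as log hazard ratios, which is the baseline inference step the summary describes; the two ensembles are scored on the same held-out split so any gain in concordance can be weighed against the loss of that direct interpretability.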
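Continuing the sketch above, the snippet below illustrates the explainability step with SHAP's model-agnostic permutation explainer applied to the fitted forest's risk scores. SHAP is shown as one representative of the LIME/SHAP/SAGE family named in the summary; the background-set and sample sizes are arbitrary choices for tractability, not values from the study.

```python
# Hedged sketch: model-agnostic SHAP explanations of ensemble risk scores.
import shap

rsf = models["Survival Random Forest"]
# Explain predicted risk scores against a small background sample.
explainer = shap.PermutationExplainer(rsf.predict, X_tr.iloc[:100])
shap_values = explainer(X_te.iloc[:25])
shap.plots.bar(shap_values)  # global view: mean |SHAP| per covariate
```

A comparison of this attribution ranking with the Cox model's hazard ratios (and with the ensembles' built-in, method-specific importance scores) is the kind of interpretability-versus-performance contrast the project proposes.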
Aims
1. Using the PLCO data, compare the value of XAI applied to ensemble methods with that of interpretable methods in generating insights while maintaining predictive performance.
2. Provide a foundational example of using interpretable and ensemble models along with XAI methods to perform survival analysis on the PLCO data.
3. Demonstrate how interpretable methods, versus ensemble methods paired with XAI, can balance interpretability and predictive performance in a high-stakes domain, healthcare, and in particular cancer prediction.
Collaborators
Di Hu, Ph.D. student, UC Irvine