Using Interpretable and Explainable Ensemble Methods to Unpack Disease Predictions: An Empirical Study of the Cancer Prediction Problem
Specifically, our study will focus on the prostate, lung, colorectal, and ovarian cancer data in the PLCO dataset. We plan to first fit a Cox Proportional Hazards model to establish a baseline inference on the data. We will then develop Survival Random Forest (SRF) and Survival Gradient Boosting (SGB) models in an attempt to reduce prediction error, and we will compare the results. However, any improvement in prediction error comes at the cost of reduced interpretability, since these ensemble methods typically provide only method-specific feature-importance scores. To address this issue, we will apply explainability methods such as LIME, SHAP, and SAGE to gain insight into the predictions. Through this exploration, we seek to balance interpretability with predictive performance by comparing and contrasting inherently interpretable methods with post-hoc explainability methods, with the goal of improving model transparency in critical decision-making scenarios.
1. Compare the value of XAI applied to ensemble methods versus interpretable methods in generating insights while maintaining predictive performance using the PLCO data.
2. Provide a foundational example of using interpretable and ensemble models along with XAI methods to perform survival analysis on the PLCO data.
3. Demonstrate the value of interpretable methods versus ensemble methods paired with XAI for balancing interpretability and predictive performance in a high-stakes domain, healthcare, and in particular cancer prediction.
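Comparing the Cox baseline against the SRF and SGB models requires a survival-specific accuracy measure; Harrell's concordance index (C-index) is the standard choice, scoring how often a model's risk ranking agrees with observed event ordering under censoring. The following is a minimal illustrative sketch in pure Python, not the proposed implementation (in practice a library such as scikit-survival would supply both the models and the metric); the toy inputs are hypothetical, not PLCO data.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed follow-up times
    events : 1 if the event (e.g., cancer death) occurred, 0 if censored
    risks  : model-predicted risk scores (higher = earlier expected event)
    """
    concordant = 0.0
    permissible = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # subject i failed before j's observed time
                permissible += 1
                if risks[i] > risks[j]:
                    concordant += 1.0      # ranking agrees with outcome
                elif risks[i] == risks[j]:
                    concordant += 0.5      # tied risks count as half
    return concordant / permissible


# Toy example: higher risk scores correspond to earlier events.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]          # third subject is censored
perfect = [4, 3, 2, 1]
print(concordance_index(times, events, perfect))  # 1.0: perfectly concordant
```

Each candidate model (Cox, SRF, SGB) would be scored this way on held-out data, so that any interpretability sacrificed by the ensembles can be weighed against a concrete gain in C-index.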
Di Hu, PhD student, UC Irvine