Interpretable Artificial Intelligence to select patients that will benefit from lung cancer screening
Lung cancer is by far the leading cause of cancer death among both men and women, making up almost 25% of all cancer deaths. Each year, more people die of lung cancer than of colon, breast, and prostate cancers combined. The number of new lung cancer cases continues to decrease, partly because people are quitting smoking. Survival depends on stage at diagnosis: the 5-year relative survival rate is 61% for localized lung cancer but drops to 6% for metastatic disease. When all stages are combined, a quarter of patients survive beyond 5 years. The effect of screening on lung cancer mortality is still controversial. The NLST and NELSON trials demonstrated that patients screened with CT had a significant reduction in lung cancer mortality. On the other hand, the PLCO trial showed that screening with chest radiography did not reduce lung cancer mortality compared with usual care.
In this project, we propose a machine learning method to identify the patients at the highest risk of developing and dying from lung cancer, and to triage them for screening.
We will train gradient-boosted decision trees (a state-of-the-art approach for tabular data) on clinical features to predict which patients are at risk and to stratify them by predicted risk.
Our goal is to demonstrate the value of screening in a subset of high-risk patients.
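The risk model and triage step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three clinical features (age, pack-years, family history), the synthetic cohort, the coefficients used to simulate outcomes, and the 20% screening cutoff are hypothetical, and the booster is a toy written from scratch with depth-1 trees rather than the library implementation a real study would likely use.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y, probs):
    # Mean negative log-likelihood of binary labels given predicted probabilities.
    return -sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
                for yi, p in zip(y, probs)) / len(y)

def fit_stump(X, residuals):
    # Depth-1 regression tree: pick the (feature, threshold) split whose two
    # leaf means give the smallest squared error against the residuals.
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:  # drop max so both leaves are non-empty
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]

def fit_gbdt(X, y, n_rounds=30, learning_rate=0.1):
    # Gradient boosting for log-loss: start from the cohort log-odds, then
    # repeatedly fit a stump to the negative gradient (label minus probability).
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))
    scores = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, ml, mr = fit_stump(X, residuals)
        stumps.append((j, t, ml, mr))
        scores = [s + learning_rate * (ml if row[j] <= t else mr)
                  for s, row in zip(scores, X)]
    return f0, learning_rate, stumps

def predict_risk(model, row):
    f0, lr, stumps = model
    return sigmoid(f0 + sum(lr * (ml if row[j] <= t else mr) for j, t, ml, mr in stumps))

# Synthetic cohort (hypothetical features and coefficients, not real patient data).
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    age = rng.randint(50, 80)
    pack_years = rng.randint(0, 60)
    family_history = rng.randint(0, 1)
    p = sigmoid(-7.0 + 0.06 * age + 0.05 * pack_years + 0.8 * family_history)
    X.append([age, pack_years, family_history])
    y.append(1 if rng.random() < p else 0)

model = fit_gbdt(X, y)
risks = [predict_risk(model, row) for row in X]

# Compare against a baseline that predicts the cohort event rate for everyone.
base_loss = log_loss(y, [sum(y) / len(y)] * len(y))
model_loss = log_loss(y, risks)

# Triage: flag the top 20% of patients by predicted risk (cutoff is an assumption).
ranked = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
high_risk = ranked[:len(ranked) // 5]

print("training log-loss, baseline vs model:", base_loss, model_loss)
print("event rate in flagged group:", sum(y[i] for i in high_risk) / len(high_risk))
print("event rate in whole cohort:", sum(y) / len(y))
```

The sketch evaluates on its own training data only to keep it short; a real analysis would need held-out data and survival-specific endpoints.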
In a second step, we intend to use Shapley values to determine the most relevant features and filter out the variables that do not contribute significantly to the model. These values quantify the importance of each feature for a prediction and indicate whether the feature increases or decreases the predicted probability of the endpoint under study.
We will also use these values to explain the predictions made by our model at both the individual and the population level. Moving away from black-box AI models will help increase the trust that patients and physicians place in AI in healthcare.
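To make the role of Shapley values concrete, here is a minimal sketch that computes them exactly by enumerating feature coalitions. It is an illustration under stated assumptions, not the project's pipeline: `toy_risk`, its coefficients, the feature names, the reference patient, the small cohort, and the 0.005 filtering threshold are all hypothetical, and "absent" features are replaced by a single reference patient's values. Brute-force enumeration is only practical for a handful of features; dedicated libraries use faster tree-specific algorithms.

```python
import math
from itertools import combinations

FEATURES = ["age", "pack_years", "family_history", "uninformative"]

def toy_risk(x):
    # Hypothetical risk model standing in for the trained classifier.
    # It deliberately ignores the fourth feature.
    age, pack_years, family_history, _ = x
    z = -7.0 + 0.06 * age + 0.05 * pack_years + 0.8 * family_history
    return 1.0 / (1.0 + math.exp(-z))

def shapley_values(predict, x, baseline):
    # Exact Shapley values: for each feature, average its marginal effect on the
    # prediction over all coalitions of the other features. Features outside a
    # coalition take the reference patient's values.
    n = len(x)

    def value(coalition):
        return predict([x[k] if k in coalition else baseline[k] for k in range(n)])

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

baseline = [60, 10, 0, 5.0]   # hypothetical reference patient
patient = [72, 45, 1, 3.0]    # hypothetical patient to explain

# Individual level: signed contribution of each feature to this prediction.
phi = shapley_values(toy_risk, patient, baseline)
for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.4f}")

# Population level: mean absolute Shapley value per feature over a small cohort,
# then drop features whose importance falls below a (hypothetical) threshold.
cohort = [patient, [55, 5, 1, 7.0], [66, 30, 0, 1.0], [61, 12, 0, 9.0]]
all_phi = [shapley_values(toy_risk, row, baseline) for row in cohort]
importance = [sum(abs(p[i]) for p in all_phi) / len(all_phi) for i in range(len(FEATURES))]
kept = [name for name, imp in zip(FEATURES, importance) if imp > 0.005]
print("features kept:", kept)
```

Two properties are worth checking on any such explanation: the contributions sum to the difference between the patient's prediction and the reference prediction, and a feature the model never uses receives a value of zero, which is what allows it to be filtered out.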
- Predict cancer-specific and overall survival
- Triage and select high-risk patients for screening
- Provide an explanation based on Shapley values for each prediction