Principal Investigator: Jean-Emmanuel Bibault, M.D., Ph.D., Stanford University
Position Title: Postdoctoral Research Fellow
About this CDAS Project
Study: PLCO
Project ID:
Initial CDAS Request Approval: Mar 5, 2020
Interpretable Artificial Intelligence to select patients that will benefit from lung cancer screening
Lung cancer is the second most common cancer in both men and women. In 2020, there will be about 228,820 new cases of lung cancer (116,300 in men and 112,520 in women) and 135,720 deaths from lung cancer (72,500 in men and 63,220 in women). The average age at diagnosis is about 70.

Lung cancer is by far the leading cause of cancer death among both men and women, accounting for almost 25% of all cancer deaths. Each year, more people die of lung cancer than of colon, breast, and prostate cancers combined. The number of new lung cancer cases continues to decrease, partly because people are quitting smoking. Survival depends on stage at diagnosis: the 5-year relative survival rate is 61% for localized lung cancer but drops to 6% for metastatic disease. When all stages are combined, about a quarter of patients survive beyond 5 years. The benefit of lung cancer screening remains controversial. The NLST and NELSON trials showed that screening with low-dose CT significantly reduced lung cancer mortality. By contrast, the PLCO trial showed that screening with chest radiography did not reduce lung cancer mortality compared with usual care.

In this project, we propose a machine learning method to identify the patients at highest risk of developing and dying from lung cancer, so that they can be prioritized for screening.
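Triage by predicted risk amounts to ranking patients and screening the top of the list. The minimal sketch below assumes the model produces one risk score per patient and that screening capacity is expressed as a fraction of the cohort; `select_for_screening` and its `fraction` parameter are hypothetical and are not part of the project's stated method.

```python
from typing import Sequence


def select_for_screening(risk_scores: Sequence[float], fraction: float) -> list[int]:
    """Return the indices of the highest-risk patients, covering `fraction` of the cohort.

    Hypothetical helper: `fraction` stands in for an assumed screening capacity.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not risk_scores:
        return []
    # Number of patients to screen; always at least one.
    n_selected = max(1, round(fraction * len(risk_scores)))
    # Rank patient indices by descending predicted risk and keep the top ones.
    ranked = sorted(range(len(risk_scores)), key=lambda i: risk_scores[i], reverse=True)
    # Return indices in their original order for easier joining with the cohort table.
    return sorted(ranked[:n_selected])


# Illustrative scores only (not PLCO data).
print(select_for_screening([0.02, 0.30, 0.05, 0.12, 0.40], fraction=0.4))  # [1, 4]
```

A fixed fraction mirrors a capacity constraint; a risk threshold would be the alternative if screening eligibility were defined by an absolute risk cut-off instead.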

We will train gradient-boosted decision trees (widely regarded as the state of the art for tabular data) on clinical features to predict which patients are at risk and to stratify them by risk level.
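The project does not name a library; XGBoost, LightGBM, or scikit-learn's `GradientBoostingClassifier` would be typical choices. To show only the mechanics, here is a minimal, dependency-free sketch of gradient boosting for a binary endpoint: each round fits a depth-1 regression tree (a stump) to the negative gradient of the log-loss and adds it to the running score, scaled by a learning rate. The feature names (`age`, `pack_years`), the toy data, and the hyperparameters are assumptions for illustration; production libraries typically add deeper trees, second-order leaf updates, and regularization.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class Stump:
    """Depth-1 regression tree: one split on one feature."""
    feature: int
    threshold: float
    left_value: float   # output when x[feature] <= threshold
    right_value: float  # output when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def fit_stump(X: List[Sequence[float]], targets: List[float]) -> Stump:
    """Least-squares stump fitted to the current pseudo-residuals."""
    mean_all = sum(targets) / len(targets)
    # Fallback when no split helps: a constant prediction.
    best = Stump(0, math.inf, mean_all, mean_all)
    best_sse = sum((t - mean_all) ** 2 for t in targets)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, targets) if row[j] <= threshold]
            right = [t for row, t in zip(X, targets) if row[j] > threshold]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if sse < best_sse:
                best, best_sse = Stump(j, threshold, lm, rm), sse
    return best


class GradientBoostedStumps:
    """Gradient boosting for binary outcomes under log-loss, with stumps as base learners."""

    def __init__(self, n_rounds: int = 50, learning_rate: float = 0.1):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.base_score = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "GradientBoostedStumps":
        # Start from the log-odds of the observed event rate (clamped to avoid log(0)).
        p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.base_score = math.log(p / (1 - p))
        scores = [self.base_score] * len(X)
        self.stumps = []
        for _ in range(self.n_rounds):
            # Negative gradient of the log-loss w.r.t. the raw score: y - sigmoid(score).
            residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            scores = [s + self.learning_rate * stump.predict(x) for s, x in zip(scores, X)]
        return self

    def predict_proba(self, x: Sequence[float]) -> float:
        score = self.base_score + sum(self.learning_rate * s.predict(x) for s in self.stumps)
        return sigmoid(score)


# Toy data with hypothetical features [age, pack_years]; labels are illustrative, not PLCO data.
X = [[55, 5], [60, 10], [62, 15], [58, 20], [66, 35], [70, 40], [68, 50], [72, 60]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = GradientBoostedStumps(n_rounds=50, learning_rate=0.1).fit(X, y)
print(model.predict_proba([75, 55]) > model.predict_proba([50, 0]))  # True
```

The predicted probabilities can then be binned (for example into risk tertiles) to stratify patients.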

Our goal is to demonstrate the value of screening in a subset of high-risk patients.
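One possible way to quantify the value of screening in a high-risk subgroup, assuming the comparison is between the randomized screening and usual-care arms restricted to patients the model flags as high risk, is a risk ratio of lung cancer deaths with an approximate 95% confidence interval. This minimal sketch uses the standard log-scale (Katz) interval; `risk_ratio` and the counts in the usage line are hypothetical, not project results, and a full analysis would more likely use time-to-event methods that account for follow-up.

```python
import math
from typing import Tuple


def risk_ratio(deaths_screened: int, n_screened: int,
               deaths_control: int, n_control: int,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Risk ratio (screened / control) with a log-scale (Katz) confidence interval.

    Returns (ratio, lower, upper); z = 1.96 gives an approximate 95% interval.
    """
    if deaths_screened == 0 or deaths_control == 0:
        raise ValueError("the log-scale interval is undefined with zero events in an arm")
    ratio = (deaths_screened / n_screened) / (deaths_control / n_control)
    # Standard error of log(ratio) for two independent binomial proportions.
    se = math.sqrt(1 / deaths_screened - 1 / n_screened
                   + 1 / deaths_control - 1 / n_control)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


# Hypothetical counts for illustration only.
rr, lo, hi = risk_ratio(50, 1000, 100, 1000)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An upper confidence bound below 1 would indicate fewer lung cancer deaths in the screened arm of the subgroup.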

In a second step, we intend to use Shapley values to identify the most relevant features and filter out the variables that do not contribute significantly to the model. These values quantify how important a feature is for a given prediction and whether it increases or decreases the predicted probability of the endpoint under study.
We will also use these values to explain the model's predictions at both the individual and the population level. Moving away from black-box AI models will help increase the trust that patients and physicians place in AI in healthcare.
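For a model with only a few features, Shapley values can be computed exactly by enumerating feature coalitions: feature i receives the weighted average of its marginal contribution f(S ∪ {i}) - f(S) over all subsets S of the other features, with weight |S|!(n - |S| - 1)!/n!. This sketch assumes that "absent" features are replaced by values from a single reference input (`baseline`), which is a simplification; the SHAP library's `TreeExplainer` computes Shapley values efficiently for tree ensembles, whereas this exact version costs on the order of 2^n model calls per feature and is for illustration only. `mean_abs_importance` and `toy_risk` are hypothetical helpers showing one common way to turn per-patient values into a population-level ranking for feature filtering; any cut-off for "not significantly contributing" would be a further assumption.

```python
import itertools
import math
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def exact_shapley(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to a single baseline input.

    Assumption: features outside a coalition take their baseline value.
    """
    n = len(x)

    def value(coalition: set) -> float:
        # Coalition features come from x, all others from the baseline.
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_importance(model: Model, X: List[Sequence[float]],
                        baseline: Sequence[float]) -> List[float]:
    """Population-level importance: mean |Shapley value| of each feature across patients."""
    per_patient = [exact_shapley(model, x, baseline) for x in X]
    return [sum(abs(p[j]) for p in per_patient) / len(per_patient)
            for j in range(len(baseline))]


# Hypothetical linear risk score over [age, pack_years, unused_feature].
def toy_risk(z: Sequence[float]) -> float:
    return 0.03 * z[0] + 0.02 * z[1] + 0.0 * z[2]


phi = exact_shapley(toy_risk, x=[70, 40, 1], baseline=[60, 20, 0])
# Efficiency property: the values sum to model(x) - model(baseline).
print(phi, sum(phi), toy_risk([70, 40, 1]) - toy_risk([60, 20, 0]))
```

For a linear model each Shapley value reduces to weight × (x - baseline), here approximately [0.3, 0.4, 0.0], so the unused third feature receives zero attribution and would be a candidate for removal.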

- Predict cancer-specific and overall survival
- Triage and select high-risk patients for screening
- Provide an explanation based on Shapley values for each prediction
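The first aim involves survival endpoints, where outcomes are right-censored. A common way to check whether predicted risks rank patients correctly is Harrell's concordance index. The sketch below is the basic pairwise version under simplifying assumptions (pairs with tied follow-up times are skipped, tied risk scores count as half-concordant); it is not necessarily the metric the project will report, and `harrell_c_index` is a hypothetical name. Libraries such as lifelines offer a tested implementation (`lifelines.utils.concordance_index`, which expects higher scores to mean longer survival, the opposite of a risk score).

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's C for right-censored data, where a higher score means higher risk.

    A pair (i, j) is comparable when patient i had an observed event and a strictly
    shorter follow-up time than patient j. It is concordant when patient i also has
    the higher predicted risk; tied risk scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Illustrative follow-up times (years), event flags, and predicted risks.
print(harrell_c_index([2, 4, 6, 8], [True, True, False, True], [0.9, 0.2, 0.3, 0.1]))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of event times.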


Lei Xing