Personalizing colorectal treatment strategies through machine learning
Principal Investigator
Name
Jean-Emmanuel Bibault
Degrees
M.D., Ph.D.
Institution
Stanford University
Position Title
Postdoctoral Research Fellow
About this CDAS Project
Study
PLCO
Project ID
PLCO-592
Initial CDAS Request Approval
Feb 27, 2020
Title
Personalizing colorectal treatment strategies through machine learning
Summary
Colorectal cancer (CRC) is the third most commonly diagnosed cancer in males and the second in females, with 1.8 million new cases and almost 861,000 deaths in 2018 according to the World Health Organization GLOBOCAN database. In the US alone, approximately 147,950 new cases of large bowel cancer are diagnosed annually. Approximately 53,200 Americans die of CRC each year, accounting for approximately 8 percent of all cancer deaths.
The effect of screening with flexible sigmoidoscopy has been demonstrated in the PLCO trial: significant reductions were observed in the incidence of both distal colorectal cancer (479 cases in the intervention group vs. 669 cases in the usual-care group; relative risk, 0.71; 95% CI, 0.64 to 0.80; P<0.001) and proximal colorectal cancer (512 cases vs. 595 cases; relative risk, 0.86; 95% CI, 0.76 to 0.97; P=0.01). There were also fewer deaths in the screening arm of the trial: 2.9 deaths from colorectal cancer per 10,000 person-years in the intervention group (252 deaths), as compared with 3.9 per 10,000 person-years in the usual-care group (341 deaths), which represents a 26% reduction (relative risk, 0.74; 95% CI, 0.63 to 0.87; P<0.001). Mortality from distal colorectal cancer was reduced by 50% (87 deaths in the intervention group vs. 175 in the usual-care group; relative risk, 0.50; 95% CI, 0.38 to 0.64; P<0.001); mortality from proximal colorectal cancer was unaffected (143 and 147 deaths, respectively; relative risk, 0.97; 95% CI, 0.77 to 1.22; P=0.81).
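As a quick consistency check on the mortality figures above, the ratio of the two quoted rates can be computed directly. This is a minimal sketch in Python; the function names are illustrative, and the crude ratio does not reproduce the trial's confidence-interval calculation.

```python
def rate_ratio(rate_intervention, rate_control):
    """Ratio of two event rates expressed in the same units."""
    if rate_control <= 0:
        raise ValueError("control rate must be positive")
    return rate_intervention / rate_control


def relative_reduction(ratio):
    """Relative reduction implied by a rate ratio, as a fraction."""
    return 1.0 - ratio


# CRC deaths per 10,000 person-years, as quoted from the PLCO trial above.
crc_mortality_ratio = rate_ratio(2.9, 3.9)
crc_mortality_reduction = relative_reduction(crc_mortality_ratio)

print(f"rate ratio: {crc_mortality_ratio:.2f}")
print(f"relative reduction: {crc_mortality_reduction:.0%}")
```

Rounded to two decimals, the ratio agrees with the reported relative risk of 0.74 and the 26% reduction.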
In this project, we intend to use a gradient-boosted decision tree algorithm (XGBoost) to create a phenotype profile of the patients who are most likely to benefit from CRC screening. Machine learning techniques can be leveraged to unravel clinical entities and relationships that have not been explored. We will use these techniques to train two predictive models, one for cancer-specific survival (CSS) and one for overall survival (OS). Class imbalance correction techniques will be used before training the models.
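The summary does not say which class imbalance correction will be used. One common option for gradient-boosted trees is to upweight the minority class; XGBoost exposes this as the `scale_pos_weight` parameter, conventionally set to the ratio of negative to positive examples. The sketch below assumes that approach and uses made-up labels; `positive_class_weight` and `toy_labels` are hypothetical names.

```python
from collections import Counter


def positive_class_weight(labels):
    """Return n_negative / n_positive for binary 0/1 labels."""
    counts = Counter(labels)
    n_pos, n_neg = counts[1], counts[0]
    if n_pos == 0:
        raise ValueError("no positive examples in labels")
    return n_neg / n_pos


# Hypothetical labels (assumption, not PLCO data): 1 = event, 0 = no event.
toy_labels = [1] * 5 + [0] * 95

# Parameter dictionary in the form accepted by XGBoost training functions.
params = {
    "objective": "binary:logistic",
    "scale_pos_weight": positive_class_weight(toy_labels),
}
print(params)
```

Resampling the training set (over- or under-sampling) is an alternative; whichever technique is chosen should be applied to the training data only, so that the test set keeps the original class distribution.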
In a second step, we will also compare the performance of models trained on all patients with that of models trained on the subset of patients included in the screening arm.
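The comparison metric is not specified here. Assuming a discrimination metric such as the area under the ROC curve (AUC), the two models could be scored on the same held-out patients as sketched below. The labels and scores are invented for illustration, and `roc_auc` is a plain-Python helper, not a library call.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out labels and predicted risks from two models.
test_labels = [0, 0, 1, 1, 0, 1]
scores_all_patients = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
scores_screening_arm = [0.30, 0.20, 0.25, 0.90, 0.60, 0.55]

auc_all = roc_auc(test_labels, scores_all_patients)
auc_screening = roc_auc(test_labels, scores_screening_arm)
print(f"all patients: {auc_all:.3f}, screening arm only: {auc_screening:.3f}")
```

Scoring both models on an identical test set keeps the comparison from being confounded by differences in case mix between the evaluation cohorts.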
Beyond prediction, we intend to provide interpretable models that explain how they make their predictions, both at the scale of the whole population and at the individual scale.
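The explanation method is not named in this summary. Shapley-value attribution is one widely used way to obtain both individual and population-level explanations, and the sketch below assumes it. The risk function, feature order, baseline, and patients are all hypothetical and stand in for a trained model.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions of predict(x) relative to predict(baseline).

    Averages each feature's marginal contribution over every feature ordering,
    so the cost grows factorially and suits only a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]  # switch feature i from baseline to patient value
            updated = predict(current)
            phi[i] += updated - previous
            previous = updated
    return [total / len(orders) for total in phi]


def mean_abs_attribution(predict, patients, baseline):
    """Population-level importance: mean absolute Shapley value per feature."""
    per_patient = [shapley_values(predict, p, baseline) for p in patients]
    return [
        sum(abs(phi[i]) for phi in per_patient) / len(per_patient)
        for i in range(len(baseline))
    ]


# Hypothetical risk score over (age, stage, screened); NOT the project's model.
def toy_risk(features):
    age, stage, screened = features
    return 0.01 * age + 0.2 * stage - 0.3 * screened + 0.05 * stage * screened


baseline = [60, 1, 0]
patients = [[70, 3, 1], [55, 2, 0]]

individual = shapley_values(toy_risk, patients[0], baseline)
population = mean_abs_attribution(toy_risk, patients, baseline)
print("individual:", individual)
print("population:", population)
```

The individual attributions sum to the difference between the patient's prediction and the baseline prediction, which is what makes them readable as a per-feature breakdown of a single prediction. For tree ensembles with many features, approximate or tree-specific algorithms are typically used instead of full enumeration.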
Aims
The main aim of the project is to create artificial intelligence models that can:
- predict benefit from CRC screening,
- predict CSS and OS after CRC diagnosis.
The models will be interpretable, meaning that we will provide the clinical features that explain each prediction. This is crucial to build a model that is clinically actionable.
Collaborators
Lei Xing, PhD, Jacob Haimson Professor of Medical Physics and Director of Medical Physics Division of Radiation Oncology Department, Stanford University
Related Publications
- Development and validation of a model to predict survival in colorectal cancer using a gradient-boosted machine.
Bibault JE, Chang DT, Xing L
Gut. 2020 Sep 4