About this Publication
Title
Development and validation of a model to predict survival in colorectal cancer using a gradient-boosted machine.
Pubmed ID
32887732
Digital Object Identifier
Publication
Gut. 2020 Sep 4
Authors
Bibault JE, Chang DT, Xing L
Affiliations
  • Radiation Oncology, Stanford Medicine, Stanford, California, USA jbibault@stanford.edu.
  • Radiation Oncology, Stanford Medicine, Stanford, California, USA.
Abstract

OBJECTIVE: The success of treatment planning relies critically on our ability to predict the potential benefit of a therapy. In colorectal cancer (CRC), several nomograms are available to predict different outcomes from tumour-specific features. Our objective is to provide an accurate and explainable prediction of the risk of death within 10 years after CRC diagnosis by incorporating tumour features together with the patient's medical and demographic information.

DESIGN: In the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial, participants (n=154 900) were randomised to screening with flexible sigmoidoscopy, with a repeat screening at 3 or 5 years, or to usual care. We selected patients who were diagnosed with CRC during the follow-up and used them to train a gradient-boosted model to predict the risk of death within 10 years after CRC diagnosis. Using Shapley values, we determined the 20 most relevant features and provided explanations for the model's predictions.
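
To illustrate how Shapley values attribute a prediction to individual features, the following is a minimal sketch that computes exact Shapley values by enumerating feature subsets. It is not the authors' pipeline: the function `toy_risk`, the three features (age, stage, smoker), and the baseline sample are hypothetical stand-ins for a trained gradient-boosted model and its inputs. Practical tools approximate this computation, since exact enumeration is exponential in the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample `x` relative to a `baseline` sample."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take the sample's value; the rest take the baseline's.
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(v):
    # Hypothetical risk score (NOT the published model): age, stage, smoker flag.
    age, stage, smoker = v
    return 0.01 * age + 0.2 * stage + 0.05 * smoker * stage


sample = [70, 3, 1]    # hypothetical patient
baseline = [60, 1, 0]  # hypothetical reference patient
phi = shapley_values(toy_risk, sample, baseline)

# Rank features by absolute contribution, most influential first.
ranking = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline. Averaging the absolute values over many patients and keeping the largest ones is one common way to arrive at a "top 20 features" list of the kind described above.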

RESULTS: During the follow-up, 2359 patients were diagnosed with CRC. Median follow-up for mortality was 16.8 years (14.4-18.9). In total, 686 patients (29%) died from CRC during the follow-up. The dataset was randomly split into a training (n=1887) and a testing (n=472) dataset. The area under the receiver operating characteristic curve was 0.84 (±0.04) and accuracy was 0.83 (±0.04) with a 0.5 classification threshold. The model is available online for research use.
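
As a worked illustration of the quantities reported above, the sketch below checks the split sizes and shows how the area under the ROC curve and accuracy at a 0.5 threshold are computed. The 80/20 split fraction is an assumption (the abstract gives only the resulting counts, which are consistent with it), and the labels and scores are made-up toy values, not study data.

```python
def split_sizes(n_total, test_fraction):
    """Return (n_train, n_test) for a random split with the given test fraction."""
    n_test = round(n_total * test_fraction)
    return n_total - n_test, n_test


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at the given probability threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Counts from the abstract; the 0.2 test fraction is an assumption.
n_train, n_test = split_sizes(2359, 0.2)
death_rate = 686 / 2359

# Toy predictions (hypothetical): 1 = died within 10 years, score = predicted risk.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
```

Because roughly 29% of patients died from CRC, a trivial classifier that always predicts survival would already reach about 71% accuracy, which is one reason the threshold-independent area under the curve is reported alongside accuracy.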

CONCLUSIONS: We trained and validated a model with prospective data from a large multicentre cohort of patients. The model has high predictive performance at the individual level. It could be used to inform discussions of treatment strategies.
