Can we predict early biochemical response after chemotherapy in Stage IV colon cancer patients using baseline clinical and treatment features
Principal Investigator
Name
Dr. Johnson P. Thomas
Degrees
Ph.D.
Institution
Oklahoma State University
Position Title
Professor
About this CDAS Project
Study
PLCO
Project ID
PLCO-1994
Initial CDAS Request Approval
Nov 24, 2025
Title
Can we predict early biochemical response after chemotherapy in Stage IV colon cancer patients using baseline clinical and treatment features
Summary
Our goal in this research is to apply machine learning algorithms to the PLCO dataset to address two open problems that have not been investigated in depth.
1. Why does chemotherapy not work for everyone? Who will respond to treatment, and who will not? Can machine learning predict whether chemotherapy will work and, going a step further, can we personalize the treatment by identifying the patients for whom it may work?
2. Why do some people experience serious side effects from chemotherapy, whereas others have minimal, acceptable, or no side effects? The side effects of chemotherapy are unpredictable. Most chemotherapy is dosed by body surface area (BSA), in mg/m². A given drug may have a 50% rate of one side effect and a 25% rate of another. Two similar people with the same cancer may receive the same dose, yet their toxicities can vary. Age, gender, cancer type, enzyme deficiencies, cancer location, and other factors may all contribute. Can we predict the side effects? Can we personalize the treatment?
In particular, we will examine these two questions in relation to stage IV colon cancer, which is a common cancer and therefore important to investigate. Few studies have attempted to answer these two questions for stage IV colon cancer. Clinical studies are slow and expensive. Data science can play a crucial role in providing initial answers to these questions in a time- and cost-efficient manner.
The PLCO dataset contains lab results for carcinoembryonic antigen (CEA), white blood cell count (WBC), hemoglobin (Hb), creatinine (Cr), bilirubin (Bl), serum glutamic pyruvic transaminase (SGPT), and blood urea nitrogen (BUN), along with demographics and chemotherapy information, from a large cohort of patients. This dataset would therefore be a good fit for machine learning prediction and analysis and would go some way toward answering these questions.
Aims
1. Can longitudinal laboratory markers (CEA, WBC, Hemoglobin), combined with treatment exposure (chemotherapy vs. non-chemotherapy, dosage intensity, and concurrent medications), accurately predict mortality risk within defined survival windows (6–12 months, 1–2 years, 2–5 years) in stage IV colon cancer patients diagnosed since 2019?
2. The aim of this study is to develop and validate predictive models that integrate demographics, chemotherapy exposure (frequency and dosage), other medication histories, and dynamic laboratory profiles (CEA, WBC, Hemoglobin) to estimate individualized survival probabilities and mortality risk in stage IV colon cancer patients. The goal is to move beyond binary “chemo vs. no-chemo” comparisons and instead quantify how survival trajectories differ across patient subgroups defined by treatment intensity and lab dynamics.
Collaborators
Ipsita Ghosh