Development of ML-Based Early Cancer Confirmation and Stage Classification Models for Ovarian and Colorectal Cancer Using PLCO Trial Data

Principal Investigator

Name
Dr. Mehdi Hasan Chowdhury

Degrees
BSc, MSc, PhD

Institution
Chittagong University of Engineering & Technology (CUET)

Position Title
Professor

Email
mhchowdhury@cuet.ac.bd

About this CDAS Project

Study
PLCO

Project ID
PLCO-2026

Initial CDAS Request Approval
Mar 4, 2026

Title
Development of ML-Based Early Cancer Confirmation and Stage Classification Models for Ovarian and Colorectal Cancer Using PLCO Trial Data

Summary
This project has two goals, both using data from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. The first is to develop predictive models that confirm the presence of ovarian and colorectal cancer at an early stage as a binary (yes/no) classification, based on the available demographic, clinical, screening, and biomarker-related variables. The second is to classify confirmed cancer cases into clinically relevant stages at diagnosis.

The proposed system will integrate multimodal data: structured numeric variables such as age and screening results, text-based clinical information where available, and relevant medical imaging data accessible through PLCO resources. It will use a dual modeling framework that combines two independent prediction systems, one based on numerical and textual data and the other on imaging data, to draw on the complementary strengths of each modality.
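One common way to combine two independent prediction systems is late fusion, in which each model produces its own probability and the two are merged afterwards. The sketch below illustrates that idea only; the project summary does not state which fusion method will be used. The function name `late_fusion`, the equal default weight, the 0.5 decision threshold, and the fallback to a single model when imaging is missing are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FusionResult:
    probability: float  # fused probability that cancer is present
    label: str          # "yes" or "no" after thresholding


def late_fusion(
    p_tabular_text: float,
    p_imaging: Optional[float],
    w_tabular_text: float = 0.5,   # assumed weight, not from the project
    threshold: float = 0.5,        # assumed decision threshold
) -> FusionResult:
    """Weighted average of two independently produced probabilities.

    Hypothetical rule: if no imaging probability is available, the
    tabular/text probability is used on its own.
    """
    for p in (p_tabular_text, p_imaging):
        if p is not None and not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if not 0.0 <= w_tabular_text <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    if p_imaging is None:
        fused = p_tabular_text
    else:
        fused = w_tabular_text * p_tabular_text + (1.0 - w_tabular_text) * p_imaging

    return FusionResult(fused, "yes" if fused >= threshold else "no")


if __name__ == "__main__":
    print(late_fusion(0.8, 0.4))    # both modalities present
    print(late_fusion(0.3, None))   # imaging unavailable
```

In practice the weight and threshold would be tuned on held-out data, and alternatives such as stacking a second-level model on the two outputs are equally plausible readings of the framework.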

By using multimodal information from a large, well-characterized, population-based screening cohort, the project aims to improve predictive accuracy, robustness, and clinical relevance over traditional single-modality approaches. Ultimately, the system is intended to give healthcare professionals an advanced risk confirmation and stage classification tool for ovarian and colorectal cancers, supporting better risk stratification and personalized treatment planning and potentially improving patient outcomes.

Aims

1. To develop a robust predictive model for the early confirmation (yes/no) of ovarian and colorectal cancer using relevant participant-level variables from the PLCO dataset.

2. To develop a reliable stage classification model for confirmed ovarian and colorectal cancer cases that accurately categorizes disease stage at diagnosis.

3. To design and implement a structured user interface that enables entry of patient-level inputs and provides clear, user-friendly prediction outputs, including cancer confirmation probability and stage classification results.

4. To develop an interactive chatbot module integrated within the system to assist users in data entry, guide interpretation of model outputs, and provide conversational support for understanding risk and stage predictions.

5. To incorporate SHapley Additive exPlanations (SHAP) to enhance model interpretability by identifying and quantifying key predictors influencing cancer confirmation and stage classification outcomes at both global and individual levels.
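To make the interpretability aim concrete, the sketch below computes exact Shapley values for a single prediction by enumerating feature subsets, which is the quantity that SHAP approximates for larger models. It does not use the `shap` library and is not the project's implementation. The toy model `toy_risk_model`, its coefficients, the three input features, and the reference (baseline) participant are all hypothetical.

```python
from itertools import combinations
from math import exp, factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance` relative to `baseline`.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_model(features):
    """Hypothetical logistic model; coefficients are made up for illustration."""
    age, screening_result, biomarker = features
    score = 0.04 * (age - 60) + 0.8 * screening_result + 0.5 * biomarker - 1.0
    return 1.0 / (1.0 + exp(-score))


if __name__ == "__main__":
    participant = [70, 1.0, 1.0]   # hypothetical individual
    reference = [60, 0.0, 0.0]     # hypothetical baseline participant
    phi = shapley_values(toy_risk_model, participant, reference)
    for name, contribution in zip(["age", "screening_result", "biomarker"], phi):
        print(f"{name}: {contribution:+.4f}")
    # The contributions sum to the difference between the two predictions.
    print(sum(phi), toy_risk_model(participant) - toy_risk_model(reference))
```

The per-participant values correspond to the individual-level explanations named in the aim; averaging their absolute values over many participants gives one common global importance ranking.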

Collaborators

Sadia Tabassum, Chittagong University of Engineering & Technology
Mehdi Hasan Chowdhury, Chittagong University of Engineering & Technology