
PLCO Digital Pathology Whole-Slide Images for Multimodal AI Modeling and Validation

Principal Investigator

Name
Yuming Jiang

Degrees
M.D., Ph.D.

Institution
Wake Forest University Health Sciences

Position Title
Tenure Track Assistant Professor

Email
yuming.jiang@wfusm.edu

About this CDAS Project

Study
PLCO

Project ID
PLCOI-2007

Initial CDAS Request Approval
Jan 7, 2026

Title
PLCO Digital Pathology Whole-Slide Images for Multimodal AI Modeling and Validation

Summary
We request access to PLCO colorectal H&E whole-slide images (WSIs) to support the WSI branch of our multimodal program. In this CDAS Images project we will use WSIs only and link slides at the participant level to phenotype tables requested in a companion Data-Only project. WSIs will be processed with our validated pipeline: tissue masking and tiling, Macenko stain normalization, artifact QC, HoverNet nuclei segmentation/classification, and CellProfiler-derived nuclear features. In parallel, we will extract foundation-model WSI embeddings and combine them with attention-based multiple-instance learning (MIL) to form patient-level representations for risk modeling. We will evaluate with time-to-event and classification metrics (C-index, time-dependent AUC, Brier score) plus probability calibration and decision-curve analysis, reporting aggregate results only. No re-identification will be attempted and WSIs will not be redistributed. Data will be stored on institution-approved secure servers (encrypted at rest; access limited to Approved Users). Findings will be submitted for peer-reviewed dissemination consistent with CDAS policies.
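As an illustration of the time-to-event metrics named in the Summary, the following is a minimal sketch of Harrell's concordance index in plain Python. It is not the project's evaluation code: the function name `harrell_c_index` and its argument layout are assumptions made for this example, and pairs with tied event times are simply treated as not comparable.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[bool],
    risks: Sequence[float],
) -> float:
    """Fraction of comparable pairs in which the higher-risk subject fails first.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]; subject j may be censored or not.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to random ranking. The double loop is quadratic in the number of participants, so it is meant as a readable reference rather than a production implementation.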

Aims

Aim 1: Harmonized WSI preprocessing. Standardize tissue masking, tiling, stain normalization, and artifact QC under versioned configs.
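The masking and tiling step in Aim 1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's pipeline: the `TilingConfig` fields, the version string, the intensity threshold, and the minimum tissue fraction are all hypothetical values, and the slide is represented as a small grayscale grid (a list of rows) rather than a real WSI.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TilingConfig:
    """Hypothetical versioned preprocessing config (all values are placeholders)."""
    version: str = "0.1.0"
    tile_size: int = 2
    background_threshold: int = 220   # pixels at or above this are treated as background
    min_tissue_fraction: float = 0.5  # keep tiles with at least this share of tissue


def tissue_mask(gray: Sequence[Sequence[int]], threshold: int) -> List[List[bool]]:
    """Mark a pixel as tissue when it is darker than the background threshold."""
    return [[px < threshold for px in row] for row in gray]


def select_tiles(gray: Sequence[Sequence[int]], cfg: TilingConfig) -> List[Tuple[int, int]]:
    """Return (top, left) origins of non-overlapping tiles that contain enough tissue."""
    mask = tissue_mask(gray, cfg.background_threshold)
    height = len(mask)
    width = len(mask[0]) if height else 0
    t = cfg.tile_size
    kept = []
    for top in range(0, height - t + 1, t):
        for left in range(0, width - t + 1, t):
            tissue = sum(
                mask[r][c]
                for r in range(top, top + t)
                for c in range(left, left + t)
            )
            if tissue / (t * t) >= cfg.min_tissue_fraction:
                kept.append((top, left))
    return kept
```

Freezing the dataclass and carrying a `version` field is one simple way to make each run traceable to the exact settings that produced it.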

Aim 2: Cell-level phenotyping & interpretable features. We will implement the method for extracting cell-level image features and apply post-processing. Deliverables include feature matrices with data dictionaries and match-rate QA between nuclei and tiles.
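The proposal does not define the nuclei-to-tile match rate. One plausible reading, assumed here purely for illustration, is the fraction of segmented nuclei whose centroid falls inside a retained tile. The sketch below further assumes non-overlapping square tiles on a grid aligned to the slide origin; the function name and coordinate convention are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def nucleus_tile_match_rate(
    centroids: Sequence[Tuple[float, float]],
    tile_origins: Iterable[Tuple[int, int]],
    tile_size: int,
) -> float:
    """Share of nucleus centroids (x, y) that land in one of the retained tiles.

    tile_origins holds the (x, y) top-left corner of each retained tile;
    tiles are assumed to sit on a regular grid with spacing tile_size.
    """
    if not centroids:
        raise ValueError("no nuclei supplied; match rate is undefined")
    retained = set(tile_origins)
    matched = 0
    for x, y in centroids:
        # Snap the centroid to the origin of the grid cell that contains it.
        origin = (int(x // tile_size) * tile_size, int(y // tile_size) * tile_size)
        if origin in retained:
            matched += 1
    return matched / len(centroids)
```

A low rate would suggest that the segmentation and tiling steps disagree about where tissue is, which is the kind of mismatch a QA report would flag.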

Aim 3: Patient-level WSI representation & risk modeling. Extract WSI foundation-model embeddings with MIL and fuse them with other features into a patient-level representation. Train WSI-only prognostic models using PLCO outcomes. Report C-index, AUC, and decision-curve analysis at preregistered thresholds. Deliverables include model cards, evaluation notebooks, and aggregate-only results.
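Attention-based MIL pooling, as referenced in Aim 3, can be sketched in plain Python as below. This shows only the pooling arithmetic in its commonly used form (a softmax over scores w · tanh(V h) for each tile embedding h). The parameters `V` and `w` would be learned in a real model and are passed in as fixed values here; the function name and toy dimensions are assumptions, not the project's architecture.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    tiles: Sequence[Sequence[float]],
    V: Sequence[Sequence[float]],
    w: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool tile embeddings into one slide-level vector with attention weights.

    tiles: one embedding per tile, each of length d
    V:     hidden projection, shape (hidden, d)
    w:     scoring vector, length hidden
    Returns (pooled embedding of length d, one attention weight per tile).
    """
    scores = []
    for h in tiles:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))

    # Numerically stable softmax over the per-tile scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(tiles[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, tiles)) for d in range(dim)]
    return pooled, weights
```

The returned weights sum to one and indicate how much each tile contributes to the slide-level vector, which is what makes this form of pooling useful for inspecting which regions drive a prediction.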

Aim 4: Robustness & external generalizability. Under the same preprocessing, evaluate models on TCGA-CRC and MCO as external cohorts, analyze domain shift, run subgroup audits across stages and sites, and perform sensitivity checks. We will deliver external-test reports with reliability plots.
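The data behind a reliability plot, as mentioned in Aim 4, can be computed with a short binning routine. The sketch below assumes binary outcomes and equal-width probability bins; the function name, the default of ten bins, and the output layout are choices made for this example only.

```python
from typing import List, Sequence, Tuple


def reliability_bins(
    probs: Sequence[float],
    labels: Sequence[int],
    n_bins: int = 10,
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width bins for a reliability plot.

    Returns one (mean predicted probability, observed event rate, count)
    tuple per non-empty bin, ordered from the lowest bin to the highest.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    rows = []
    for members in bins:
        if not members:
            continue  # empty bins are omitted rather than plotted as zero
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        rows.append((mean_pred, observed, len(members)))
    return rows
```

Plotting observed rate against mean predicted probability for each row gives the reliability curve; points on the diagonal indicate well-calibrated predictions for that bin.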

Aim 5: Store all data on institution-approved secure servers, with no re-identification or redistribution. Maintain audit logs, honor the 12-month access window, and perform project close-out & certified data destruction per CDAS policy.

Collaborators

Yuming Jiang Wake Forest University School of Medicine
Guannan He Wake Forest University School of Medicine
Yijun Chen Wake Forest University School of Medicine