PLCO Digital Pathology Whole-Slide Images for Multimodal AI Modeling and Validation
Principal Investigator
Name
Yuming Jiang
Degrees
M.D., Ph.D.
Institution
Wake Forest University Health Sciences
Position Title
Tenure Track Assistant Professor
Email
yuming.jiang@wfusm.edu
About this CDAS Project
Study
PLCO
Project ID
PLCOI-2007
Initial CDAS Request Approval
Jan 7, 2026
Title
PLCO Digital Pathology Whole-Slide Images for Multimodal AI Modeling and Validation
Summary
We request access to PLCO colorectal H&E whole-slide images (WSIs) to support the WSI branch of our multimodal program. In this CDAS Images project we will use WSIs only and link slides at the participant level to phenotype tables requested in a companion Data-Only project. WSIs will be processed with our validated pipeline: tissue masking and tiling, Macenko stain normalization, artifact QC, HoVer-Net nuclei segmentation/classification, and CellProfiler-derived nuclear features. In parallel, we will extract foundation-model WSI embeddings and aggregate them with attention-based multiple-instance learning (MIL) to form patient-level representations for risk modeling. We will evaluate models with time-to-event and classification metrics (C-index, time-dependent AUC, Brier score) plus probability calibration and decision-curve analysis, reporting aggregate results only. No re-identification will be attempted and WSIs will not be redistributed. Data will be stored on institution-approved secure servers (encrypted at rest; access limited to Approved Users). Findings will be submitted for peer-reviewed dissemination consistent with CDAS policies.
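As an illustration of two of the evaluation metrics named above, the following is a minimal sketch of Harrell's C-index for right-censored outcomes and of the net-benefit quantity used in decision-curve analysis. It is not the project's evaluation code: the function names, the toy inputs, and the choice to skip pairs with tied event times are assumptions made for this example.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data.

    A pair is comparable when the subject with the shorter follow-up time
    had an event. Pairs with tied times are skipped (a simplification
    assumed for this sketch).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher predicted risk failed earlier
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold: TP/n - (FP/n) * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data, invented for illustration only.
times = [5, 3, 8, 2]
events = [1, 1, 0, 1]
risks = [0.4, 0.7, 0.1, 0.9]
c_index = harrell_c_index(times, events, risks)
nb = net_benefit([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7], threshold=0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the treat-all and treat-none strategies.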
Aims
Aim 1: Harmonized WSI preprocessing. Standardize tissue masking, tiling, stain normalization, and artifact QC under versioned configs.
Aim 2: Cell-level phenotyping and interpretable features. We will implement cell-level feature extraction from the images (nuclei segmentation/classification followed by nuclear feature computation) and apply post-processing. Deliverables include feature matrices with data dictionaries and match-rate QA between nuclei and tiles.
Aim 3: Patient-level WSI representation and risk modeling. Extract WSI foundation-model embeddings, aggregate them with MIL, and fuse them with the other features into a patient-level representation. Train WSI-only prognostic models using PLCO outcomes. Report C-index, AUC, and decision-curve analysis at preregistered thresholds. Deliverables include model cards, evaluation notebooks, and aggregate-only results.
Aim 4: Robustness and external generalizability. Under the same preprocessing, evaluate models on TCGA-CRC and MCO as external cohorts, analyze domain shift, run subgroup audits by stage and site, and perform sensitivity checks. We will deliver external-test reports with reliability plots.
Aim 5: Store all data on institution-approved secure servers. No re-identification or redistribution. Maintain audit logs, honor the 12-month access window, and perform project close-out and certified data destruction per CDAS policy.
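The MIL aggregation step in Aim 3 can be illustrated with a minimal sketch of attention-based pooling, in which each tile embedding receives a softmax-normalized weight and the slide-level vector is the weighted sum of the tiles. This is a sketch under stated assumptions, not the project's model: the function name, the parameters `V` and `w`, and the toy values are hypothetical, and in practice these parameters would be learned during training.

```python
import math


def attention_mil_pool(tiles, V, w):
    """Attention-based MIL pooling over tile embeddings.

    tiles: list of K embeddings, each a list of D floats
    V:     H x D projection matrix (hypothetical learned parameter)
    w:     length-H scoring vector (hypothetical learned parameter)
    Returns (pooled embedding of length D, attention weights of length K).
    """
    scores = []
    for h in tiles:
        # Score each tile as w^T tanh(V h).
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * zi for wi, zi in zip(w, hidden)))

    # Numerically stable softmax over tiles.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of tile embeddings gives the slide-level vector.
    dim = len(tiles[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, tiles)) for d in range(dim)]
    return pooled, weights


# Toy example: two 2-D tile embeddings and zero projection weights,
# so both tiles receive equal attention.
tiles = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_mil_pool(tiles, V=[[0.0, 0.0]], w=[1.0])
```

The attention weights also indicate which tiles contribute most to a slide-level prediction, which is one reason this form of pooling is often chosen when interpretability matters.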
Collaborators
Yuming Jiang Wake Forest University School of Medicine
GUANNAN HE Wake Forest University School of Medicine
Yijun Chen Wake Forest University School of Medicine