PLCO Colorectal Cohort Tables for Outcome Labeling and WSI Linkage to Support Risk Modeling
Principal Investigator
Name
Yuming Jiang
Degrees
MD, PhD
Institution
Wake Forest University Health Sciences
Position Title
Tenure Track Assistant Professor
Email
yuming.jiang@wfusm.edu
About this CDAS Project
Study
PLCO
Project ID
PLCO-2006
Initial CDAS Request Approval
Jan 7, 2026
Title
PLCO Colorectal Cohort Tables for Outcome Labeling and WSI Linkage to Support Risk Modeling
Summary
We request participant-level PLCO colorectal tables to curate analysis cohorts, generate outcome labels, and link participants to the colorectal H&E whole-slide images (WSI) requested in a companion PLCO Images project. Specifically, we will assemble baseline risk factors, screening and diagnostic procedures, pathology and treatment summaries, and vital status with dates to construct de-identified, analysis-ready tables for risk modeling. Using deterministic keys provided by CDAS, we will create a linkage that enables patient-level modeling from WSI while keeping raw identifiers unavailable to analysts. Time-to-event datasets will be derived with transparent censoring rules and accompanied by data dictionaries. We will not attempt re-identification and will report aggregate results only, consistent with CDAS policies. This Data-Only request does not include any imaging files; images are handled in the separate PLCO Images project.
Aims
Aim 1 - Outcome construction and time-to-event datasets: We will curate the endpoints required for risk modeling and clinical evaluation. This includes the date of cancer incidence, the indicator for first primary cancer, stage and grade when available, the initial course of treatment with clearly documented timelines, vital status, and the date and cause of death. We will define index dates and consistent censoring rules, distinguish administrative censoring from loss to follow-up, and provide transparent derivation notes for every variable. Based on these rules, we will build patient-level time-to-event datasets for overall survival, cancer-specific survival, time to recurrence or progression when available, and time to second primary cancer. We will run completeness and consistency checks, ensure that timelines are logically ordered, summarize missingness patterns, and supply a concise data dictionary so that analysts can use these tables immediately.
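As an illustration of the censoring rules described in Aim 1, the following is a minimal sketch, not the project's actual derivation. The `Participant` fields, the `ADMIN_CUTOFF` date, and the censoring-reason labels are all hypothetical names chosen for this example; real PLCO variables and the actual follow-up cutoff will differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Participant:
    # Hypothetical fields for illustration; not actual PLCO variable names.
    pid: str
    index_date: date              # start of follow-up (e.g. diagnosis)
    death_date: Optional[date]    # None if no death recorded
    death_from_crc: bool          # True if cause of death is colorectal cancer
    last_contact_date: date       # last date the participant was known alive


# Assumed administrative end of follow-up; an arbitrary illustrative date.
ADMIN_CUTOFF = date(2020, 12, 31)


def overall_survival(p: Participant, cutoff: date = ADMIN_CUTOFF) -> Tuple[int, int, str]:
    """Return (days from index, event indicator, censoring reason)."""
    if p.death_date is not None and p.death_date <= cutoff:
        end, event, reason = p.death_date, 1, ""
    elif p.last_contact_date >= cutoff:
        # Still under observation at the cutoff: administrative censoring.
        end, event, reason = cutoff, 0, "administrative"
    else:
        # Contact stopped before the cutoff: loss to follow-up.
        end, event, reason = p.last_contact_date, 0, "lost_to_follow_up"
    days = (end - p.index_date).days
    if days < 0:
        # Timelines must be logically ordered; surface the problem loudly.
        raise ValueError(f"{p.pid}: end date precedes index date")
    return days, event, reason


def cancer_specific_survival(p: Participant, cutoff: date = ADMIN_CUTOFF) -> Tuple[int, int, str]:
    """Deaths from other causes are treated as censored, with their own reason label."""
    days, event, reason = overall_survival(p, cutoff)
    if event == 1 and not p.death_from_crc:
        return days, 0, "death_other_cause"
    return days, event, reason
```

Keeping the censoring reason as a separate column, rather than folding it into the event indicator, lets analysts later choose between simple censoring and a competing-risks treatment of other-cause deaths.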
Aim 2 - Cohort definition and deterministic linkage to WSI: We will establish a colorectal cohort using clear inclusion and exclusion criteria and will report counts and reasons at each selection step. Next, we will create a deterministic linkage between participants and whole-slide images requested in the companion Images project. Using stable CDAS identifiers together with slide-level metadata, we will align records at the patient level without exposing any identifying information. When a participant is associated with multiple slides, we will record a well-defined one-to-many relationship and provide grouping fields that support fold assignment and prevent information leakage. We will deliver the linkage table with quality flags, the definitions of index dates, and a documented, reproducible alignment procedure.
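The one-to-many linkage and leakage-safe fold assignment in Aim 2 could be sketched as follows. This is an assumption-laden example rather than the project's procedure: `link_slides`, `fold_for`, and the identifier formats are hypothetical, and hashing the participant identifier is one simple way (substituted here for illustration) to guarantee that all slides from one participant land in the same fold.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def link_slides(
    cohort: Set[str], slides: Iterable[Tuple[str, str]]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group slide IDs by participant ID.

    `slides` holds (slide_id, participant_id) pairs. Slides whose participant
    is not in the cohort are returned separately so they can be quality-flagged.
    """
    linked: Dict[str, List[str]] = defaultdict(list)
    unmatched: List[str] = []
    for slide_id, pid in slides:
        if pid in cohort:
            linked[pid].append(slide_id)
        else:
            unmatched.append(slide_id)
    return dict(linked), unmatched


def fold_for(pid: str, n_folds: int = 5) -> int:
    """Deterministic fold from a hash of the participant ID.

    Assigning folds by participant (not by slide) keeps every slide from the
    same person in a single fold, which prevents train/test leakage.
    """
    digest = hashlib.sha256(pid.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def linkage_rows(linked: Dict[str, List[str]], n_folds: int = 5) -> List[dict]:
    """Flatten the linkage into slide-level rows carrying the grouping fields."""
    rows = []
    for pid in sorted(linked):
        for slide_id in linked[pid]:
            rows.append({"pid": pid, "slide_id": slide_id, "fold": fold_for(pid, n_folds)})
    return rows
```

Hash-based folds are reproducible across reruns but are not guaranteed to be balanced in size or outcome prevalence; a stratified, group-aware splitter would be the alternative if balance matters.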
Aim 3 - Covariate curation and analysis-ready tables: We will assemble baseline questionnaire variables, screening history, diagnostic work-up information, and pathology and treatment summaries that are relevant to modeling. We will harmonize coding schemes and units, generate prespecified derived features, and analyze missing data in a systematic way. We will retain indicators that preserve original missingness while also preparing versions that are convenient for subsequent imputation. Finally, we will produce analysis-ready CSV or SAS tables with stable column names and complete value labels, along with a versioned schema, a crosswalk from raw PLCO fields to analytic variables, and lightweight validation scripts that recheck ranges, logic, and constraints before modeling.
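A lightweight validation step of the kind Aim 3 mentions might look like the sketch below. The column names (`age_at_entry`, `bmi`, `dx_day`, `death_day`), the plausibility ranges, and the error messages are assumptions made for illustration and do not correspond to actual PLCO fields or agreed thresholds.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical plausibility ranges; real limits would come from the data dictionary.
RANGES: Dict[str, Tuple[float, float]] = {
    "age_at_entry": (0, 120),
    "bmi": (10, 80),
}


def add_missing_indicators(row: Dict[str, Optional[float]], fields: List[str]) -> dict:
    """Return a copy of `row` with a `<field>_missing` flag for each listed field.

    The original value is left untouched, so missingness is preserved even if
    a later step imputes the field.
    """
    out = dict(row)
    for field in fields:
        out[field + "_missing"] = 1 if row.get(field) is None else 0
    return out


def validate(row: Dict[str, Optional[float]]) -> List[str]:
    """Recheck ranges and timeline logic; return a list of problems (empty if clean)."""
    errors: List[str] = []
    for field, (low, high) in RANGES.items():
        value = row.get(field)
        # Missing values are not range errors; they are tracked by indicators.
        if value is not None and not (low <= value <= high):
            errors.append(f"{field} out of range")
    dx_day, death_day = row.get("dx_day"), row.get("death_day")
    if dx_day is not None and death_day is not None and dx_day > death_day:
        errors.append("dx_day after death_day")
    return errors
```

For example, `validate({"age_at_entry": 62, "bmi": None, "dx_day": 400, "death_day": 300})` reports only the timeline inconsistency, because the missing `bmi` is handled by the indicator column rather than treated as an out-of-range value.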
Collaborators
Yuming Jiang, Wake Forest University School of Medicine
Guannan He, Wake Forest University School of Medicine
Yijun Chen, Wake Forest University School of Medicine