Deep learning–based survival prediction in lung cancer using NLST clinical data and imaging-derived biomarkers

Principal Investigator

Name
Yuming Jiang

Degrees
MD, PhD

Institution
Wake Forest University Health Sciences

Position Title
Tenure Track Assistant Professor

Email
yuming.jiang@wfusm.edu

About this CDAS Project

Study
NLST

Project ID
NLST-1489

Initial CDAS Request Approval
Jan 7, 2026

Title
Deep learning–based survival prediction in lung cancer using NLST clinical data and imaging-derived biomarkers

Summary
We propose to use NLST clinical data to develop and validate prognostic models for lung cancer outcomes. Our research focuses on integrating established clinical risk factors with quantitative imaging- and histopathology-derived biomarkers from digitized tissue slides and computational pathology pipelines. Specifically, we will build patient-level risk prediction models for survival-related endpoints by combining clinical variables with machine-learning representations extracted from tissue microenvironment patterns. We will conduct rigorous training, validation, and testing with appropriate censoring-aware survival methods, evaluate performance using concordance index and time-dependent metrics, and assess clinical utility through risk stratification analyses. The requested NLST clinical data are essential to provide ground-truth outcomes and standardized clinical covariates for model development, adjustment, and external validation. All analyses will be performed on secure institutional computing resources, and only de-identified, aggregate results will be reported.

Aims

Aim 1: Curate, clean, and harmonize NLST clinical variables relevant to lung cancer prognosis. We will standardize variable definitions and coding, address missingness and data quality issues, and construct analysis-ready time-to-event outcomes. We will create consistent survival endpoints with clearly defined event indicators and follow-up time, and document the resulting data dictionary and cohort selection criteria to ensure reproducibility.
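
As an illustration of what an analysis-ready time-to-event outcome in Aim 1 might look like, the sketch below derives a follow-up time and event indicator from two hypothetical fields. The names `days_to_death` and `days_to_last_contact` are assumptions made for this example and are not NLST variable names; the project's actual endpoint definitions may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurvivalRecord:
    time: float  # follow-up time in days from a chosen origin
    event: int   # 1 = event observed, 0 = right-censored


def build_endpoint(
    days_to_death: Optional[float],
    days_to_last_contact: Optional[float],
) -> Optional[SurvivalRecord]:
    """Build one (time, event) pair; field names are hypothetical."""
    # A recorded death is an observed event at that time.
    if days_to_death is not None:
        if days_to_death < 0:
            return None  # implausible value: flag for data-quality review
        return SurvivalRecord(time=days_to_death, event=1)
    # Otherwise the participant is censored at last contact.
    if days_to_last_contact is not None and days_to_last_contact >= 0:
        return SurvivalRecord(time=days_to_last_contact, event=0)
    # No usable follow-up information: exclude and document the exclusion.
    return None
```

Returning `None` rather than imputing a time keeps exclusions explicit, so they can be counted and reported in the cohort selection criteria.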

Aim 2: Develop and train survival prediction models using NLST clinical covariates. We will establish strong clinical baseline models using standard survival analysis methods and modern machine-learning survival approaches. We will perform feature selection and model comparison to identify the most informative clinical predictors, quantify their relative contributions, and produce interpretable risk scores that can stratify patients into clinically meaningful risk groups.
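
The concordance index used to compare models can be sketched as follows. This is a simplified version of Harrell's C that ignores pairs with tied follow-up times, written in plain Python for illustration; it is not necessarily the implementation the project will use, and the function name is an assumption.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C: higher risk should mean shorter survival.

    A pair (i, j) is comparable when subject i had an observed event
    and did so strictly before subject j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event had higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of risk against observed survival order.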

Aim 3: Integrate NLST clinical variables with computational pathology and imaging-derived biomarkers to improve risk prediction. We will combine clinical risk factors with quantitative features extracted from histopathology and tissue microenvironment patterns, including cell-level spatial organization captured by graph representations. We will conduct systematic ablation experiments to evaluate how each modality contributes to prediction performance and to determine whether multimodal integration provides consistent gains over clinical-only models.

Aim 4: Evaluate model robustness, generalizability, and potential clinical utility. We will use internal validation strategies, assess discrimination and calibration, and evaluate risk stratification performance. We will perform subgroup analyses to examine model behavior across key demographic and smoking-related strata, and conduct sensitivity analyses to ensure results are stable under different endpoint definitions and missing-data handling strategies.
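
One common way to examine risk stratification is to split patients at the median predicted risk and compare Kaplan-Meier survival estimates between the groups. The sketch below assumes a median cutoff and two groups purely for illustration; the project does not specify its cutoff or number of strata, and the function names are hypothetical.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed event time."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    surv = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1 - deaths / at_risk  # product-limit update
        curve.append((t, surv))
    return curve


def km_by_risk_group(times, events, risks):
    """Split at the median risk score (an assumed cutoff) and fit KM per group."""
    cutoff = median(risks)
    groups = {"low": ([], []), "high": ([], [])}
    for t, e, r in zip(times, events, risks):
        key = "high" if r > cutoff else "low"
        groups[key][0].append(t)
        groups[key][1].append(e)
    return {k: kaplan_meier(ts, es) for k, (ts, es) in groups.items()}
```

Well-separated curves between the low- and high-risk groups would suggest that the risk score stratifies patients in a clinically meaningful way; a formal comparison would additionally require a test such as the log-rank test.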

Collaborators

Yuming Jiang, Wake Forest University School of Medicine
GUANNAN HE, Wake Forest University School of Medicine
Yijun Chen, Wake Forest University School of Medicine