Comparison of Recurrent Neural Network (RNN) versus Survival Analysis for Lung Cancer Risk Prediction

Principal Investigator

Name
Li Cheung

Degrees
PhD

Institution
National Cancer Institute

Position Title
Principal Investigator

Email
li.cheung@nih.gov

About this CDAS Project

Study
NLST

Project ID
NLST-1478

Initial CDAS Request Approval
Sep 29, 2025


Summary
Recurrent neural networks (RNNs) have been used to analyze text, speech, and time-series data. We would like to examine the utility of RNNs for cancer risk prediction. In this project, we will evaluate RNNs for lung cancer risk prediction using common baseline characteristics (i.e., age, gender, education, race/ethnicity, body mass index (BMI), family history of lung cancer, smoking exposure (cigarettes per day, years smoked, years since quitting), and emphysema), both with and without time-varying low-dose CT screening results (i.e., the timing of each result relative to the baseline visit and whether the result was a false positive or a true negative). The performance of RNN-based prediction will be compared against standard survival analysis approaches (e.g., Cox models) using the same set of predictors, with evaluation by discrimination (area under the ROC curve, AUC) and calibration (observed vs. expected cases, O/E).
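The sketch below illustrates the two evaluation criteria in their simplest form: AUC as the probability that a case receives a higher predicted risk than a non-case, and the O/E ratio as the observed number of cases divided by the sum of predicted risks. It is a minimal sketch assuming a fixed prediction horizon with complete follow-up; the helper names `auc` and `oe_ratio` are hypothetical, and censored NLST follow-up would call for censoring-aware estimators (for example, a time-dependent AUC). Calibration plots would apply the same observed vs. expected comparison within groups such as risk deciles.

```python
from typing import Sequence


def auc(risks: Sequence[float], events: Sequence[int]) -> float:
    """Concordance AUC: the probability that a randomly chosen case has a
    higher predicted risk than a randomly chosen non-case (ties count 1/2).

    Assumes a fixed prediction horizon with complete follow-up, so each
    participant is labeled 1 (lung cancer by the horizon) or 0 (no lung cancer).
    """
    cases = [r for r, e in zip(risks, events) if e == 1]
    non_cases = [r for r, e in zip(risks, events) if e == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for rc in cases:
        for rn in non_cases:
            if rc > rn:
                concordant += 1.0
            elif rc == rn:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def oe_ratio(risks: Sequence[float], events: Sequence[int]) -> float:
    """Observed number of cases divided by the expected number (sum of risks)."""
    expected = sum(risks)
    if expected <= 0:
        raise ValueError("expected number of cases must be positive")
    return sum(events) / expected


# Toy example: predicted risks and observed outcomes for four participants.
risks = [0.10, 0.40, 0.35, 0.80]
events = [0, 0, 1, 1]
print(auc(risks, events))                 # 0.75
print(round(oe_ratio(risks, events), 2))  # 1.21
```

In this toy example, three of the four case/non-case pairs are ordered correctly (AUC = 0.75), and O/E = 2 / 1.65 ≈ 1.21, meaning the toy model predicts fewer cases than were observed.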

Aims

In this project, we plan to:
1. Conduct simulation studies to examine the statistical properties of RNN-based prediction. To ensure that these simulation studies are realistic, we will create simulation datasets whose covariate and event-time distributions are similar to those of the real NLST data.
2. Use 5-fold cross-validation in the NLST chest X-ray arm: fit an RNN model as well as traditional survival models (such as the Cox model described in Katki et al., JAMA 2016) to the training subsets using only baseline characteristics as predictors, then validate by AUC and by calibration plots of observed vs. expected (O/E) ratios using the validation subsets.
3. Use 5-fold cross-validation in the NLST low-dose CT arm: fit an RNN model as well as traditional survival models to the training subsets using baseline characteristics and CT screening results as predictors, then validate by AUC and by calibration plots of observed vs. expected (O/E) ratios using the validation subsets.
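One plausible way to set up simulations of the kind described in Aim 1, together with the 5-fold splits used in Aims 2 and 3, is sketched below: event times are drawn from a Cox proportional hazards model with a Weibull baseline hazard via the inverse-transform method, follow-up is administratively censored, and participants are randomly assigned to five folds. The Weibull baseline, the parameter values, the two standard-normal covariates, and the helper names `simulate_cox_weibull` and `assign_folds` are all illustrative assumptions, not NLST estimates or the investigators' actual simulation design.

```python
import math
import random
from typing import Dict, List, Sequence


def simulate_cox_weibull(
    covariates: Sequence[Sequence[float]],
    beta: Sequence[float],
    lam: float,
    nu: float,
    max_follow_up: float,
    rng: random.Random,
) -> List[Dict[str, float]]:
    """Draw event times from a Cox model with a Weibull baseline hazard.

    The baseline cumulative hazard is H0(t) = lam * t**nu, so
    S(t | x) = exp(-lam * t**nu * exp(beta'x)). Solving S(T | x) = U for
    U ~ Uniform(0, 1] gives T = (-log(U) / (lam * exp(beta'x)))**(1 / nu).
    Times beyond max_follow_up are administratively censored.
    """
    records = []
    for x in covariates:
        lin_pred = sum(b * xi for b, xi in zip(beta, x))
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
        t = (-math.log(u) / (lam * math.exp(lin_pred))) ** (1.0 / nu)
        if t <= max_follow_up:
            records.append({"time": t, "event": 1})
        else:
            records.append({"time": max_follow_up, "event": 0})
    return records


def assign_folds(n: int, k: int, rng: random.Random) -> List[int]:
    """Randomly assign n participants to k folds whose sizes differ by at most 1."""
    order = list(range(n))
    rng.shuffle(order)
    folds = [0] * n
    for position, participant in enumerate(order):
        folds[participant] = position % k
    return folds


rng = random.Random(2025)
# Hypothetical standardized covariates and illustrative parameters (not NLST estimates).
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(1000)]
data = simulate_cox_weibull(X, beta=[0.5, 0.3], lam=0.002, nu=1.2,
                            max_follow_up=7.0, rng=rng)
folds = assign_folds(len(data), 5, rng)
for k in range(5):
    train = [d for d, f in zip(data, folds) if f != k]
    valid = [d for d, f in zip(data, folds) if f == k]
    # Fit the RNN and Cox models on `train`; compute AUC and O/E on `valid`.
```

Purely random fold assignment is the simplest choice; stratifying folds by case status would keep the relatively small number of lung cancer cases balanced across folds, which may matter when estimating AUC and O/E within each validation subset.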

Collaborators

Li Cheung, National Cancer Institute
Qing Pan, George Washington University
Guannan Chen, George Washington University