Principal Investigator
Name
Li Cheung
Degrees
PhD
Institution
National Cancer Institute
Position Title
Principal Investigator
About this CDAS Project
Study
NLST
Project ID
NLST-1478
Initial CDAS Request Approval
Sep 29, 2025
Title
Comparison of Recurrent Neural Network (RNN) versus Survival Analysis for Lung Cancer Risk Prediction
Summary
Recurrent neural networks (RNNs) have been used to analyze text, speech, and time series data. We would like to examine the utility of RNNs for cancer prediction. In this project, we will evaluate RNNs for lung cancer risk prediction using common baseline characteristics (age, gender, education, race/ethnicity, BMI, family history of lung cancer, smoking exposure [cigarettes per day, years smoked, years quit], and emphysema), with and without time-varying low-dose CT results (the timing of each result relative to the baseline visit and whether the result was a false positive or a true negative). The performance of RNN-based prediction will be compared against standard survival analysis approaches (e.g., Cox models) using the same set of predictors, with evaluation by model discrimination (AUC) and calibration (observed vs. expected).
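As an illustration of the two evaluation criteria named in the summary, the sketch below computes a pairwise AUC and an overall observed/expected (O/E) ratio. It assumes a simplified setting in which each participant has a binary outcome (lung cancer by a fixed time horizon, with no censoring) and a predicted risk for that horizon; the project itself may handle censoring differently. The function names `auc` and `observed_expected_ratio` are hypothetical and do not come from the project.

```python
def auc(risks, events):
    """Discrimination: probability that a randomly chosen case has a higher
    predicted risk than a randomly chosen non-case (ties count as 1/2).

    Assumption: events are coded 1 (case) or 0 (non-case), with no censoring.
    """
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                wins += 1.0
            elif case_risk == control_risk:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def observed_expected_ratio(risks, events):
    """Calibration: observed number of cases divided by the sum of predicted risks."""
    expected = sum(risks)
    if expected <= 0:
        raise ValueError("sum of predicted risks must be positive")
    return sum(events) / expected


# Toy data, invented purely for illustration (not NLST data).
toy_risks = [0.10, 0.40, 0.35, 0.80]
toy_events = [0, 0, 1, 1]
print(auc(toy_risks, toy_events))
print(observed_expected_ratio(toy_risks, toy_events))
```

An O/E ratio near 1 indicates that the model predicts about as many cases as were observed; values above 1 suggest underprediction and values below 1 suggest overprediction.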
Aims

In this project, we plan to:
1. Conduct simulation studies to examine the statistical properties of RNN-based prediction. To ensure that these simulation studies are realistic, we will create simulation datasets with covariate and event-time distributions similar to those in the real NLST data.
2. Using 5-fold cross-validation in the NLST X-ray arm, we will fit an RNN model as well as traditional survival models (such as the Cox model described in Katki et al., JAMA 2016) to the training subsets using only baseline characteristics as predictors. We will then validate by AUC and by calibration plots of observed vs. expected ratios (O/E) using the validation subsets.
3. Using 5-fold cross-validation in the NLST low-dose CT arm, we will fit an RNN model as well as traditional survival models to the training subsets using baseline characteristics and CT results as predictors. We will then validate by AUC and by calibration plots of observed vs. expected ratios (O/E) using the validation subsets.
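The 5-fold cross-validation described in aims 2 and 3 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline: the helper names `k_fold_splits` and `cross_validated_risks` are hypothetical, and the `fit` and `predict` callables stand in for whichever model (RNN or Cox) is being evaluated.

```python
import random


def k_fold_splits(n, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs that partition range(n) into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)        # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]   # k disjoint, near-equal folds
    for i in range(k):
        valid = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, valid


def cross_validated_risks(rows, events, fit, predict, k=5, seed=0):
    """Return one out-of-fold predicted risk per participant.

    Each participant's risk comes from a model fitted without that participant.
    """
    risks = [None] * len(rows)
    for train, valid in k_fold_splits(len(rows), k, seed):
        model = fit([rows[i] for i in train], [events[i] for i in train])
        for i in valid:
            risks[i] = predict(model, rows[i])
    return risks


# Placeholder "model" for illustration only: predicts the training event rate.
def fit_mean(train_rows, train_events):
    return sum(train_events) / len(train_events)


def predict_mean(model, row):
    return model


toy_rows = list(range(10))      # stand-ins for covariate records
toy_events = [0, 1] * 5         # invented outcomes, not NLST data
print(cross_validated_risks(toy_rows, toy_events, fit_mean, predict_mean))
```

Pooling the out-of-fold risks in this way lets discrimination and calibration be computed on predictions that were never used for fitting; evaluating each validation subset separately and then averaging is an equally plausible reading of the aims.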

Collaborators

Li Cheung, National Cancer Institute
Qing Pan, George Washington University
Guannan Chen, George Washington University