Principal Investigator
Name
Jeffrey T. Leek
Degrees
Ph.D.
Institution
Fred Hutchinson Cancer Center
Position Title
Vice President, Chief Data Officer, and J. Orin Edson Professor in Public Health Sciences
Email
About this CDAS Project
Study
NLST
Project ID
NLST-1468
Initial CDAS Request Approval
Sep 9, 2025
Title
Early-Stage Inference on Cancer Survival Rates in the NLST Trial
Summary
In cancer screening trials, researchers often want to draw early conclusions before the trial is complete. Doing so comes at a cost: many enrolled patients' outcomes are still missing, and the resulting analyses may lack statistical validity. To address the missing outcomes, advanced machine learning (ML) models can supply low-cost predictions for imputation. However, the predicted outcomes usually need additional calibration to ensure statistical validity. The primary goal of this project is to study valid statistical inference procedures for survival outcomes when pre-trained ML models are used to impute the missing outcomes. We formulate a novel ML-assisted estimator for anytime-valid inference on the cancer survival outcome and related statistical quantities. We will rigorously prove the asymptotic properties and optimality conditions of the proposed estimator and compare its empirical performance with that of related ML-assisted estimators in the literature through simulation studies. Finally, we plan to apply the proposed estimator to the NLST study and derive meaningful insights into the cancer survival rate in this trial.
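As a rough illustration of what "calibrating" imputed outcomes can mean, the sketch below implements a prediction-powered style mean estimate, one well-known form in the ML-assisted inference literature. It is not the estimator proposed in this project: it ignores censoring, is not anytime-valid, and assumes the outcome (for example, an indicator of surviving past a fixed horizon) is fully observed on a labeled subset. The function name `rectified_mean` and all inputs are hypothetical.

```python
import math


def rectified_mean(y_labeled, pred_labeled, pred_unlabeled, z=1.96):
    """Prediction-powered style estimate of E[Y] with a normal-approximation CI.

    y_labeled:      observed outcomes for cases with completed follow-up
    pred_labeled:   ML predictions for those same cases
    pred_unlabeled: ML predictions for cases whose outcome is not yet observed
    (All names are illustrative assumptions, not taken from the project.)
    """
    n, big_n = len(y_labeled), len(pred_unlabeled)

    # Average ML prediction over the cases without an observed outcome.
    pred_mean = sum(pred_unlabeled) / big_n

    # Rectifier: average prediction error where the truth is known.
    errors = [y - p for y, p in zip(y_labeled, pred_labeled)]
    rectifier = sum(errors) / n

    estimate = pred_mean + rectifier

    # The two averages come from independent samples, so their variances add.
    var_pred = sum((p - pred_mean) ** 2 for p in pred_unlabeled) / (big_n - 1)
    var_err = sum((e - rectifier) ** 2 for e in errors) / (n - 1)
    se = math.sqrt(var_pred / big_n + var_err / n)

    return estimate, (estimate - z * se, estimate + z * se)


# Toy inputs, invented purely for illustration.
est, ci = rectified_mean(
    y_labeled=[1, 0, 1, 1],
    pred_labeled=[0.8, 0.3, 0.6, 0.9],
    pred_unlabeled=[0.7, 0.5, 0.9, 0.4, 0.6, 0.5],
)
print(est, ci)
```

The rectifier term is what separates this from plain imputation: if the ML model is systematically biased, the average error on the labeled cases shifts the estimate back toward the truth, and the interval widens to reflect how noisy that correction is.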
Aims

- Apply state-of-the-art machine learning (ML) models to predict the survival time or status of patients in screening trials with long follow-up and high censoring, such as the NLST study.
- Formulate an ML-assisted estimator for valid inference on the cancer survival probability and restricted mean survival time.
- Derive the asymptotic properties and discuss the optimality of our proposed estimator.
- Show the finite-sample performance of our proposed estimator through simulation studies and compare it with existing ML-assisted estimators in the literature.
- Apply and validate our proposed estimator using the cancer data from the NLST study.
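
For readers unfamiliar with the two target quantities named in the aims, the sketch below computes the classical Kaplan-Meier survival probability and the restricted mean survival time (the area under the survival curve up to a horizon `tau`) from right-censored data. This is the standard non-ML baseline, not the project's ML-assisted estimator; the helper names and the toy data are assumptions made for illustration only.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times:  follow-up time per patient
    events: 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Events and censored patients both leave the risk set after time t.
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Evaluate the Kaplan-Meier step function at time t."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


def rmst(curve, tau):
    """Restricted mean survival time: area under the step curve on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for time, surv in curve:
        if time >= tau:
            break
        area += prev_s * (time - prev_t)
        prev_t, prev_s = time, surv
    return area + prev_s * (tau - prev_t)


# Toy data, invented for illustration: five patients, two of them censored.
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
print(survival_at(curve, 4), rmst(curve, tau=6))
```

With heavy censoring, as in screening trials with long follow-up, few events are observed by an early analysis time, so the estimated curve is noisy. That gap is what the ML-predicted outcomes in the aims are meant to help fill.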

Collaborators

Yiqing Zhao, Ph.D., Fred Hutchinson Cancer Center (yqzhao@fredhutch.org)
Stephen Salerno, Ph.D., Fred Hutchinson Cancer Center (ssalerno@fredhutch.org)
Awan Afiaz, M.S., Fred Hutchinson Cancer Center, University of Washington (aafiaz@uw.edu) (NLST-1455)
Yikun Zhang, B.S., University of Washington (yikun@uw.edu) (NLST-1455)
Kentaro Hoffman, Ph.D., University of Washington (khoffm3@uw.edu) (NLST-1455)
Yen-Chi Chen (yenchic@uw.edu) (NLST-1455)