Predicting eligibility for lung cancer screening

Principal Investigator

Name
Robert Volk

Degrees
Ph.D.

Institution
University of Texas MD Anderson Cancer Center

Position Title
Professor

Email
bvolk@mdanderson.org

About this CDAS Project

Study
PLCO

Project ID
PLCO-2018

Initial CDAS Request Approval
Jan 26, 2026

Title
Predicting eligibility for lung cancer screening

Summary
Lung cancer remains the leading cause of cancer-related mortality in the United States. Randomized trials have demonstrated that low-dose computed tomography (LDCT) screening reduces lung cancer mortality among individuals at high risk, defined primarily by age and smoking history. Accordingly, lung cancer screening (LCS) eligibility criteria used in major trials and adopted by professional organizations rely heavily on age, smoking status, and cumulative tobacco exposure measured in pack-years. In 2021, the U.S. Preventive Services Task Force (USPSTF) expanded LCS eligibility by lowering the minimum age from 55 to 50 years and reducing the smoking threshold from 30 to 20 pack-years, resulting in an estimated 81% increase in the screening-eligible population.
Despite these efforts, accurately identifying individuals eligible for LCS remains challenging. Information required to calculate pack-years—smoking duration and intensity—is frequently incomplete, inconsistently collected, or unreliable in both population surveys and clinical records. This limitation is particularly consequential in real-world settings where clinicians and health systems must make screening decisions with imperfect information. Moreover, smoking behaviors vary substantially by sex, age, race and ethnicity, and smoking frequency (daily versus nondaily), potentially leading to inequities in identifying screening-eligible individuals when rigid pack-year thresholds are applied.
Estimating LCS eligibility using readily available sociodemographic characteristics and smoking indicators may offer a pragmatic alternative for identifying populations likely to benefit from screening when detailed smoking histories are unavailable. This approach is especially relevant for population-level surveillance, health systems planning, and efforts to address disparities in screening access. Individuals with tobacco-related comorbidities, such as chronic obstructive pulmonary disease (COPD), may also have a higher likelihood of meeting LCS eligibility criteria, even when pack-year data are missing or uncertain.
In this project, we will use data from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial to develop and evaluate a predictive model for lung cancer screening eligibility that relies on readily available sociodemographic characteristics, smoking status, smoking frequency, and indicators of tobacco-related health conditions. Model development will use machine learning–informed classification approaches. We will compare the performance of the developed model with current USPSTF eligibility criteria, which rely primarily on age and cumulative smoking exposure measured in pack-years. Comparisons will focus on the ability of each approach to identify individuals eligible for lung cancer screening, overall and within key subgroups defined by sex, age, and racial/ethnic identification, using standard measures of discrimination, calibration, and agreement. This analysis will assess whether a simplified, data-driven approach based on commonly available information can approximate or improve identification of screening-eligible populations in settings where detailed smoking history is incomplete, thereby informing population-level surveillance, clinical decision-making, and health system planning for lung cancer screening.
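For illustration only, the pack-year-based rule that the model will be compared against can be sketched as below. The sketch assumes the 2021 USPSTF recommendation as commonly stated: age 50 to 80 years, at least 20 pack-years, and either current smoking or having quit within the past 15 years. The upper age limit and the quit window are not stated in this summary and are included here as assumptions. The class and function names are hypothetical and are not PLCO variable names. Returning None when inputs are missing is a choice made for this sketch to make the incomplete-history problem explicit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokingHistory:
    """Hypothetical record; field names are illustrative, not PLCO variables."""
    age: int
    packs_per_day: Optional[float]      # smoking intensity
    years_smoked: Optional[float]       # smoking duration
    currently_smokes: bool
    years_since_quit: Optional[float] = None


def pack_years(h: SmokingHistory) -> Optional[float]:
    """Pack-years = packs per day x years smoked; None if either is missing."""
    if h.packs_per_day is None or h.years_smoked is None:
        return None
    return h.packs_per_day * h.years_smoked


def uspstf_2021_eligible(h: SmokingHistory) -> Optional[bool]:
    """Apply the assumed 2021 USPSTF rule.

    Returns True or False when the rule can be evaluated, and None when
    the smoking history is too incomplete to decide.
    """
    if not 50 <= h.age <= 80:
        return False                    # outside the assumed age window
    py = pack_years(h)
    if py is None:
        return None                     # cannot compute cumulative exposure
    if py < 20:
        return False
    if h.currently_smokes:
        return True
    if h.years_since_quit is None:
        return None                     # quit date unknown
    return h.years_since_quit <= 15
```

A record with unknown intensity, such as SmokingHistory(60, None, 30, True), yields None under this sketch; this is the kind of case in which a model built on simpler indicators might still produce an estimate.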

Aims

• Specific Aim 1:
Use machine learning to develop and internally validate a predictive model for lung cancer screening eligibility based on readily available sociodemographic characteristics, smoking status, smoking frequency, and indicators of tobacco-related health conditions.
• Specific Aim 2:
Compare the performance of the developed predictive model with current USPSTF eligibility criteria in identifying individuals eligible for lung cancer screening, using measures of discrimination, calibration, and classification accuracy.
• Specific Aim 3:
Evaluate and compare model performance across population subgroups to identify whether the predictive model improves identification of screening-eligible individuals among groups that may be misclassified or underserved by pack-year–based eligibility criteria.
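
The comparisons described in Aims 2 and 3 could be sketched as follows. This is a minimal, standard-library illustration, not the project's analysis plan: it assumes a binary reference label for eligibility (for example, eligibility computed from complete smoking histories) and a binary model prediction, and it reports sensitivity, specificity, and Cohen's kappa overall or per subgroup. The record keys "eligible" and "predicted" and the function names are hypothetical. Discrimination and calibration measures that require predicted probabilities are not covered here.

```python
from collections import defaultdict


def agreement_metrics(y_true, y_pred):
    """Sensitivity, specificity, and Cohen's kappa for binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    if n == 0:
        return {"sensitivity": None, "specificity": None, "kappa": None}
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else None
    return {"sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}


def metrics_by_subgroup(records, group_key):
    """Compute the same metrics within each level of a subgroup variable."""
    groups = defaultdict(lambda: ([], []))
    for r in records:
        truths, preds = groups[r[group_key]]
        truths.append(r["eligible"])     # hypothetical reference label
        preds.append(r["predicted"])     # hypothetical model output
    return {g: agreement_metrics(t, p) for g, (t, p) in groups.items()}
```

Calling metrics_by_subgroup(records, "sex") would return one metrics dictionary per level of the sex variable, which allows a direct check of whether agreement differs across groups.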

Collaborators

Robert Volk, University of Texas MD Anderson Cancer Center
Sara Nofal, University of Texas MD Anderson Cancer Center