Principal Investigator
Name
Wendy Leith
Degrees
MPH, MS, BA
Institution
Forensic Research & Analysis
Position Title
Statistician
About this CDAS Project
Study
PLCO
Project ID
PLCO-195
Initial CDAS Request Approval
Mar 8, 2016
Title
Lung Cancer Among Limited-Duration Former Smokers: A Predictive Model
Summary
One of the greatest challenges in tobacco litigation is estimating what an individual's risk of lung cancer would have been had they quit smoking many years, or even decades, earlier than they actually did. Experts currently struggle to provide informed and accurate estimates for these hypothetical scenarios. A predictive model that incorporates age at initiation, smoking duration, and smoking intensity among former smokers with a limited duration of smoking exposure could improve on current risk estimates, which are routinely and subjectively inferred from models that may not accurately represent these individuals. We seek to develop a prediction tool that can accurately estimate the risk of developing lung cancer in this relatively understudied subpopulation of former smokers.

Standard predictive analytics techniques will be used to build the most accurate yet parsimonious models for predicting 1- and 5-year lung cancer occurrence among former smokers with fewer than 30 pack-years of exposure who quit at least 15 years prior to enrollment in the study. The PLCO Lung data set will be randomly split 75%/25% for model development and validation. All variables available in the data set will be considered for inclusion in the model. Univariate associations will be evaluated with chi-square tests for categorical variables and t-tests for continuous variables. Additionally, continuous variables will be assessed for linearity in the logit, and transformations will be applied as necessary. Logistic models will be developed using stepwise regression. Fit will be assessed via the deviance chi-square and Hosmer-Lemeshow goodness-of-fit tests. Predictive accuracy will be evaluated by the area under the Receiver Operating Characteristic curve (AUC). The 25% reserved sample will be used to validate the final model via the AUC, comparison of observed versus predicted cases via positive and negative predictive value, and the prediction error rate.
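
The validation steps described above can be illustrated with a minimal sketch. This is not the project's code, and the summary does not say which software will be used. The function names, the fixed random seed, and the 0.5 classification threshold are all assumptions made for illustration. The sketch takes predicted probabilities from some already-fitted model as given, and shows only the 75%/25% random split, the AUC, and the positive predictive value, negative predictive value, and prediction error rate.

```python
import random


def split_75_25(records, seed=0):
    """Randomly split records into development (75%) and validation (25%) sets.

    The seed is a hypothetical choice so the split is reproducible.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.75 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Area under the ROC curve.

    Computed as the share of (case, non-case) pairs in which the case
    has the higher predicted score; ties count as one half.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def classification_summary(labels, scores, threshold=0.5):
    """PPV, NPV, and error rate at an assumed probability threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        # PPV and NPV are undefined (None) when no one is predicted
        # positive or negative, respectively.
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        "npv": tn / (tn + fn) if (tn + fn) else None,
        "error_rate": (fp + fn) / len(labels),
    }
```

In practice the model would be fitted on the development set only, and `auc` and `classification_summary` would then be applied to the observed outcomes and predicted probabilities in the reserved 25% sample. Because lung cancer is a rare outcome over 1 and 5 years, a threshold well below 0.5 may be more appropriate; the value here is only a placeholder.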
Aims

To develop and test a model for predicting lung cancer in former smokers with limited-duration exposure.

Collaborators

Michael Freeman
Laura Sangare