Lung Cancer Among Limited-Duration Former Smokers: A Predictive Model

Principal Investigator

Name
Wendy Leith

Degrees
MPH, MS, BA

Institution
Forensic Research & Analysis

Position Title
Statistician

About this CDAS Project

Study
PLCO

Project ID
PLCO-195

Initial CDAS Request Approval
Mar 8, 2016

Summary
One of the greatest challenges in tobacco litigation is estimating the lung cancer risk an individual would have faced had they quit smoking many years, or even decades, earlier than they actually did. Experts currently struggle to provide informed and accurate estimates for these hypothetical scenarios, and the risk is routinely and subjectively inferred from models that may not accurately represent these individuals. A predictive model that incorporates age at smoking initiation, smoking duration, and smoking intensity among former smokers with limited-duration exposure could improve the accuracy of these estimates. We seek to develop a prediction tool that accurately estimates the risk of developing lung cancer in this relatively understudied subpopulation of former smokers.
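Smoking duration and intensity are conventionally combined into pack-years: packs smoked per day (20 cigarettes per pack) multiplied by years smoked. The sketch below computes pack-years and years since quitting and then applies this project's eligibility thresholds (fewer than 30 pack-years, quit at least 15 years before enrollment). It is a minimal example under stated assumptions. The field names (`cigs_per_day`, `age_started`, `age_quit`, `age_at_enrollment`) are hypothetical and are not PLCO variable names. The example also assumes continuous smoking between initiation and quitting, with no breaks.

```python
from dataclasses import dataclass

CIGS_PER_PACK = 20  # standard pack size used in the pack-years definition


@dataclass
class Participant:
    # Hypothetical fields for illustration only; not actual PLCO variable names.
    cigs_per_day: float
    age_started: float
    age_quit: float
    age_at_enrollment: float


def pack_years(p: Participant) -> float:
    """Packs per day times years smoked (assumes no breaks in smoking)."""
    years_smoked = p.age_quit - p.age_started
    return (p.cigs_per_day / CIGS_PER_PACK) * years_smoked


def years_since_quit(p: Participant) -> float:
    """Years between quitting and study enrollment."""
    return p.age_at_enrollment - p.age_quit


def is_eligible(p: Participant,
                max_pack_years: float = 30.0,
                min_years_quit: float = 15.0) -> bool:
    """Former smoker with < max_pack_years who quit >= min_years_quit before enrollment."""
    return pack_years(p) < max_pack_years and years_since_quit(p) >= min_years_quit


if __name__ == "__main__":
    a = Participant(cigs_per_day=20, age_started=18, age_quit=38, age_at_enrollment=60)
    print(pack_years(a), years_since_quit(a), is_eligible(a))  # prints: 20.0 22 True
```

Both thresholds are keyword parameters, so you can rerun a sensitivity analysis with different cutoffs without editing the function.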
Standard predictive analytics techniques will be used to build the most accurate yet parsimonious models for predicting 1- and 5-year lung cancer occurrence among former smokers with fewer than 30 pack-years of exposure who quit at least 15 years before enrollment in the study. The PLCO Lung data set will be randomly split 75%/25% into model development and validation samples. All variables available in the data set will be considered for inclusion in the model. Univariate associations will be evaluated with chi-square tests for categorical variables and t-tests for continuous variables. Continuous variables will also be assessed for linearity in the logit, and transformations will be applied as necessary. Logistic regression models will be developed using stepwise selection. Model fit will be assessed with the deviance chi-square and Hosmer-Lemeshow goodness-of-fit tests, and predictive accuracy will be evaluated by the area under the receiver operating characteristic curve (AUC). The reserved 25% sample will be used to validate the final model via the AUC, a comparison of observed versus predicted cases using positive and negative predictive values, and the prediction error rate.
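The split-and-validate steps above can be sketched as follows. This is a minimal illustration and assumes a logistic model has already been fit on the development sample and returns a predicted probability for each held-out participant. The seed, the 0.5 classification cutoff, and the function names are hypothetical choices, because the plan specifies neither a seed nor a cutoff. The split is a simple random split, since the plan does not mention stratifying by outcome. AUC is computed from ranks (the Mann-Whitney form), which equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counting one half.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_75_25(records: Sequence[T], seed: int = 2016) -> Tuple[List[T], List[T]]:
    """Randomly split records into 75% development and 25% validation sets.

    The seed is an arbitrary illustrative value.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.75 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC; tied scores receive average ranks."""
    pairs = sorted(zip(y_score, y_true))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC requires at least one case and one non-case")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def threshold_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """PPV, NPV and error rate when score >= threshold counts as a predicted case."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_case = s >= threshold
        if predicted_case and y == 1:
            tp += 1
        elif predicted_case:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
        "error_rate": (fp + fn) / len(y_true),
    }


if __name__ == "__main__":
    # Toy held-out outcomes (1 = lung cancer) and predicted probabilities; not PLCO data.
    y_val = [0, 0, 1, 1]
    p_val = [0.10, 0.40, 0.35, 0.80]
    print(auc(y_val, p_val))                # 0.75
    print(threshold_metrics(y_val, p_val))  # ppv 1.0, npv about 0.667, error_rate 0.25
```

With a 1- or 5-year incidence this rare, PPV and the error rate depend heavily on the chosen cutoff, so the cutoff is a parameter here and not fixed. AUC does not depend on any cutoff.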

Aims
To develop and test a model for predicting lung cancer in former smokers with limited-duration smoking exposure.

Collaborators
Michael Freeman
Laura Sangare
