Principal Investigator
Travis Osterman
Vanderbilt University Medical Center
Position Title
Clinical Fellow, Division of Hematology and Oncology
About this CDAS Project
NLST
Project ID
Initial CDAS Request Approval
Nov 2, 2015
Improving Lung Cancer Screening Rates and Smoking Documentation through Targeted Electronic Clinical Decision Support
Lung cancer is the leading cause of cancer-related mortality worldwide, with 1.2 million deaths annually, and an estimated 221,200 new cases in the United States in 2015. More than half of patients present with distant disease, which contributes to the high mortality rate. Based on the recent National Lung Screening Trial (NLST), the United States Preventive Services Task Force (USPSTF) recommends annual screening for patients aged 55-80 who have 30 or more pack-years (PY) of tobacco use and who have smoked within the past 15 years.
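The screening criteria above can be sketched as a simple rule. This is a minimal illustration only, not the CDS module itself: the function names, the convention that `years_since_quit=None` means a current smoker, and the inclusive treatment of the 15-year boundary are all assumptions.

```python
from typing import Optional


def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Pack-years = packs smoked per day multiplied by years of smoking."""
    return packs_per_day * years_smoked


def meets_screening_criteria(
    age: float,
    pack_yrs: float,
    years_since_quit: Optional[float],
) -> bool:
    """Apply the age, pack-year, and recency criteria described in the text.

    years_since_quit is None for a current smoker (hypothetical convention).
    The 15-year boundary is treated as inclusive, which is an assumption.
    """
    smoked_recently = years_since_quit is None or years_since_quit <= 15
    return 55 <= age <= 80 and pack_yrs >= 30 and smoked_recently


# Example: 1.5 packs per day for 25 years is 37.5 pack-years.
eligible = meets_screening_criteria(60, pack_years(1.5, 25), None)
```

Keeping the pack-year arithmetic separate from the eligibility rule means the same rule can be applied whether the pack-year value was documented directly or estimated.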

Clinical decision support (CDS) has been used successfully to help identify patients for screening for other diseases, including breast, cervical, and colon cancer. Identifying PY quantity from narrative text in the electronic health record (EHR) is challenging and is a focus of our current research. We have developed a natural language processing (NLP) algorithm to quantify PY tobacco exposure from unstructured clinical notes. We propose to formalize our NLP algorithm into a CDS module for the primary care setting.

Our preliminary data show that approximately 30% of smokers documented in the Vanderbilt University Medical Center (VUMC) EHR are missing either rate of smoking or duration of smoking. We will use the NLST dataset to develop an imputation model to estimate the PY history of patients missing either the rate of smoking (usually in packs per day) or the duration of smoking. We will validate those models using known smoking histories from VUMC. Based on those models, we will identify patients who would likely qualify for screening but have inadequate documentation.
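To make the notion of inadequate documentation concrete, the sketch below labels a smoking record by which component is missing. The record layout and field names (`packs_per_day`, `years_smoked`, `pack_years`) are hypothetical and do not reflect the VUMC EHR schema.

```python
from typing import Optional, TypedDict


class SmokingRecord(TypedDict):
    """Hypothetical structured smoking history extracted from the EHR."""
    packs_per_day: Optional[float]
    years_smoked: Optional[float]
    pack_years: Optional[float]


def documentation_status(rec: SmokingRecord) -> str:
    """Return which part of the smoking history would need imputation."""
    # A directly documented pack-year total makes the record usable as-is.
    if rec["pack_years"] is not None:
        return "complete"
    has_rate = rec["packs_per_day"] is not None
    has_duration = rec["years_smoked"] is not None
    if has_rate and has_duration:
        return "complete"
    if has_duration:
        return "missing_rate"
    if has_rate:
        return "missing_duration"
    return "missing_both"


status = documentation_status(
    {"packs_per_day": None, "years_smoked": 30.0, "pack_years": None}
)
```

Records labeled `missing_rate` or `missing_duration` are the ones an imputation model of the kind described here would target.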

We have VUMC Institutional Review Board approval (#151490) to implement a prospective clinical study to test our CDS module in the internal medicine clinic. We will measure screening referral rates before and after CDS module implementation. Additionally, we will measure the change in provider smoking documentation in the EHR. The CDS module will be made publicly available at the conclusion of the study so that other medical centers may implement it in their clinic workflows.

The NLST data are needed to achieve aim #2, included below.

Our preliminary data show that in the Vanderbilt University Medical Center (VUMC) population approximately 30% of all smokers and 50% of smokers in the screening-appropriate age range are missing either rate or duration of smoking (and do not have total pack-years documented). To address this, we will create two models to predict pack-year (PY) history based on either 1) age and duration of smoking or 2) age and rate of smoking.

We propose to use the NLST participant dataset (cen, dataset_version, elig, ineligible, pid, age, gender, age_quit, cigsmok, pkyr, smokeage, smokeday, smokeyr) as the training set to create models that impute rate of smoking or duration of smoking given the available information. Several models will be created using multiple imputation, logistic regression, support vector machines, random forest, and other machine learning methods, and tested using 10-fold cross-validation.
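The 10-fold cross-validation procedure can be sketched as follows. This is an illustration under stated assumptions, not the proposed modeling pipeline: a one-predictor least-squares line stands in for the candidate models named above, the error metric (mean absolute error) is chosen for simplicity, and the data are synthetic values generated in the code, not NLST records.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_mae(xs, ys, k=10, seed=0):
    """Mean absolute error averaged over k held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(
            mean(abs(ys[i] - (a + b * xs[i])) for i in held_out)
        )
    return mean(fold_errors)


# Synthetic toy data (NOT NLST data): smoking duration loosely tied to age.
rng = random.Random(42)
ages = [rng.uniform(55, 74) for _ in range(200)]
years_smoked = [age - 18 + rng.gauss(0, 3) for age in ages]

cv_mae = k_fold_mae(ages, years_smoked)
```

Each record is held out exactly once, so the averaged error estimates how well a model imputes a value it has not seen. The same loop could wrap any of the candidate models by swapping out `fit_line`.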

The best-performing model will be tested using VUMC smoking data that have been manually extracted from the EHR. We will compare the imputed data against the data hidden from the model and report a Cohen's kappa and adjusted kappa. We will also use the exact binomial test to calculate a p-value and confidence intervals for the proportion of individuals that the imputation models correctly classify as above or below the 30 pack-year screening cutoff.
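The agreement statistics named above can be sketched with the standard library alone. The labels below are invented for illustration, and the null proportion `p0=0.5` is an assumption, since the text does not state the null hypothesis. The adjusted kappa and the confidence interval are not covered by this sketch.

```python
from math import comb


def cohens_kappa(truth, predicted):
    """Cohen's kappa: agreement between two label lists, corrected for chance."""
    n = len(truth)
    labels = set(truth) | set(predicted)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    expected = sum(
        (truth.count(lab) / n) * (predicted.count(lab) / n) for lab in labels
    )
    if expected == 1:  # both lists carry a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_p(successes, n, p0=0.5):
    """Two-sided exact p-value: total probability of all outcomes that are
    no more likely than the observed count under the null proportion p0."""
    observed = binom_pmf(successes, n, p0)
    tail = sum(
        binom_pmf(k, n, p0)
        for k in range(n + 1)
        if binom_pmf(k, n, p0) <= observed * (1 + 1e-7)
    )
    return min(1.0, tail)


# Invented labels: 1 = at or above the 30 pack-year cutoff, 0 = below it.
truth = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
imputed = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

kappa = cohens_kappa(truth, imputed)
n_correct = sum(t == p for t, p in zip(truth, imputed))
p_value = exact_binomial_p(n_correct, len(truth))
```

Kappa discounts the agreement expected by chance given how often each label occurs, which matters when most patients fall on one side of the cutoff.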

If successful, these models will be used in the final CDS module and tested in the clinical setting to alert primary care providers when patients meet NLST screening criteria.


Pierre Massion, M.D., Vanderbilt University Medical Center

Related Publications