NLST Datasets

The following NLST datasets are available for delivery on CDAS. A Data Dictionary describing each dataset is publicly available. To obtain the actual data in SAS or CSV format, you must begin a new NLST project. Data will be delivered once the project is approved and data transfer agreements are completed.

A subset of clinical data is available publicly from Imaging Data Commons (IDC) and The Cancer Imaging Archive (TCIA). You can learn how to access the open component of NLST clinical data in IDC by following this tutorial.

You may also access the complete list of data collection forms used to collect NLST data.

Datasets and Data Dictionaries

Files	Description
Data Dictionary (PDF - 382.2 KB)	1. The Participant dataset is a comprehensive dataset that contains all the NLST study data needed for most analyses of lung cancer screening, incidence, and mortality. The dataset contains one record for each of the ~53,500 participants in NLST.
Data Dictionary (PDF - 82.7 KB)	2. The Spiral CT Screening dataset (~75,100, one record per CT screen) contains information from the Spiral CT screening exams. This includes technical parameters, reconstruction filter(s), reader ID, and recommendations for diagnostic follow-up.
Data Dictionary (PDF - 75.5 KB)	3. The Chest X-Ray Screening dataset (~73,500, one record per X-Ray screen) contains information from the Chest X-Ray screening exams. This includes technical parameters, reader ID, and recommendations for diagnostic follow-up.
Data Dictionary (PDF - 53.6 KB)	4. The Spiral CT Abnormalities dataset (~177,500, one record per abnormality on CT) contains information about each abnormality observed on the Spiral CT screening exams.
Data Dictionary (PDF - 50.5 KB)	5. The Chest X-Ray Abnormalities dataset (~47,200, one record per abnormality on X-Ray) contains information about each abnormality observed on the Chest X-Ray screening exams.
Data Dictionary (PDF - 51.0 KB)	6. The Spiral CT Comparison Read Abnormalities dataset (~31,000, one record per abnormality on CT) contains information about two types of abnormalities observed on the comparison read of CT exams: (a) all non-calcified nodules / masses >= 4mm in diameter; (b) other abnormalities deemed significant by the radiologist. Information about change in size and attenuation is available.
Data Dictionary (PDF - 51.0 KB)	7. The Chest X-Ray Comparison Read Abnormalities dataset (~5,200, one record per abnormality on X-Ray) contains information about two types of abnormalities observed on the comparison read of X-rays: (a) all non-calcified nodules / masses; (b) other abnormalities deemed significant by the radiologist. Information about change in size and attenuation is available.
Data Dictionary (PDF - 52.4 KB)	8. The Diagnostic Procedures dataset (~60,900, one record per diagnostic procedure) contains information on: (a) diagnostic procedures prompted by a positive screening exam (i.e. suspicious for lung cancer), and (b) diagnostic / staging procedures associated with any lung cancer diagnosed during the trial.
Data Dictionary (PDF - 52.9 KB)	9. The Medical Complications dataset (~800, one record per medical complication) contains information about complications related to diagnostic evaluation performed in response to a positive screening exam or in diagnosing lung cancer at any time during the trial.
Data Dictionary (PDF - 87.4 KB)	10. The Lung Cancer dataset (~2,100, one record per lung cancer) contains information about each lung cancer diagnosed during the trial, including multiple primary tumors in the same individual. It focuses on characteristics of the cancer, including information not available in the Participant dataset.
Data Dictionary (PDF - 47.8 KB)	11. The Treatment dataset (~4,600, one record per treatment procedure) contains information about procedures received in the initial course of treatment for lung cancer.
Data Dictionary (PDF - 43.7 KB)	12. The Cause of Death dataset (~15,200, one record per cause of death/other condition) contains information on all conditions listed on the death certificate and the cause of death from the endpoint verification process.
Data Dictionary (PDF - 42.6 KB)	13. The LSS Non-cancer Condition dataset (~10,900, one record per condition) contains information on non-cancer conditions diagnosed near the time of lung cancer diagnosis or of diagnostic evaluation for lung cancer following a positive screening exam. These data have serious limitations for most analyses; they were collected only on a subset of study participants during limited time windows, and they may not be comprehensive even within those windows because these data were not a primary focus of collection.
Data Dictionary (PDF - 43.0 KB)	14. The ACRIN Non-lung-cancer Condition dataset (~3,400, one record per condition) contains information on non-lung-cancer conditions diagnosed near the time of lung cancer diagnosis or of diagnostic evaluation for lung cancer following a positive screening exam. These data have serious limitations for most analyses; they were collected only on a subset of study participants during limited time windows, and they may not be comprehensive even within those windows because these data were not a primary focus of collection.
Data Dictionary (PDF - 67.8 KB)	15. The LSS HAQ dataset (~3,200, one record per survey form) contains data from an annual survey of a random sample of LSS participants about medical procedures received over the previous year. The main purpose of the survey was to learn about spiral CT and chest x-ray exams received to calculate how often spiral CT screening was being used by participants in the x-ray arm and vice versa.
Data Dictionary (PDF - 79.0 KB)	16. The Spiral CT Image Information dataset (~203,000, one record per SCT image series) contains information on the technical parameters of the CT scanner recorded during image collection. The dataset also provides a means to link SCT image files to participants and to identify where those images are batched, in either a hard drive delivery or a Lung Cancer Selection download.
Data Dictionary (PDF - 98.9 KB)	17. The Pathology Image dataset (~1,250, one record per pathology image) contains data pertaining to the tissue block as well as the regions of interest indicated in the pathology report. The dataset also provides a means to link pathology image files to participants and to identify where those images are batched, in either a hard drive delivery or a Standard Pathology Selection download.
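
Because the Participant dataset holds one record per participant while most other datasets hold one record per screen, abnormality, procedure, or image, analyses often need to summarize a multi-record dataset per participant before combining it with participant-level data. The following is a minimal sketch of that pattern in Python, not an official CDAS workflow. The inline sample rows are invented, and the column names `pid`, `age`, and `study_yr` are assumptions for illustration only; consult the Data Dictionaries for the actual variable names in the delivered CSV files.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature extracts standing in for delivered CSV files.
# Column names ("pid", "age", "study_yr") are assumptions; the real names
# are defined in the Data Dictionaries.
PARTICIPANT_CSV = """pid,age
100001,61
100002,58
100003,66
"""

CT_ABNORMALITY_CSV = """pid,study_yr
100001,0
100001,1
100003,0
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def count_per_participant(rows, key="pid"):
    """Count records per participant in a multi-record dataset."""
    return Counter(row[key] for row in rows)


def attach_counts(participants, counts, key="pid",
                  new_field="n_ct_abnormalities"):
    """Return copies of participant rows with a per-participant count added.

    Participants with no matching records get a count of 0.
    """
    return [{**row, new_field: counts.get(row[key], 0)}
            for row in participants]


participants = read_rows(PARTICIPANT_CSV)
abnormalities = read_rows(CT_ABNORMALITY_CSV)
linked = attach_counts(participants, count_per_participant(abnormalities))

for row in linked:
    print(row["pid"], row["n_ct_abnormalities"])
```

With real data, the same approach would read the delivered files with `open(path, newline="")` instead of the inline strings. Keeping the result at one row per participant preserves the structure of the Participant dataset.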

User Guides and Other Files

User Guides explain how to use the data contained in these datasets.

For NLST:

NLST User Guide (PDF - 211.0 KB)