Validation of a Deep Learning Algorithm for the Detection of Malignant Pulmonary Nodules in Chest Radiographs.
Yoo H, Kim KH, Singh R, Digumarthy SR, Kalra MK
IMPORTANCE: The improvement of pulmonary nodule detection, which is a challenging task when using chest radiographs, may help to elevate the role of chest radiographs for the diagnosis of lung cancer.
OBJECTIVE: To assess the performance of a deep learning-based nodule detection algorithm for the detection of lung cancer on chest radiographs from participants in the National Lung Screening Trial (NLST).
DESIGN, SETTING, AND PARTICIPANTS: This diagnostic study used data from participants in the NLST ro assess the performance of a deep learning-based artificial intelligence (AI) algorithm for the detection of pulmonary nodules and lung cancer on chest radiographs using separate training (in-house) and validation (NLST) data sets. Baseline (T0) posteroanterior chest radiographs from 5485 participants (full T0 data set) were used to assess lung cancer detection performance, and a subset of 577 of these images (nodule data set) were used to assess nodule detection performance. Participants aged 55 to 74 years who currently or formerly (ie, quit within the past 15 years) smoked cigarettes for 30 pack-years or more were enrolled in the NLST at 23 US centers between August 2002 and April 2004. Information on lung cancer diagnoses was collected through December 31, 2009. Analyses were performed between August 20, 2019, and February 14, 2020.
EXPOSURES: Abnormality scores produced by the AI algorithm.
MAIN OUTCOMES AND MEASURES: The performance of an AI algorithm for the detection of lung nodules and lung cancer on radiographs, with lung cancer incidence and mortality as primary end points.
RESULTS: A total of 5485 participants (mean [SD] age, 61.7 [5.0] years; 3030 men [55.2%]) were included, with a median follow-up duration of 6.5 years (interquartile range, 6.1-6.9 years). For the nodule data set, the sensitivity and specificity of the AI algorithm for the detection of pulmonary nodules were 86.2% (95% CI, 77.8%-94.6%) and 85.0% (95% CI, 81.9%-88.1%), respectively. For the detection of all cancers, the sensitivity was 75.0% (95% CI, 62.8%-87.2%), the specificity was 83.3% (95% CI, 82.3%-84.3%), the positive predictive value was 3.8% (95% CI, 2.6%-5.0%), and the negative predictive value was 99.8% (95% CI, 99.6%-99.9%). For the detection of malignant pulmonary nodules in all images of the full T0 data set, the sensitivity was 94.1% (95% CI, 86.2%-100.0%), the specificity was 83.3% (95% CI, 82.3%-84.3%), the positive predictive value was 3.4% (95% CI, 2.2%-4.5%), and the negative predictive value was 100.0% (95% CI, 99.9%-100.0%). In digital radiographs of the nodule data set, the AI algorithm had higher sensitivity (96.0% [95% CI, 88.3%-100.0%] vs 88.0% [95% CI, 75.3%-100.0%]; P = .32) and higher specificity (93.2% [95% CI, 89.9%-96.5%] vs 82.8% [95% CI, 77.8%-87.8%]; P = .001) for nodule detection compared with the NLST radiologists. For malignant pulmonary nodule detection on digital radiographs of the full T0 data set, the sensitivity of the AI algorithm was higher (100.0% [95% CI, 100.0%-100.0%] vs 94.1% [95% CI, 82.9%-100.0%]; P = .32) compared with the NLST radiologists, and the specificity (90.9% [95% CI, 89.6%-92.1%] vs 91.0% [95% CI, 89.7%-92.2%]; P = .91), positive predictive value (8.2% [95% CI, 4.4%-11.9%] vs 7.8% [95% CI, 4.1%-11.5%]; P = .65), and negative predictive value (100.0% [95% CI, 100.0%-100.0%] vs 99.9% [95% CI, 99.8%-100.0%]; P = .32) were similar to those of NLST radiologists.
CONCLUSIONS AND RELEVANCE: In this study, the AI algorithm performed better than NLST radiologists for the detection of pulmonary nodules on digital radiographs. When used as a second reader, the AI algorithm may help to detect lung cancer.
- NLST-474: Data-driven Imaging Biomarker (DIB) study in NLST datasets (Ki Hwan Kim - 2019)