AI-based improvement in lung cancer detection on chest radiographs: results of a multi-reader study in NLST dataset.
- Lunit, Seoul, Korea.
- Division of Thoracic Imaging, Department of Radiology, Massachusetts General Hospital, 75 Blossom Court, Boston, MA, 02114, USA.
- Department of Radiology, Seoul National University College of Medicine, Seoul, Korea.
- Suwon Total Healthcare Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Youngin-si, Gyeongi-do, 16954, Korea.
- Cheju Halla General Hospital, 65 Doryeong-ro, Yeon-dong, Jeju-si, Jeju-do, Korea.
- Division of Thoracic Imaging, Department of Radiology, Massachusetts General Hospital, 75 Blossom Court, Boston, MA, 02114, USA. mkalra@mgh.harvard.edu.
OBJECTIVE: To assess whether a deep learning-based artificial intelligence (AI) algorithm improves reader performance for lung cancer detection on chest X-rays (CXRs).
METHODS: This reader study included 173 images from cancer-positive patients (n = 98) and 346 images from cancer-negative patients (n = 196) selected from the National Lung Screening Trial (NLST). Eight readers (three radiology residents and five board-certified radiologists) participated in the observer performance test. The AI algorithm provided an image-level probability of pulmonary nodule or mass on CXRs and a heatmap of detected lesions. Reader performance with and without AI was compared using AUC, sensitivity, specificity, false-positives per image (FPPI), and rates of chest CT recommendations.
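As an illustration of the image-level metrics named above, the following minimal sketch computes sensitivity, specificity, and FPPI for one reader. The abstract does not give the study's exact definitions, so the ones used here are common conventions and should be read as assumptions; the `Reading` type, the `reader_metrics` function, and the toy data are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One reader's interpretation of one chest radiograph (hypothetical schema)."""
    has_cancer: bool       # reference standard for the image
    called_positive: bool  # reader flagged the image as suspicious
    false_marks: int       # marks placed at locations without cancer


def reader_metrics(readings):
    """Image-level sensitivity, specificity, and false-positives per image.

    Assumed definitions (not confirmed by the abstract):
      sensitivity = flagged cancer-positive images / all cancer-positive images
      specificity = unflagged cancer-negative images / all cancer-negative images
      FPPI        = total false-positive marks / all images
    """
    positives = [r for r in readings if r.has_cancer]
    negatives = [r for r in readings if not r.has_cancer]
    sensitivity = sum(r.called_positive for r in positives) / len(positives)
    specificity = sum(not r.called_positive for r in negatives) / len(negatives)
    fppi = sum(r.false_marks for r in readings) / len(readings)
    return {"sensitivity": sensitivity, "specificity": specificity, "fppi": fppi}


# Toy data for illustration only; these are not study results.
readings = [
    Reading(has_cancer=True, called_positive=True, false_marks=0),
    Reading(has_cancer=True, called_positive=False, false_marks=1),
    Reading(has_cancer=False, called_positive=False, false_marks=0),
    Reading(has_cancer=False, called_positive=True, false_marks=1),
    Reading(has_cancer=False, called_positive=False, false_marks=0),
    Reading(has_cancer=False, called_positive=False, false_marks=0),
]
metrics = reader_metrics(readings)
```

Running the same function on a reader's interpretations without AI and then with AI would give the paired values that a comparison of this kind rests on; averaging across readers and estimating confidence intervals are outside the scope of this sketch.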
RESULTS: All paired values are reported as without AI vs. with AI. With AI, the average sensitivity of readers for the detection of visible lung cancer increased for residents but was unchanged for radiologists (0.61 [95% CI, 0.55-0.67] vs. 0.72 [95% CI, 0.66-0.77], p = 0.016 for residents, and 0.76 [95% CI, 0.72-0.81] vs. 0.76 [95% CI, 0.72-0.81], p = 1.00 for radiologists), while the number of false-positive findings per image (FPPI) was similar for residents but decreased for radiologists (0.15 [95% CI, 0.11-0.18] vs. 0.12 [95% CI, 0.09-0.16], p = 0.13 for residents, and 0.24 [95% CI, 0.20-0.29] vs. 0.17 [95% CI, 0.13-0.20], p < 0.001 for radiologists). With AI, the average rate of chest CT recommendation in patients positive for visible cancer increased for residents but was similar for radiologists (54.7% [95% CI, 48.2-61.2%] vs. 70.2% [95% CI, 64.2-76.2%], p < 0.001 for residents, and 72.5% [95% CI, 68.0-77.1%] vs. 73.9% [95% CI, 69.4-78.3%], p = 0.68 for radiologists), while the rate in cancer-negative patients was similar for residents but decreased for radiologists (11.2% [95% CI, 9.6-13.1%] vs. 9.8% [95% CI, 8.0-11.6%], p = 0.32 for residents, and 16.4% [95% CI, 14.7-18.2%] vs. 11.7% [95% CI, 10.2-13.3%], p < 0.001 for radiologists).
CONCLUSIONS: An AI algorithm can enhance the performance of readers for the detection of lung cancers on chest radiographs when used as a second reader.
KEY POINTS: • A reader study in the NLST dataset shows that the AI algorithm provided a sensitivity benefit for residents and a specificity benefit for radiologists in the detection of visible lung cancer. • With AI, radiology residents recommended chest CT examinations more often (54.7% vs. 70.2%, p < 0.001) for patients with visible lung cancer. • With AI, radiologists recommended a significantly smaller proportion of unnecessary chest CT examinations (16.4% vs. 11.7%, p < 0.001) in cancer-negative patients.
- NLST-474: Data-driven Imaging Biomarker (DIB) study in NLST datasets (Ki Hwan Kim - 2019)