Evaluating the Impact of Radiologist Experience and AI Integration on Lung Cancer Screening Performance: A Comparative Analysis of a Leading Dutch Radiology Center Using NLST Data
The study will use the National Lung Screening Trial (NLST) dataset to simulate the clinical environment of Erasmus MC’s lung cancer screening program. Radiologists’ performance will be assessed in terms of sensitivity, specificity, false-positive rate, and false-negative rate. In parallel, an AI algorithm will analyze the same low-dose CT (LDCT) images, focusing on two tasks:
- Identifying healthy lungs to reduce the number of unnecessary follow-ups (rule-out function).
- Detecting abnormalities and predicting long-term cancer risk (diagnostic and predictive function).
By comparing the performance of radiologists and AI, the project will identify areas where automation could complement or enhance human decision-making. Furthermore, cases where radiologist and AI interpretations diverge will be studied in depth to uncover patterns, such as cancers missed by radiologists but flagged by AI. The ultimate goal is to develop a hybrid model that leverages the strengths of both radiologists and AI to improve patient outcomes, reduce diagnostic variability, and optimize lung cancer screening workflows.
Aim 1: Assess radiologists’ diagnostic performance against an automated detection tool on the NLST dataset, stratified by radiologist experience level:
- Group 1: Radiologists with up to 2 years of experience.
- Group 2: Radiologists with more than 2 and up to 5 years of experience.
- Group 3: Radiologists with more than 5 years of experience.
Measure outcome metrics, including:
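- Sensitivity: Proportion of cancer cases correctly identified as abnormal.
- Specificity: Proportion of healthy cases correctly identified as healthy.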
- False-positive rate (FPR): Proportion of healthy cases incorrectly identified as abnormal.
- False-negative rate (FNR): Proportion of cancer cases incorrectly identified as healthy.
- Investigate inter-group differences in performance across various difficulty levels, such as early-stage cancers and subtle abnormalities.
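The Aim 1 metrics all follow from the confusion-matrix counts of each reader's calls against the reference standard, computed separately per experience group. The sketch below is a minimal illustration, assuming each read is stored as a record with the reader's years of experience, a binary abnormal/normal call, and the confirmed cancer status of the scan. The `Read` record and all function names are hypothetical, and assigning a reader with exactly 2 (or exactly 5) years to the lower group is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Read:
    """One radiologist read of one LDCT scan (hypothetical record layout)."""
    years_experience: float  # reader's years of experience
    called_abnormal: bool    # radiologist's call for this scan
    has_cancer: bool         # reference standard for this scan


def experience_group(years: float) -> int:
    """Group 1: up to 2 years; Group 2: >2 up to 5 years; Group 3: >5 years.

    Boundary handling (exactly 2 or 5 years) is an assumption.
    """
    if years <= 2:
        return 1
    if years <= 5:
        return 2
    return 3


def diagnostic_metrics(reads):
    """Sensitivity, specificity, FPR and FNR from confusion-matrix counts."""
    tp = sum(r.called_abnormal and r.has_cancer for r in reads)
    fn = sum(not r.called_abnormal and r.has_cancer for r in reads)
    fp = sum(r.called_abnormal and not r.has_cancer for r in reads)
    tn = sum(not r.called_abnormal and not r.has_cancer for r in reads)
    nan = float("nan")
    sens = tp / (tp + fn) if tp + fn else nan
    spec = tn / (tn + fp) if tn + fp else nan
    # FPR = FP / (FP + TN) = 1 - specificity; FNR = FN / (FN + TP) = 1 - sensitivity
    return {"sensitivity": sens, "specificity": spec,
            "fpr": 1 - spec, "fnr": 1 - sens}


def metrics_by_group(reads):
    """Compute the diagnostic metrics separately for each experience group."""
    groups = defaultdict(list)
    for r in reads:
        groups[experience_group(r.years_experience)].append(r)
    return {g: diagnostic_metrics(rs) for g, rs in sorted(groups.items())}
```

The same function can be applied to subsets (for example, early-stage cancers only) to examine inter-group differences by case difficulty.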
Aim 2: Validate an AI algorithm for automated screening and diagnostic tasks.
- Run the AI model on the NLST LDCT images.
Evaluate the algorithm’s performance using the following metrics:
- Accuracy of rule-out function: The proportion of healthy lungs correctly identified.
- Detection sensitivity: Ability to detect lung cancer across different stages.
- Long-term predictive accuracy: Concordance index for lung cancer risk prediction over 1 and 6 years.
- Workflow efficiency: The percentage reduction in the volume of images requiring radiologist review when the algorithm is used for pre-screening.
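The 1-year and 6-year concordance indices can be obtained by censoring follow-up at each horizon and then applying Harrell's C-index to the AI risk scores. The plain-Python sketch below makes that definition concrete, assuming per-participant follow-up time in years, a lung cancer diagnosis indicator, and a risk score where higher means higher predicted risk. The function name is hypothetical, and the O(n²) pair loop is meant for illustration rather than efficiency.

```python
def truncated_c_index(times, events, risks, horizon):
    """Harrell's C-index with follow-up censored at `horizon` years.

    times:  follow-up time in years (to diagnosis or censoring)
    events: 1 if lung cancer was diagnosed at `times`, else 0
    risks:  model risk score; higher means higher predicted risk
    """
    # Administrative censoring: diagnoses after the horizon count as censored.
    t = [min(ti, horizon) for ti in times]
    e = [bool(ei) and ti <= horizon for ti, ei in zip(times, events)]
    concordant, comparable = 0.0, 0
    for i in range(len(t)):
        if not e[i]:
            continue  # only a participant with an observed diagnosis anchors a pair
        for j in range(len(t)):
            if t[i] < t[j]:  # j was still cancer-free when i was diagnosed
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    return concordant / comparable if comparable else float("nan")
```

Calling it with `horizon=1` and `horizon=6` on the same scores yields the two indices listed above; a value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.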
Aim 3: Compare the performance of radiologists and AI on the NLST dataset.
Conduct a head-to-head comparison of radiologists’ and AI’s performance, focusing on:
- Cases of agreement (both correct) and disagreement (e.g., cancers missed by radiologists but flagged by AI, or false positives produced by AI).
- Performance on subtle imaging abnormalities and challenging cases (e.g., early-stage lung cancers).
- Investigate whether the AI algorithm can detect patterns overlooked by radiologists and analyze their potential impact on patient outcomes.
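Because radiologists and the AI read the same scans, each scan yields a paired call, so agreement can be tabulated and the two discordant cells compared directly. One standard option for this paired comparison, assumed here rather than specified in the proposal, is the exact McNemar test on the discordant counts: restricted to cancer cases it compares sensitivities, and restricted to cancer-free cases it compares specificities. The function names below are hypothetical.

```python
from math import comb


def agreement_table(rad_positive, ai_positive):
    """Count paired calls on the same scans: (both+, radiologist only, AI only, both-)."""
    both = rad_only = ai_only = neither = 0
    for r, a in zip(rad_positive, ai_positive):
        if r and a:
            both += 1
        elif r:
            rad_only += 1
        elif a:
            ai_only += 1
        else:
            neither += 1
    return both, rad_only, ai_only, neither


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under the null hypothesis, each discordant pair is equally likely to fall
    in either cell, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

On cancer cases only, the `ai_only` cell counts cancers missed by radiologists but flagged by AI, which are the cases singled out for in-depth review.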
Aim 4: Explore the integration of AI into clinical workflows.
- Simulate hybrid workflows combining AI and radiologists to optimize screening and diagnosis.
Assess key outcomes, such as:
- Reduction in radiologist workload (% of cases ruled out by AI).
- Improvement in diagnostic sensitivity and specificity in the hybrid workflow compared to radiologists alone.
- Time savings per case and overall cost implications of incorporating AI.
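A hybrid workflow can be simulated retrospectively from stored AI scores and radiologist reads. The sketch below assumes a hypothetical triage design: scans scoring below a rule-out threshold are closed without radiologist review, scans at or above an optional flag threshold are called positive, and all other scans keep the radiologist's original call. It also assumes the radiologist's call would not change under triage, which only a prospective or reader study can confirm. The `Case` record, thresholds, and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    ai_score: float     # AI abnormality score, higher = more suspicious (hypothetical)
    rad_positive: bool  # radiologist's original call without AI triage
    has_cancer: bool    # reference standard


def sens_spec(calls, truth):
    """Return (sensitivity, specificity) for binary calls against the truth."""
    tp = sum(c and t for c, t in zip(calls, truth))
    tn = sum(not c and not t for c, t in zip(calls, truth))
    pos = sum(truth)
    neg = len(truth) - pos
    nan = float("nan")
    return (tp / pos if pos else nan, tn / neg if neg else nan)


def simulate_hybrid(cases, rule_out_threshold, flag_threshold: Optional[float] = None):
    """Hypothetical triage: AI rules out low scores and optionally flags high scores."""
    truth = [c.has_cancer for c in cases]
    ruled_out = [c.ai_score < rule_out_threshold for c in cases]
    hybrid_calls = []
    for c, out in zip(cases, ruled_out):
        if out:
            hybrid_calls.append(False)           # closed without radiologist review
        elif flag_threshold is not None and c.ai_score >= flag_threshold:
            hybrid_calls.append(True)            # AI flag overrides a negative read
        else:
            hybrid_calls.append(c.rad_positive)  # radiologist's call stands
    return {
        "workload_reduction": sum(ruled_out) / len(cases),
        "cancers_ruled_out": sum(o and t for o, t in zip(ruled_out, truth)),
        "radiologist_alone": sens_spec([c.rad_positive for c in cases], truth),
        "hybrid": sens_spec(hybrid_calls, truth),
    }
```

Sweeping `rule_out_threshold` traces the trade-off between workload reduction and `cancers_ruled_out`, the key safety quantity for the rule-out function; time and cost estimates would additionally require per-case reading times, which this sketch does not model.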
Aim 5: Assess the influence of radiologist experience on human-AI synergy.
Investigate how radiologists with varying levels of experience (Groups 1, 2, and 3) interact with AI in a hybrid workflow.
Evaluate metrics such as:
- Agreement rates between AI and radiologists within each experience group.
- Cases where AI suggestions led to improved accuracy or avoided errors.
- Confidence levels in decision-making and the degree of reliance on AI outputs in each of the three groups.
- Identify whether less-experienced radiologists derive greater benefits from AI assistance compared to their more experienced counterparts.
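The Aim 5 metrics can be summarized per experience group once each case has both an unaided and an AI-aided read. The sketch below assumes such a reader-study design, with records stored as `(group, unaided_call, aided_call, ai_call, has_cancer)` tuples; the design, tuple layout, and function names are assumptions for illustration. It uses Cohen's kappa for chance-corrected agreement between unaided radiologists and the AI, and counts errors that AI assistance corrected or introduced.

```python
from collections import defaultdict


def cohens_kappa(a, b):
    """Chance-corrected agreement between two binary raters on the same cases."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_expected = pa * pb + (1 - pa) * (1 - pb)
    return (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else float("nan")


def accuracy(calls, truth):
    """Fraction of calls that match the reference standard."""
    return sum(c == t for c, t in zip(calls, truth)) / len(truth)


def synergy_by_group(records):
    """records: iterable of (group, unaided_call, aided_call, ai_call, has_cancer)."""
    by_group = defaultdict(list)
    for group, *rest in records:
        by_group[group].append(rest)
    summary = {}
    for g, rows in sorted(by_group.items()):
        unaided, aided, ai, truth = (list(col) for col in zip(*rows))
        summary[g] = {
            "kappa_unaided_vs_ai": cohens_kappa(unaided, ai),
            "accuracy_gain": accuracy(aided, truth) - accuracy(unaided, truth),
            # wrong without AI, right with AI
            "errors_corrected": sum(u != t and a == t for u, a, t in zip(unaided, aided, truth)),
            # right without AI, wrong with AI
            "errors_introduced": sum(u == t and a != t for u, a, t in zip(unaided, aided, truth)),
        }
    return summary
```

A larger `accuracy_gain` in Group 1 than in Group 3 would be one way to express the hypothesis that less-experienced radiologists benefit more from AI assistance.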
Outcome Measures for All Aims:
- Diagnostic Metrics: Sensitivity, specificity, false-positive rate, false-negative rate, and accuracy.
- Workflow Efficiency: Percentage of cases requiring manual review and time saved per case.
- Long-term Predictive Power: Concordance indices for cancer risk prediction over 1 and 6 years.
- Error Analysis: Characteristics of missed cancers and false positives by both radiologists and AI.
- Impact of Experience: Performance stratification by radiologist experience and interaction patterns with AI.
Dr. Alex Puiu - Senior Researcher
Marton Roux, BSc - Computer Scientist
Shidi Xia, MSc - Computer Scientist