Scalable and Reproducible Deep Learning Strategies for Lung Cancer Screening Imaging

Principal Investigator

Name
Stefano Diciotti

Degrees
Ph.D.

Institution
Alma Mater Studiorum - Department DEI

Position Title
Associate Professor

Email
stefano.diciotti@unibo.it

About this CDAS Project

Study
NLST

Project ID
NLST-1514

Initial CDAS Request Approval
Jun 1, 2026

Title
Scalable and Reproducible Deep Learning Strategies for Lung Cancer Screening Imaging

Summary
Lung cancer (LC) remains the leading cause of cancer-related mortality worldwide. Low-dose CT (LDCT) screening of high-risk populations has been shown to significantly reduce LC mortality, yet current risk stratification strategies remain limited. Deep learning (DL) models can potentially estimate individual LC risk directly from imaging data, without relying on additional demographic or clinical information. In particular, algorithms that capture imaging patterns beyond visible lung nodules may substantially improve patient risk assessment and the implementation of LC screening. However, the scale, heterogeneity, and complexity of LDCT screening datasets require rigorous evaluation of model robustness, scalability, and reproducibility.

This project focuses on the development and systematic evaluation of advanced training strategies for deep learning models applied to medical imaging, leveraging distributed, multi-node GPU-based computing architectures. We will assess the efficiency, scalability, and reliability of computational workloads distributed across multiple nodes, and examine how different data representations and experimental design choices affect model performance. Particular emphasis is placed on reproducibility analyses, including the impact of varying subject selection for training, validation, and testing cohorts, as well as on the identification of potential sources of bias that may compromise model generalizability.
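As an illustration of how multi-node scalability could be quantified, the following minimal sketch computes speedup and parallel efficiency from measured training throughput. The function name and the throughput figures are hypothetical placeholders, not results from this project.

```python
def scaling_efficiency(throughput_by_gpus):
    """Return {n_gpus: (speedup, efficiency)} relative to the smallest configuration.

    throughput_by_gpus maps a GPU count to measured throughput (e.g. samples/s).
    """
    base_gpus = min(throughput_by_gpus)
    base_throughput = throughput_by_gpus[base_gpus]
    results = {}
    for n_gpus, throughput in sorted(throughput_by_gpus.items()):
        speedup = throughput / base_throughput   # observed gain over the baseline
        ideal = n_gpus / base_gpus               # gain under perfect linear scaling
        results[n_gpus] = (speedup, speedup / ideal)
    return results


# Hypothetical throughputs in samples/s, for illustration only.
example = {1: 100.0, 2: 190.0, 4: 360.0}
for n_gpus, (speedup, efficiency) in scaling_efficiency(example).items():
    print(f"{n_gpus} GPUs: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency close to 1 indicates near-linear scaling; lower values typically point to communication or data-loading overhead.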

Overall, this project aims to advance LC screening by providing a comprehensive evaluation of deep learning-based risk prediction models, with a strong focus on performance, robustness, and reproducibility in real-world screening settings.

Aims

1. Develop and apply DL-based methods for imaging-derived LC risk prediction, leveraging low-dose CT screening data to capture subtle imaging patterns beyond visible lung nodules.

2. Evaluate the scalability, efficiency, and performance of DL training pipelines across multi-GPU and distributed multi-node computing environments, systematically analyzing computational throughput and model stability as data volume and architectural complexity increase.

3. Compare the impact of different medical image file formats, including DICOM and NIfTI, on data loading, preprocessing, training efficiency, and predictive performance, in order to identify best practices for large-scale lung cancer screening studies.

4. Conduct reproducibility analyses by systematically and repeatedly varying subject selection for training, validation, and test cohorts, quantifying the variability in model performance attributable to data partitioning strategies and sampling effects.

5. Identify, characterize, and quantify potential sources of bias related to data heterogeneity, experimental design choices, and training strategies, and assess their impact on model robustness, fairness, and generalizability.
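
The reproducibility analysis in aim 4 could be sketched as follows, assuming splits are drawn at the subject level so that all scans from one participant stay in the same cohort. The helper names, the 70/15/15 split, and the `evaluate` callable (which would train a model and return a test metric) are hypothetical and stand in for the project's actual pipeline.

```python
import random
import statistics


def split_subjects(subject_ids, seed, train_pct=70, val_pct=15):
    """Shuffle unique subject IDs with a fixed seed and split into train/val/test."""
    ids = sorted(set(subject_ids))       # deterministic starting order
    random.Random(seed).shuffle(ids)     # seed-controlled shuffle
    n_train = len(ids) * train_pct // 100
    n_val = len(ids) * val_pct // 100
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]         # remainder goes to the test cohort
    return train, val, test


def repeated_evaluation(subject_ids, evaluate, n_repeats=10):
    """Run `evaluate` on n_repeats different splits; return (mean, std, scores)."""
    scores = []
    for seed in range(n_repeats):
        train, val, test = split_subjects(subject_ids, seed)
        scores.append(evaluate(train, val, test))
    return statistics.mean(scores), statistics.stdev(scores), scores
```

The standard deviation across repeats gives a first estimate of the performance variability attributable to data partitioning alone, separate from differences between models or training strategies.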

Collaborators

Stefano Diciotti, Alma Mater Studiorum - Department DEI
Giulia Raffaella De Luca, Alma Mater Studiorum - Department DEI
Christian Di Buò, Alma Mater Studiorum - Department DEI
Kai Krajsek, Forschungszentrum Jülich GmbH