Foundation Models for Lung Cancer Screening and Prognostics

Principal Investigator

Name
Jia Wu

Degrees
Ph.D.

Institution
University of Texas MD Anderson Cancer Center

Position Title
Associate Professor

Email
jwu11@mdanderson.org

About this CDAS Project

Study
NLST

Project ID
NLST-1451

Initial CDAS Request Approval
Aug 25, 2025

Title
Foundation Models for Lung Cancer Screening and Prognostics

Summary
Lung cancer remains a leading cause of cancer-related mortality worldwide, with early detection being critical for improving survival rates. The National Lung Screening Trial (NLST) dataset provides a unique opportunity to leverage foundation models in medical imaging and clinical data analysis to enhance lung cancer screening, diagnosis, and prognostication.

This project aims to develop and fine-tune foundation models, including large-scale deep learning architectures and self-supervised learning frameworks, to analyze low-dose computed tomography (CT) scans, chest radiographs, and clinical metadata from NLST. By integrating multi-modal data, our research will advance early lung cancer detection, risk stratification, and personalized screening strategies.

The proposed study will apply transformer-based architectures (e.g., Vision Transformers, Multimodal Large Language Models) to automate radiological interpretation, predict disease progression, and enhance risk assessment. Additionally, we will explore federated learning approaches to improve model generalizability while ensuring data privacy.
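
As an illustration of the federated learning idea mentioned above, the following is a minimal sketch of sample-weighted parameter averaging (often called federated averaging). It is not the project's implementation: the function name `federated_average`, the parameter name `layer.weight`, and the two example sites are hypothetical, and a real system would also handle communication, multiple training rounds, and privacy safeguards.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(site_updates: List[Tuple[Weights, int]]) -> Weights:
    """Average parameters across sites, weighted by each site's sample count.

    Only parameters leave each site; the raw images and records stay local.
    """
    if not site_updates:
        raise ValueError("need at least one site update")
    total = sum(n for _, n in site_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    reference, _ = site_updates[0]
    averaged: Weights = {}
    for name, values in reference.items():
        acc = [0.0] * len(values)
        for weights, n in site_updates:
            for i, v in enumerate(weights[name]):
                acc[i] += v * n / total
        averaged[name] = acc
    return averaged


# Two illustrative sites with different amounts of local data.
site_a = ({"layer.weight": [1.0, 2.0]}, 100)
site_b = ({"layer.weight": [3.0, 4.0]}, 300)
global_weights = federated_average([site_a, site_b])
```

Weighting by sample count means the larger site contributes proportionally more to the shared model.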

The expected outcomes include:

• Enhanced predictive models for lung cancer detection using multimodal learning.

• Foundation model fine-tuning for early-stage lung cancer identification with minimal radiologist input.

• Explainable AI approaches to improve model interpretability for clinical adoption.

• Development of a publicly accessible AI framework to support lung cancer research.

By leveraging NLST's rich dataset and cutting-edge AI techniques, this study will contribute to the broader field of AI-driven precision medicine in oncology.

Aims

Aim 1: Multimodal Data Consolidation for AI-Driven Lung Cancer Research

• Integrate NLST imaging (chest radiographs) and clinical metadata into a structured research resource.

• Harmonize data with external datasets (e.g., other cancer screening studies, hospital datasets) to improve generalizability.

• Develop standardized pre-processing pipelines for multimodal medical data, enabling AI model training across different data sources.
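
One possible shape for such a pre-processing step is sketched below, under stated assumptions: image intensities are clipped to a fixed window and rescaled to [0, 1], and a numeric clinical variable is z-score standardized. The window bounds, function names, and example values are hypothetical and are not taken from NLST or from the project.

```python
import statistics
from typing import List, Sequence


def rescale_intensities(pixels: Sequence[float], low: float, high: float) -> List[float]:
    """Clip intensities to [low, high], then rescale linearly to [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return [(min(max(p, low), high) - low) / (high - low) for p in pixels]


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score a numeric clinical variable (population standard deviation)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Illustrative values only: an 8-bit style window and three ages.
scaled_pixels = rescale_intensities([-50, 0, 128, 255, 300], low=0, high=255)
scaled_ages = standardize([50.0, 60.0, 70.0])
```

Applying the same fixed window and the same standardization statistics to every data source is one way to keep inputs comparable across datasets.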

Aim 2: Design and Implementation of Foundation Models for Medical Imaging & Clinical Data

• Develop and fine-tune self-supervised learning (SSL) and transformer-based architectures (e.g., ViTs, multimodal transformers) for lung cancer screening.

• Implement multimodal fusion models that integrate imaging, clinical metadata, and temporal data to improve cancer risk prediction.

• Leverage contrastive learning and latent space alignment to enhance representation learning across different data modalities.
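
The contrastive learning and latent space alignment named above can be illustrated with an InfoNCE-style loss, sketched here in plain Python for readability. This is an assumption about one common formulation, not the project's method: the function `info_nce`, the temperature of 0.1, and the toy two-dimensional embeddings are hypothetical, and a practical version would run on batched tensors in a deep learning framework.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def info_nce(
    image_emb: Sequence[Sequence[float]],
    clinical_emb: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Image-to-clinical InfoNCE loss over paired embeddings.

    Row i of each input is assumed to come from the same participant;
    every other row in the batch serves as a negative example.
    """
    if not image_emb or len(image_emb) != len(clinical_emb):
        raise ValueError("inputs must be non-empty and equally sized")
    imgs = [_normalize(v) for v in image_emb]
    clin = [_normalize(v) for v in clinical_emb]

    loss = 0.0
    for i, a in enumerate(imgs):
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in clin]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(imgs)


# Toy check: correctly paired embeddings should score a lower loss
# than the same embeddings with the pairing swapped.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing this loss pulls each participant's imaging and clinical embeddings together in the shared latent space while pushing apart embeddings from different participants.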

Aim 3: Application of Foundation Models to Downstream Clinical Tasks

• Apply trained models to lung cancer detection, malignancy classification, and longitudinal risk assessment.

• Develop interpretable AI frameworks using attention mechanisms and saliency maps to support clinician decision-making.

• Validate model performance on external datasets and assess generalizability, bias, and fairness in real-world clinical scenarios.
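
One simple way to examine the bias and fairness question above is to compare sensitivity (true positive rate) across subgroups. The sketch below assumes binary labels and predictions; the subgroup labels "A" and "B" and both function names are hypothetical, and the project summary does not state which fairness metrics will be used.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def sensitivity_by_group(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Return the true positive rate per subgroup (groups with no positives are omitted)."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have equal length")
    true_pos: Dict[Hashable, int] = defaultdict(int)
    positives: Dict[Hashable, int] = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            positives[g] += 1
            if p == 1:
                true_pos[g] += 1
    return {g: true_pos[g] / positives[g] for g in positives}


def sensitivity_gap(per_group: Dict[Hashable, float]) -> float:
    """Largest difference in sensitivity between any two subgroups."""
    return max(per_group.values()) - min(per_group.values())


# Illustrative labels only: 1 = cancer, 0 = no cancer.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
groups = ["A", "A", "B", "B", "A", "B"]
per_group = sensitivity_by_group(y_true, y_pred, groups)
gap = sensitivity_gap(per_group)
```

A large gap would indicate that the model misses cancers more often in one subgroup than another, which is the kind of disparity external validation is meant to surface.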

Collaborators

Jia Wu, Ph.D., MD Anderson, jwu11@mdanderson.org

Kai Zhang, Ph.D., MD Anderson, kzhang7@mdanderson.org

Morteza Salehjahromi, Ph.D., MD Anderson, msalehjahromi@mdanderson.org

Ivan Coronado, Ph.D., MD Anderson, icoronado1@mdanderson.org