PLCOI-1857: Foundation Models for Multi-Cancer Screening and Prognostics using PLCO Data - Approved Projects

Principal Investigator

Name

Jia Wu

Degrees

Ph.D.

Institution

University of Texas MD Anderson Cancer Center

Position Title

Associate Professor

Email

jwu11@mdanderson.org

About this CDAS Project

Study

PLCO

Project ID

PLCOI-1857

Initial CDAS Request Approval

Mar 12, 2025

Title

Foundation Models for Multi-Cancer Screening and Prognostics using PLCO Data

Summary

Cancer remains a leading cause of morbidity and mortality, and early detection plays a crucial role in improving survival outcomes. The Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial dataset offers an opportunity to apply foundation models to medical imaging and clinical data in order to improve early cancer screening, diagnosis, and prognostication.

This project aims to develop and refine foundation models, including large-scale deep learning architectures and self-supervised learning frameworks, to analyze multimodal data from the PLCO dataset. These data include chest X-ray images and extensive clinical metadata. By integrating these multimodal data sources, our research will advance cancer detection, risk stratification, and personalized screening strategies for lung cancer.

The study will apply transformer-based architectures (e.g., Vision Transformers, Multimodal Large Language Models) to automate radiological interpretation, predict disease progression, and enhance risk assessment. Additionally, federated learning techniques will be explored to improve model robustness and ensure data privacy.
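
The summary says only that federated learning techniques will be explored and does not name a specific algorithm. As a minimal sketch, assuming the common federated averaging scheme (each site trains locally and a coordinator averages parameters weighted by local sample count), the aggregation step could look like the following. The function name, parameter names, and example values are hypothetical.

```python
from typing import Dict, List

# A model is represented here as a mapping from parameter name to a flat list
# of floats. This representation is an assumption made for illustration.
Weights = Dict[str, List[float]]


def federated_average(site_weights: List[Weights], site_sizes: List[int]) -> Weights:
    """Average per-site parameters, weighting each site by its sample count."""
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one sample count per site and at least one site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in site_weights[0]:
        length = len(site_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical example: two sites with 100 and 300 local samples.
site_a = {"layer.weight": [1.0, 2.0]}
site_b = {"layer.weight": [3.0, 4.0]}
global_weights = federated_average([site_a, site_b], [100, 300])
```

Only parameters leave each site in this scheme, not participant-level records, which is the property that makes it relevant to the data-privacy goal stated above.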

Expected Outcomes:

Enhanced predictive models for multi-cancer screening using multimodal learning.

Foundation model fine-tuning for early-stage cancer identification with minimal clinician intervention.

Explainable AI approaches to improve model interpretability and facilitate clinical adoption.

Development of an open-access AI framework to support multi-cancer research using PLCO data.

By leveraging the PLCO dataset and state-of-the-art AI methodologies, this research will contribute to AI-driven precision oncology and improve cancer screening outcomes at a population level.

Aims

Aim 1: Multimodal Data Integration for AI-Driven Cancer Research

Consolidate PLCO chest X-ray imaging and clinical metadata into a structured research resource.

Harmonize data with external datasets (e.g., other cancer screening cohorts, electronic health records) to enhance model generalizability.

Develop standardized pre-processing pipelines for multimodal medical data, facilitating AI model training across different data sources.
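
The project does not describe its pre-processing pipeline in detail. One step such a pipeline might plausibly include is putting a numeric clinical variable from different sources on a common scale. The sketch below assumes z-score standardization with mean imputation for missing values. The function name and the example values are hypothetical and are not PLCO data.

```python
from statistics import mean, pstdev
from typing import List, Optional


def standardize_column(values: List[Optional[float]]) -> List[float]:
    """Z-score a numeric column; missing entries (None) map to 0.0.

    Mapping a missing entry to 0.0 is equivalent to imputing the observed mean
    before standardizing. This is an illustrative choice, not the project's
    stated method.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mu = mean(observed)
    sigma = pstdev(observed)  # population standard deviation of observed values
    if sigma == 0:
        # A constant column carries no information after centering.
        return [0.0] * len(values)
    return [0.0 if v is None else (v - mu) / sigma for v in values]


# Hypothetical ages at enrollment, with one missing entry.
ages = [50.0, 60.0, None, 70.0]
ages_z = standardize_column(ages)
```

In practice the mean and standard deviation would be estimated on the training split only and reused for external datasets, so that validation data do not leak into the scaling.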

Aim 2: Development of Foundation Models for Multi-Cancer Screening

Train and fine-tune self-supervised learning (SSL) and transformer-based architectures (e.g., ViTs, multimodal transformers) for cancer detection.

Implement multimodal fusion models that integrate imaging, clinical metadata, and temporal data to improve personalized cancer risk prediction.

Utilize contrastive learning and latent space alignment techniques to enhance representation learning across heterogeneous data modalities.
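
The aims mention contrastive learning and latent space alignment without specifying a loss. A minimal sketch, assuming an InfoNCE-style objective in which the image embedding and the clinical embedding of the same participant form the positive pair and all other participants in the batch act as negatives, is shown below. It is written in plain Python for readability; the function names, the temperature value, and the toy embeddings are assumptions.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def info_nce_loss(image_emb: List[Vector], clinical_emb: List[Vector],
                  temperature: float = 0.1) -> float:
    """Image-to-clinical InfoNCE loss; row i of both inputs is one participant."""
    if not image_emb or len(image_emb) != len(clinical_emb):
        raise ValueError("inputs must be non-empty and of equal length")
    imgs = [_normalize(v) for v in image_emb]
    clin = [_normalize(v) for v in clinical_emb]

    total = 0.0
    for i, img in enumerate(imgs):
        logits = [sum(a * b for a, b in zip(img, c)) / temperature for c in clin]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]  # negative log-softmax of the positive
    return total / len(imgs)


# Toy batch of two participants with 2-dimensional embeddings.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is low when matching pairs are closest in the shared space and high when they are not. This sketch covers only the image-to-clinical direction; a symmetric variant would average it with the clinical-to-image direction.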

Aim 3: Application of Foundation Models to Downstream Clinical Tasks

Apply trained models to cancer detection, malignancy classification, and longitudinal risk assessment across PLCO cancer types.

Develop interpretable AI frameworks using attention mechanisms and saliency maps to support clinical decision-making.

Validate model performance on external datasets and assess fairness, bias, and real-world applicability in diverse patient populations.
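
The aims call for assessing fairness and bias but do not specify a metric. One simple check, assumed here for illustration, is to compare sensitivity (true positive rate) across patient subgroups and report the largest gap. The function name, group labels, and values below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def subgroup_sensitivity(y_true: List[int], y_pred: List[int],
                         groups: List[Hashable]) -> Dict[Hashable, float]:
    """Return sensitivity per subgroup, for subgroups with at least one positive."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have equal length")
    true_pos: Dict[Hashable, int] = defaultdict(int)
    false_neg: Dict[Hashable, int] = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:  # only true cancer cases contribute to sensitivity
            if pred == 1:
                true_pos[group] += 1
            else:
                false_neg[group] += 1
    seen = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in seen}


# Hypothetical labels (1 = cancer), predictions, and subgroup membership.
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
groups = ["A", "A", "A", "B", "B", "B"]

per_group = subgroup_sensitivity(y_true, y_pred, groups)
sensitivity_gap = max(per_group.values()) - min(per_group.values())
```

A large gap would suggest that the model misses cancers more often in one subgroup, which is one of several signals a fuller fairness assessment would examine alongside specificity and calibration.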

Collaborators

Jia Wu, Ph.D., MD Anderson, jwu11@mdanderson.org
Kai Zhang, Ph.D., MD Anderson, kzhang7@mdanderson.org
Morteza Salehjahromi, Ph.D., MD Anderson, msalehjahromi@mdanderson.org
Ivan Coronado, Ph.D., MD Anderson, icoronado1@mdanderson.org