Principal Investigator
Name
Jia Wu
Degrees
Ph.D.
Institution
University of Texas MD Anderson Cancer Center
Position Title
Associate Professor
About this CDAS Project
Study
PLCO
Project ID
PLCOI-1857
Initial CDAS Request Approval
Mar 12, 2025
Title
Foundation Models for Multi-Cancer Screening and Prognostics using PLCO Data
Summary
Cancer remains a leading cause of morbidity and mortality, and early detection plays a crucial role in improving survival outcomes. The Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial dataset provides an opportunity to apply foundation models to medical imaging and clinical data in order to enhance early cancer screening, diagnosis, and prognostication.

This project aims to develop and refine foundation models, including large-scale deep learning architectures and self-supervised learning frameworks, to analyze multimodal data from the PLCO dataset, namely chest X-rays and extensive clinical metadata. By integrating these data sources, our research will advance cancer detection, risk stratification, and personalized screening strategies for lung cancer.

The study will apply transformer-based architectures (e.g., Vision Transformers, Multimodal Large Language Models) to automate radiological interpretation, predict disease progression, and enhance risk assessment. Additionally, federated learning techniques will be explored to improve model robustness and ensure data privacy.
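As an illustration of the federated learning idea mentioned above, the following is a minimal sketch of the aggregation step used in federated averaging (FedAvg), where each site trains locally and only model parameters are shared. It is not the project's implementation: the function name `federated_average`, the flat parameter lists, and the example site sizes are all hypothetical.

```python
from typing import List, Sequence


def federated_average(
    site_weights: Sequence[List[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Combine per-site parameter vectors into one global vector.

    Each site's parameters are weighted by its number of training
    examples, so raw patient data never leaves the site.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one size per site and at least one site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    dim = len(site_weights[0])
    if any(len(w) != dim for w in site_weights):
        raise ValueError("all sites must share the same parameter shape")

    averaged = [0.0] * dim
    for weights, n_examples in zip(site_weights, site_sizes):
        for i, value in enumerate(weights):
            averaged[i] += value * n_examples / total
    return averaged


# Hypothetical example: two sites holding 1 and 3 examples respectively.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a real system this step would be repeated over many communication rounds, with each site resuming local training from the aggregated parameters.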

Expected Outcomes:

Enhanced predictive models for multi-cancer screening using multimodal learning.

Foundation model fine-tuning for early-stage cancer identification with minimal clinician intervention.

Explainable AI approaches to improve model interpretability and facilitate clinical adoption.

Development of an open-access AI framework to support multi-cancer research using PLCO data.

By leveraging the PLCO dataset and state-of-the-art AI methodologies, this research will contribute to AI-driven precision oncology and improve cancer screening outcomes at a population level.
Aims

Aim 1: Multimodal Data Integration for AI-Driven Cancer Research

Consolidate PLCO chest X-ray imaging and clinical metadata into a structured research resource.

Harmonize data with external datasets (e.g., other cancer screening cohorts, electronic health records) to enhance model generalizability.

Develop standardized pre-processing pipelines for multimodal medical data, facilitating AI model training across different data sources.
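A standardized pre-processing step of the kind described in Aim 1 might look like the sketch below. It assumes two simple operations that are common in practice but not specified by the project: clipping and rescaling pixel intensities to [0, 1], and z-scoring a clinical variable with mean imputation for missing values. The function names and the 12-bit intensity range are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def normalize_image(
    pixels: Sequence[Sequence[float]], low: float, high: float
) -> List[List[float]]:
    """Clip intensities to [low, high] and rescale them to [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    span = high - low
    return [[(min(max(p, low), high) - low) / span for p in row] for row in pixels]


def zscore_with_imputation(values: Sequence[Optional[float]]) -> List[float]:
    """Z-score a clinical variable; missing entries (None) map to 0.0,
    which is equivalent to imputing the observed mean."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")
    mu = mean(observed)
    sigma = pstdev(observed)
    if sigma == 0:
        # A constant variable carries no information after centering.
        return [0.0 for _ in values]
    return [0.0 if v is None else (v - mu) / sigma for v in values]


# Hypothetical 2x2 image with an assumed 12-bit intensity range.
image = normalize_image([[0, 4095], [5000, -10]], low=0, high=4095)
# Hypothetical clinical variable with one missing measurement.
feature = zscore_with_imputation([1.0, None, 3.0])
```

Applying the same functions, with the same parameters, to every data source is what makes features comparable across cohorts.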

Aim 2: Development of Foundation Models for Multi-Cancer Screening

Train and fine-tune self-supervised learning (SSL) and transformer-based architectures (e.g., ViTs, multimodal transformers) for cancer detection.

Implement multimodal fusion models that integrate imaging, clinical metadata, and temporal data to improve personalized cancer risk prediction.

Utilize contrastive learning and latent space alignment techniques to enhance representation learning across heterogeneous data modalities.
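The contrastive learning and latent space alignment mentioned in Aim 2 can be illustrated with a symmetric InfoNCE loss, the objective popularized by CLIP-style models. This is a sketch under assumptions, not the project's chosen loss: it presumes that each participant contributes one image embedding and one clinical embedding of equal dimension, and the function names and temperature value are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def _mean_cross_entropy_on_diagonal(logits: List[List[float]]) -> float:
    """Average cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(z - row_max) for z in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(
    image_emb: Sequence[Sequence[float]],
    clinical_emb: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Symmetric InfoNCE: matched (image, clinical) pairs are pulled
    together, all other pairings in the batch are pushed apart."""
    if len(image_emb) != len(clinical_emb) or not image_emb:
        raise ValueError("need the same non-zero number of embeddings per modality")
    a = [l2_normalize(v) for v in image_emb]
    b = [l2_normalize(v) for v in clinical_emb]
    # Cosine similarity between every image and every clinical embedding.
    logits = [[sum(x * y for x, y in zip(u, v)) / temperature for v in b] for u in a]
    transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (
        _mean_cross_entropy_on_diagonal(logits)
        + _mean_cross_entropy_on_diagonal(transposed)
    )


# Toy batch: aligned pairs should give a lower loss than swapped pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing this loss encourages the two encoders to place a participant's image and clinical record close together in a shared latent space.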

Aim 3: Application of Foundation Models to Downstream Clinical Tasks

Apply trained models to cancer detection, malignancy classification, and longitudinal risk assessment across PLCO cancer types.

Develop interpretable AI frameworks using attention mechanisms and saliency maps to support clinical decision-making.

Validate model performance on external datasets and assess fairness, bias, and real-world applicability in diverse patient populations.
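One simple way to begin the fairness assessment described in Aim 3 is to compare discrimination (ROC AUC) across patient subgroups. The sketch below is an assumption about how such a check could be done, not the project's evaluation protocol. The labels, scores, and group names are made-up illustrative values, not PLCO data.

```python
from typing import Dict, Hashable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def auc_by_group(
    labels: Sequence[int],
    scores: Sequence[float],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Compute AUC separately within each subgroup."""
    result: Dict[Hashable, float] = {}
    for group in sorted(set(groups), key=str):
        idx: List[int] = [i for i, g in enumerate(groups) if g == group]
        result[group] = roc_auc([labels[i] for i in idx], [scores[i] for i in idx])
    return result


# Illustrative values only: the model ranks group A perfectly
# but performs at chance level on group B.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.2, 0.6, 0.7, 0.4, 0.3]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
per_group = auc_by_group(labels, scores, groups)
```

A large gap between subgroup AUCs, as in this toy example, would flag a model for closer inspection before any clinical use. Calibration and error-rate comparisons would be natural complements.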

Collaborators

Jia Wu, Ph.D., MD Anderson, jwu11@mdanderson.org
Kai Zhang, Ph.D., MD Anderson, kzhang7@mdanderson.org
Morteza Salehjahromi, Ph.D., MD Anderson, msalehjahromi@mdanderson.org
Ivan Coronado, Ph.D., MD Anderson, icoronado1@mdanderson.org