Development of Models for Predicting Clinical Outcomes from H&E Stained Whole Slide Images

Principal Investigator

Name
Zoltan Szallasi

Degrees
MD

Institution
Boston Children's Hospital

Position Title
Assistant Professor

Email
zoltan.szallasi@childrens.harvard.edu

About this CDAS Project

Study
PLCO

Project ID
PLCOI-1720

Initial CDAS Request Approval
Nov 5, 2024

Title
Development of Models for Predicting Clinical Outcomes from H&E Stained Whole Slide Images

Summary
Histopathology remains a cornerstone of cancer diagnosis and treatment, with Hematoxylin and Eosin (H&E) staining being the most widely used method for examining tissue morphology in clinical practice. With the advent of digital pathology, whole slide imaging (WSI) has enabled pathologists and researchers to capture high-resolution digital representations of histological slides. The immense amount of data generated by WSIs offers a promising opportunity to employ advanced computational techniques to extract meaningful information for predicting clinically relevant outcomes. Recent advancements in artificial intelligence (AI) and machine learning, particularly H&E tissue foundation models for vision tasks, have demonstrated significant potential in various medical imaging applications. However, there is a need for robust models tailored specifically for WSIs to predict clinically important endpoints with high accuracy.
The primary goal of this research is to develop and validate AI-based foundation models that embed patches from H&E stained WSIs, and to use those embeddings to predict downstream clinical endpoints. We aim to address critical challenges in oncology, including the prediction of BRCA mutation status, homologous recombination deficiency (HRD), response to first-line chemotherapy, and response to PARP inhibitor (PARPi) therapy. By applying deep learning techniques to large-scale WSI datasets, we intend to create models that can support personalized treatment decisions and improve patient outcomes.

Aims

Specific aims
1) Investigate Joint Embedding Models to Integrate WSI Representations with Multi-Omics Data
We aim to develop joint embedding models that combine representations from H&E stained WSIs and multi-omics data (e.g., transcriptomics, genomics, proteomics) to enhance the understanding of cancer biology. By integrating multi-omics data during the pre-training phase, we seek to improve the capacity of WSI-based models to capture complex biological features associated with cancer progression and treatment response. Techniques such as contrastive learning or multi-modal autoencoders will be used to create a unified embedding space that reflects both histological and molecular characteristics. We will evaluate these joint embeddings in downstream tasks, such as predicting patient outcomes and identifying biomarkers, to determine whether they outperform models trained with WSI or omics data alone.
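As an illustration of the contrastive option mentioned above, the following is a minimal sketch of a symmetric InfoNCE-style loss over paired slide and omics embeddings. It is not the project's implementation: the function names, the temperature value, and the assumption that row i of each matrix belongs to the same patient are all hypothetical choices made for the example.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(wsi_emb, omics_emb, temperature=0.1):
    """Contrastive loss for paired embeddings (hypothetical helper).

    wsi_emb, omics_emb: arrays of shape (n_patients, dim); row i of each
    array is assumed to come from the same patient.
    """
    # L2-normalise so the dot product is a cosine similarity.
    wsi = wsi_emb / np.linalg.norm(wsi_emb, axis=1, keepdims=True)
    omics = omics_emb / np.linalg.norm(omics_emb, axis=1, keepdims=True)

    # Similarity of every slide embedding to every omics embedding.
    logits = wsi @ omics.T / temperature
    idx = np.arange(logits.shape[0])

    # Matching pairs sit on the diagonal; score them in both directions.
    loss_wsi_to_omics = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_omics_to_wsi = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_wsi_to_omics + loss_omics_to_wsi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wsi = rng.normal(size=(8, 16))
    omics_aligned = wsi + 0.01 * rng.normal(size=(8, 16))
    omics_shuffled = omics_aligned[::-1]  # breaks the patient pairing
    print("aligned loss: ", symmetric_info_nce(wsi, omics_aligned))
    print("shuffled loss:", symmetric_info_nce(wsi, omics_shuffled))
```

In this toy setting the loss is low when the two modalities agree for each patient and high when the pairing is broken, which is the property a joint embedding space would be trained to exploit.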

2) Evaluate the Prognostic Value of H&E WSIs Compared to Established Genomic Signatures
We will compare the predictive accuracy of WSI-based models with established genomic signatures, including HRDscore and HRDetect, as well as FDA-approved assays, in predicting cancer prognosis and treatment response. Our goal is to determine whether histological features extracted from WSI patches can serve as surrogates for certain genomic signatures, potentially offering a cost-effective alternative for assessing molecular risk factors that requires no sampling beyond routine histology. We will conduct a comprehensive analysis of the concordance between WSI-derived features and genomic signatures across multiple cancer types and patient cohorts, exploring the potential for WSI models to enhance or complement existing genomic risk models.
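One simple way to quantify the concordance described above is Cohen's kappa between categorical calls from a WSI-based model and calls from a genomic signature. The sketch below assumes both have already been reduced to per-patient labels (for example, HRD-positive versus HRD-negative); the function name and the example labels are hypothetical, and the project may well use a different agreement measure.

```python
import numpy as np


def cohens_kappa(calls_a, calls_b):
    """Chance-corrected agreement between two sets of categorical calls."""
    a = np.asarray(calls_a)
    b = np.asarray(calls_b)
    if a.shape != b.shape:
        raise ValueError("both inputs must label the same patients")

    categories = np.union1d(a, b)
    observed = np.mean(a == b)
    # Agreement expected if the two methods labelled patients independently.
    expected = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    if expected == 1.0:
        return 1.0  # both methods give the same single label to everyone
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical binary calls: 1 = HRD-positive, 0 = HRD-negative.
    wsi_calls = [1, 1, 0, 0]
    genomic_calls = [1, 1, 0, 1]
    print(cohens_kappa(wsi_calls, genomic_calls))
```

A kappa near 1 would indicate that the histology-derived calls track the genomic signature closely, while a value near 0 would indicate agreement no better than chance.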

3) Explore Spatial Features Between Tissue Types Using Novel Models Based on Correlation Functions
We plan to develop models that leverage correlation functions between tissue "prototypes" (distinct histological regions) to capture spatial relationships within WSIs. This approach will investigate the prognostic significance of spatial features, such as tissue organization, interaction, and distribution, and their association with clinical outcomes. Correlation-based models will help identify patterns reflecting biological processes like immune response and tumor microenvironment dynamics. We will apply interpretability techniques to understand which spatial features and tissue interactions are most predictive of outcomes, enabling the generation of new biological hypotheses.
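To make the idea of a correlation function between tissue prototypes concrete, here is a minimal sketch of one possible definition: for each distance bin, the share of patch pairs that are of types (a, b) divided by the share expected if prototype labels were spatially random. The function name, the binning scheme, and the assumption that every patch carries a single prototype label and a 2D coordinate are illustrative choices, not the models the project will develop.

```python
import numpy as np


def prototype_cross_correlation(coords, labels, a, b, bin_edges):
    """Distance-binned co-occurrence ratio for prototypes a and b.

    coords: (n_patches, 2) patch centre coordinates.
    labels: (n_patches,) prototype label assigned to each patch.
    Returns one value per bin; 1 means no spatial association, values
    above 1 mean a and b co-occur at that distance more than expected.
    Bins without any patch pairs are returned as NaN.
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distances between all patches.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    off_diag = ~np.eye(len(coords), dtype=bool)  # ignore self-pairs
    ab_pairs = (labels == a)[:, None] & (labels == b)[None, :] & off_diag
    baseline = ab_pairs.sum() / off_diag.sum()  # overall (a, b) pair share

    g = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        in_bin = (dist >= bin_edges[k]) & (dist < bin_edges[k + 1]) & off_diag
        if in_bin.sum() > 0 and baseline > 0:
            g[k] = ((in_bin & ab_pairs).sum() / in_bin.sum()) / baseline
    return g


if __name__ == "__main__":
    # Toy slide: prototype 0 on the left half, prototype 1 on the right.
    coords = [(x, 0) for x in range(10)]
    labels = [0] * 5 + [1] * 5
    print(prototype_cross_correlation(coords, labels, 0, 1, [0.0, 2.5, 10.0]))
```

In the toy example the two prototypes are segregated, so the ratio falls below 1 at short range and rises above 1 at long range. Curves of this kind, computed per prototype pair, could serve as spatial features for a downstream outcome model.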

4) Enable Automatic Hypothesis Generation Using AI-Based Models
Our AI models will automatically identify potential research questions or hypotheses by detecting novel patterns, biomarkers, or associations between tissue and omics data. Using unsupervised learning and clustering, we will uncover previously unknown patient subgroups linked to distinct clinical outcomes or therapeutic responses. We aim to validate these AI-generated hypotheses through retrospective and prospective studies, ensuring their clinical relevance. This approach facilitates a shift from predictive modeling to discovery-driven research, employing the developed models as tools for continuous learning and exploration in oncology.

These aims address critical challenges in precision medicine by integrating digital pathology and AI to improve the understanding of cancer, prognostic accuracy, personalized treatment, and hypothesis generation. Together, they push the boundaries of pathology and computational biology, aiming to transform clinical decision-making.

Collaborators

Istvan Csabai, Dept. of Complex Systems, Eotvos University, Budapest, Hungary