Predicting Mutational Landscapes from H&E Slides in Population-Based Cohorts
Leveraging multi-omics data (e.g., whole-genome sequencing) from The Cancer Genome Atlas (TCGA), we have developed a unified framework that combines contrastive learning and knowledge distillation to uncover associations between tumor genomic alterations and specific morphological patterns in histopathology images. In this project, we aim to apply our model to the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial dataset to predict spatial genomic features, including driver mutation status, mutational signatures, extrachromosomal DNA (ecDNA), and other alterations, across multiple cancer types.
Using the rich genotype and phenotype data available in the PLCO resource, we will also conduct two association studies: 1) mapping germline genotypes to predicted genomic features to investigate somatic–germline interactions; and 2) testing associations between exposures and predicted genomic features to uncover potential environmental or lifestyle-related etiologies driving the observed genomic events. This study has the potential to reveal novel insights into the cells of origin, etiological factors, and molecular mechanisms underlying tumor development across diverse cancer types.
The objective of this project is to apply and extend our deep learning framework to predict spatially resolved tumor genomic features from hematoxylin and eosin (H&E)-stained histopathology slides within the PLCO cohort. We aim to integrate image-based predictions with rich genotype and phenotype data to uncover novel associations relevant to cancer biology and etiology. Specifically, we propose the following aims:
Aim 1: Apply our AI framework to predict genomic features, including driver mutations, mutational signatures, and extrachromosomal DNA (ecDNA), from H&E slides across multiple cancer types in the PLCO dataset. This will test the generalizability of our model in a large, population-based cohort.
Aim 2: Investigate associations between germline variants and AI-predicted somatic genomic features to identify potential somatic–germline interactions that contribute to tumor development.
Aim 3: Analyze the relationship between environmental and lifestyle exposures and predicted genomic features (e.g., mutational processes) to explore potential etiologic factors underlying those features.
Together, these aims will advance our understanding of the molecular and environmental determinants of cancer, providing insights into disease mechanisms and potential prevention strategies.
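As a rough illustration of the kind of exposure–feature association described in Aim 3, the sketch below computes an unadjusted odds ratio with a Woolf 95% confidence interval for a binary exposure against a binary predicted feature. The statistical method is not specified in this summary, so this is an assumption for illustration only; the function name, variable names, and toy data are hypothetical, and a real analysis would likely adjust for covariates such as age, sex, and cancer type.

```python
import math


def odds_ratio(exposed, feature_positive):
    """Unadjusted odds ratio and Woolf 95% CI for two binary sequences.

    exposed[i] and feature_positive[i] describe the same participant.
    Illustrative only; not the project's actual analysis pipeline.
    """
    a = b = c = d = 0
    for e, f in zip(exposed, feature_positive):
        if e and f:
            a += 1  # exposed, feature predicted present
        elif e:
            b += 1  # exposed, feature predicted absent
        elif f:
            c += 1  # unexposed, feature predicted present
        else:
            d += 1  # unexposed, feature predicted absent

    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))

    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Synthetic toy data (not PLCO data): 10 exposed and 10 unexposed participants.
toy_exposed = [1] * 10 + [0] * 10
toy_feature = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7
toy_or, toy_lower, toy_upper = odds_ratio(toy_exposed, toy_feature)
```

With a continuous predicted score or multiple covariates, a regression model would be the more natural choice; the two-by-two table is used here only because it keeps the example self-contained.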
Guangyu Wang, Houston Methodist Research Institute (HMRI), PLCOI-1952
Sang Jian, Houston Methodist Research Institute (HMRI)