Principal Investigator
Name
Tongwu Zhang
Degrees
Ph.D.
Institution
National Cancer Institute
Position Title
Earl Stadtman Investigator
Email
About this CDAS Project
Study
PLCO
Project ID
PLCOI-1942
Initial CDAS Request Approval
Jul 11, 2025
Title
Predicting Mutational Landscapes from H&E Slides in Population-Based Cohorts
Summary
Recent advances in computational pathology and artificial intelligence (AI) have enabled the extraction of rich biological information directly from hematoxylin and eosin (H&E) stained histopathology slides. Predicting genomic features—such as driver mutations, mutational signatures, and extrachromosomal DNA (ecDNA)—from pathology images offers a powerful, non-invasive approach to understanding tumor biology and heterogeneity. These genomic features are central to cancer initiation, progression, therapy resistance, and prognosis. Spatially resolved inference of such features from H&E slides can provide insights into tumor evolution, microenvironmental interactions, and the cells of origin.
By leveraging multi-omics data (e.g., whole-genome sequencing) from The Cancer Genome Atlas (TCGA) and advanced AI methodologies, we have developed a unified framework combining contrastive learning and knowledge distillation to uncover associations between tumor genomic alterations and specific morphological patterns in histopathology images. In this project, we aim to apply our model to the PLCO dataset to predict spatial genomic features—including driver mutation status, mutational signatures, ecDNA, and other alterations—across multiple cancer types.
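As a rough illustration of how a contrastive objective and a knowledge-distillation objective can be combined into one training loss, the sketch below uses two common generic forms: an InfoNCE-style contrastive term and a temperature-softened KL divergence between teacher and student outputs. The summary does not specify the project's actual losses, so every name here (info_nce, distillation_kl, combined_loss), the pairing of image embeddings with genomic embeddings, and the weighting parameter alpha are assumptions made for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce(image_embs, genomic_embs, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed form, not the project's).

    Row i of image_embs is treated as the positive match for row i of
    genomic_embs; every other row in the batch acts as a negative.
    """
    imgs = [normalize(v) for v in image_embs]
    gens = [normalize(v) for v in genomic_embs]
    loss = 0.0
    for i, img in enumerate(imgs):
        sims = [dot(img, g) for g in gens]
        probs = softmax(sims, temperature)
        loss -= math.log(probs[i])
    return loss / len(imgs)

def distillation_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def combined_loss(image_embs, genomic_embs, teacher_logits, student_logits,
                  alpha=0.5):
    """Weighted sum of the two terms; alpha is a hypothetical hyperparameter."""
    contrastive = info_nce(image_embs, genomic_embs)
    distill = distillation_kl(teacher_logits, student_logits)
    return alpha * contrastive + (1 - alpha) * distill
```

In this toy form, the contrastive term is small when each image embedding is closest to its own genomic embedding, and the distillation term is zero when the student reproduces the teacher's logits exactly.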
Using the rich genotype and phenotype data available in the PLCO resource, we will also conduct two association studies: 1) mapping germline genotypes to predicted genomic features to investigate somatic–germline interactions; and 2) testing associations between exposures and predicted genomic features to uncover potential environmental or lifestyle-related etiologies driving the observed genomic events. This study has the potential to reveal novel insights into the cells of origin, etiological factors, and molecular mechanisms underlying tumor development across diverse cancer types.
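One simple way to frame an exposure-versus-predicted-feature association is a 2x2 table summarized by an odds ratio with a Wald (Woolf) confidence interval. The sketch below is a minimal, unadjusted example under that assumption; the summary does not describe the project's statistical models, and a real analysis would likely adjust for covariates such as age, sex, and cancer type. The function name odds_ratio_ci and the counts in the usage note are hypothetical.

```python
import math

def odds_ratio_ci(exposed_pos, exposed_neg, unexposed_pos, unexposed_neg,
                  z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    exposed_pos / exposed_neg: exposed participants whose tumors are
    predicted positive / negative for a genomic feature.
    unexposed_pos / unexposed_neg: the same counts among the unexposed.
    z=1.96 gives an approximate 95% interval.
    """
    cells = [exposed_pos, exposed_neg, unexposed_pos, unexposed_neg]
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero,
    # so the ratio and its standard error stay finite.
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's method.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

For example, with made-up counts of 30 feature-positive and 70 feature-negative tumors among exposed participants, and 10 and 90 among unexposed participants, odds_ratio_ci(30, 70, 10, 90) returns an odds ratio of about 3.86 along with its interval bounds.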
Aims

The objective of this project is to apply and extend our deep learning framework to predict spatially resolved tumor genomic features from hematoxylin and eosin (H&E) stained histopathology slides within the PLCO cohort. We aim to integrate image-based predictions with rich genotype and phenotype data to uncover novel associations relevant to cancer biology and etiology. Specifically, we propose the following aims:

Aim 1: Apply our AI framework to predict genomic features—including driver mutations, mutational signatures, and extrachromosomal DNA (ecDNA)—from H&E slides across multiple cancer types in the PLCO dataset. This will demonstrate the generalizability of our model in a large, population-based cohort.
Aim 2: Investigate associations between germline variants and AI-predicted somatic genomic features to identify potential somatic–germline interactions that contribute to tumor development.
Aim 3: Analyze the relationship between environmental and lifestyle exposures and predicted genomic features to explore potential etiologic factors underlying the mutational processes and other genomic events observed.

Together, these aims will advance our understanding of the molecular and environmental determinants of cancer, providing insights into disease mechanisms and potential prevention strategies.

Collaborators

Guangyu Wang, Houston Methodist Research Institute (HMRI), PLCOI-1952
Sang Jian, Houston Methodist Research Institute (HMRI)