Enhancing Pathology Foundation Models for Cancer Prognosis through Integration of PLCO H&E Stained Pathology Images

Principal Investigator

Name
Siyi Tang

Degrees
Ph.D.

Institution
Artera Inc.

Position Title
Senior Machine Learning Engineer

Email
siyi@artera.ai

About this CDAS Project

Study
PLCO

Project ID
PLCOI-2002

Initial CDAS Request Approval
Dec 10, 2025

Title
Enhancing Pathology Foundation Models for Cancer Prognosis through Integration of PLCO H&E Stained Pathology Images

Summary
Foundation models for histopathology are large-scale deep learning models pre-trained on extensive collections of whole-slide images (WSIs) that learn generalizable representations of histopathological patterns. These models are typically trained using self-supervised learning techniques on unlabeled or weakly labeled data, enabling them to capture fundamental morphological features, such as tissue architecture, cellular morphology, and spatial relationships, that are common across different cancer types and tissue types. Once trained, foundation models serve as powerful feature extractors that can be fine-tuned or used as encoders for downstream tasks, including cancer diagnosis, prognosis prediction, biomarker discovery, and treatment response assessment. By learning from diverse, large-scale datasets, foundation models overcome the limitations of task-specific models trained on smaller datasets, thereby enabling more robust and generalizable AI systems for digital pathology.
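As a rough illustration of the "feature extractor" usage described above, the sketch below encodes the tiles of one slide with a frozen encoder and mean-pools the tile embeddings into a single slide-level feature vector. Everything in it is an assumption made for illustration: the encoder is a fixed random projection standing in for a real pre-trained network, and the names `encode_tiles`, `slide_embedding`, `EMBED_DIM`, and the 16x16 tile size are hypothetical, not part of any described system.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8   # hypothetical embedding size
TILE_SIZE = 16  # hypothetical tile edge length in pixels

# Placeholder for a pre-trained, frozen encoder: a fixed random projection.
# A real pathology foundation model would be a deep network; this is a stand-in.
_FROZEN_WEIGHTS = rng.normal(size=(TILE_SIZE * TILE_SIZE * 3, EMBED_DIM))


def encode_tiles(tiles: np.ndarray) -> np.ndarray:
    """Map RGB tiles of shape (n, TILE_SIZE, TILE_SIZE, 3) to (n, EMBED_DIM) embeddings."""
    flat = tiles.reshape(tiles.shape[0], -1)
    return flat @ _FROZEN_WEIGHTS  # weights are never updated (frozen)


def slide_embedding(tiles: np.ndarray) -> np.ndarray:
    """Mean-pool tile embeddings into one slide-level feature vector."""
    return encode_tiles(tiles).mean(axis=0)


# Synthetic tiles standing in for patches cut from one whole-slide image.
tiles = rng.random((5, TILE_SIZE, TILE_SIZE, 3))
features = slide_embedding(tiles)
```

A downstream model (for example, a small classifier for a prognosis label) would then be trained on vectors like `features`, either keeping the encoder frozen or fine-tuning it.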

At Artera Inc., we are dedicated to advancing personalized cancer treatment by developing robust biomarkers for cancer prognosis. A critical component of this endeavor is the creation of pathology foundation models trained on diverse and extensive datasets. To achieve this goal, we propose to incorporate hematoxylin and eosin (H&E) stained pathology images from the PLCO Cancer Screening Trial into the training data for these models.

Specifically, we seek access to all available H&E stained slides, encompassing the following cancer types:
1. Adenoma
2. Bladder
3. Breast
4. Male Breast
5. Colorectal
6. Lung
7. Ovarian
8. Prostate

If available, we would also appreciate information on the scanner type and magnification used to scan the H&E slides, as well as metadata that allows us to identify whether multiple slides originate from the same patient.
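The request does not say how the patient-level metadata would be used. One common use, assumed here purely for illustration, is keeping all slides from the same patient in the same data partition so that a held-out set shares no patients with the training set. The sketch below shows that grouping; the function name `split_by_patient`, the identifier formats, and the hold-out fraction are hypothetical.

```python
import random
from collections import defaultdict


def split_by_patient(slide_to_patient, holdout_fraction=0.25, seed=0):
    """Split slide IDs into (train, holdout) so that no patient spans both sets.

    slide_to_patient: dict mapping slide ID -> patient ID (hypothetical format).
    """
    # Group slides under the patient they came from.
    by_patient = defaultdict(list)
    for slide_id, patient_id in slide_to_patient.items():
        by_patient[patient_id].append(slide_id)

    # Shuffle patients (not slides) with a fixed seed for reproducibility.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Hold out a fraction of patients, at least one.
    n_holdout = max(1, round(len(patients) * holdout_fraction))
    holdout_patients = set(patients[:n_holdout])

    train = sorted(
        s for p in patients if p not in holdout_patients for s in by_patient[p]
    )
    holdout = sorted(s for p in holdout_patients for s in by_patient[p])
    return train, holdout


# Synthetic example: six slides from four patients.
slides = {
    "slide_a": "patient_1",
    "slide_b": "patient_1",
    "slide_c": "patient_2",
    "slide_d": "patient_3",
    "slide_e": "patient_3",
    "slide_f": "patient_4",
}
train_slides, holdout_slides = split_by_patient(slides)
```

Splitting at the patient level rather than the slide level is the design choice that matters here: a slide-level shuffle could place two slides from one patient on opposite sides of the split.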

Aims

Expand Training Data: Increase the volume and diversity of our training dataset to build more generalizable and robust pathology foundation models.

Collaborators

Siyi Tang, Artera Inc.
Nitin Kumar Mittal, Artera Inc.
Ali Moatadelro, Artera Inc.
Erik Rosten, Artera Inc.
Nathan Silberman, Artera Inc.
Reda Oulbacha, Artera Inc.
Rikiya Yamashita, Artera Inc.
Tunai Porto Marques, Artera Inc.
Wouter Zwerink, Artera Inc.