Pan-cancer digital pathology AI foundation model and its application in cancer prognostication

Principal Investigator

Name
Jan Witowski

Degrees
M.D., Ph.D.

Institution
Ataraxis AI, Inc.

Position Title
Chief Executive Officer

Email
jan.witowski@ataraxis.ai

About this CDAS Project

Study
PLCO

Project ID
PLCOI-1433

Initial CDAS Request Approval
Jan 9, 2024

Title
Pan-cancer digital pathology AI foundation model and its application in cancer prognostication

Summary
AI foundation models are deep neural networks that excel at learning effective data representations without labels, using only self-supervised learning. These models are trained on massive and diverse datasets from multiple sources. Foundation models have so far been used predominantly, and with significant success, in natural language processing. More recently, however, their potential for impact in other application areas, including medical imaging, has become more apparent. In digital pathology, these models can extract features from whole slide images, which can then be used as novel biomarkers, offering new insights into patient outcomes and helping to stratify cancer cohorts into new subgroups, enhancing personalization of treatment. For example, in breast cancer, traditional prognostic and predictive tests are based primarily on genomic data and so miss valuable information in digitized specimen images.

We have already developed an initial version of the pan-cancer foundation model together with a downstream prognostic/predictive model, and validated it on several independent patient cohorts. Our initial results indicate that increasing the size of the data set used to train the foundation model leads to a substantial improvement in performance across a wide range of tasks, in terms of both the accuracy and the robustness of the final model.

In this project, we aim to evaluate the generalizability of the foundation model by measuring the accuracy of downstream tasks that predict patient outcomes in multiple cancers within the PLCO trial. We also aim to understand how further increasing the size of the data set used to train the foundation model affects its accuracy, and how well the features it learns generalize across different cancers, thereby assessing its potential as a universal tool in cancer prognostication.

Aims

Aim 1. We will use the pan-cancer foundation model to extract features from PLCO pathology images and subsequently use these features to predict patient outcomes across multiple tumor types.

Aim 2. We will compare patterns of generated features and predictions across patient subgroups, including across clinical indications.

Aim 3. We will analyze the foundation model’s performance for different sizes of the training set and its generalizability across different cancers.

Collaborators

No external collaborators.