Principal Investigator
Name
HIMANSHU ARORA
Degrees
PhD
Institution
UNIVERSITY OF MIAMI
Position Title
Assistant Professor
About this CDAS Project
Study
PLCO
Project ID
PLCOI-1641
Initial CDAS Request Approval
Aug 15, 2024
Title
Leveraging Generative Adversarial Networks for Enhanced Cancer Diagnosis and Prognostic Assessment: A Paradigm Shift in Clinical Decision-Making
Summary
Early detection and accurate prognosis remain significant challenges in the treatment of cancer. Without them, mortality and overall disease risk increase, as does the cost of treatment. Recent progress in machine learning has shown potential for developing pipelines that automate standardized, objective assessments, reducing the time, personnel, and other resources required [6-8]. Companies like PathAI, HTL Clinical, PaigeAI, and Deciplex are using artificial intelligence (AI) tools to identify cancer areas in digital pathology and distinguish them from non-cancerous regions [9-16]. However, these tools have limitations that restrict their clinical use. For instance, they rely only on histological architecture to grade cancers, have minimal capability to distinguish between proximal cancer areas, and cannot predict cancer progression or regression [10, 11]. One of the primary reasons for these limitations is that AI models need large amounts of clinical data for training, and such data are often biased, lack diversity, and are not publicly available. Overcoming these limitations is essential to improving the clinical utility and efficacy of AI platforms that automate cancer diagnosis and prognosis, and ultimately to improving patient outcomes. To address this issue, this study proposes integrating advanced generative adversarial network (GAN) models, in-house quantification models, and multi-omic tumor profiling data to create high-quality synthetic images of different cancer grades, accounting for the granularity of each grade. These images will partially or completely substitute for the real histology data used to train convolutional neural network (CNN) models, and we will evaluate the resulting improvement in grading efficiency.
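
As an illustration of what partial or complete substitution could look like in practice, the sketch below assembles a fixed-size training set in which a chosen fraction of real patches is replaced by synthetic ones. It is a minimal sketch under stated assumptions, not the study's pipeline: the function name `build_training_set`, the `synthetic_fraction` parameter, and the representation of patches as plain list items are all hypothetical.

```python
import random


def build_training_set(real, synthetic, synthetic_fraction, seed=0):
    """Return a shuffled training set of len(real) items in which a given
    fraction of the real patches is replaced by synthetic patches.

    Hypothetical helper: patches can be any objects (paths, arrays, tuples).
    synthetic_fraction=0.0 keeps only real data; 1.0 uses only synthetic data.
    """
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be between 0 and 1")

    rng = random.Random(seed)          # seeded for reproducible splits
    n_total = len(real)
    n_synthetic = round(n_total * synthetic_fraction)
    if n_synthetic > len(synthetic):
        raise ValueError("not enough synthetic patches for this fraction")

    # Sample without replacement from each pool, then mix.
    mixed = rng.sample(real, n_total - n_synthetic)
    mixed += rng.sample(synthetic, n_synthetic)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    real_patches = [("real", i) for i in range(10)]
    synthetic_patches = [("synthetic", i) for i in range(10)]
    half = build_training_set(real_patches, synthetic_patches, 0.5)
    print(sum(1 for source, _ in half if source == "synthetic"))
```

Keeping the total size fixed while varying the fraction means any change in grading performance can be attributed to the data source and not to the amount of training data.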

In our preliminary research, we used prostate cancer (PCa) to build and refine our models. We processed multi-omics data from patients with primary adenocarcinoma of the prostate (PRAD) taken from The Cancer Genome Atlas (TCGA) [19]. We then evaluated various GAN models for generating synthetic PCa histology data without any enhancement and settled on the most efficient one, dcGAN. Additionally, we created spatial heterogeneous recurrence quantification analysis (SHRQA) modules to quantify the images. In the first phase of testing, we used the synthetic images produced by the selected GAN to train a CNN model, EfficientNet, and compared its grading performance on PCa images with that of a CNN model trained solely on patches from real images. The CNN model trained with synthetic images achieved significantly higher grading efficiency than the model trained with real image patches alone. We therefore hypothesize that enhanced GAN models, which use customized genomics and novel SHRQA quantification, can supply CNN models with training data capable of significantly improving their diagnostic performance.
Aims

Aim 1: Integrate Multi-Omic Tumor Profiles to Refine Synthetic Histology Image Generation for AI Training: Through deep mining of comprehensive tumor genomic profiles, we have established precise gene expression patterns that are normalized for each tumor grade and are not confounded by clinical variables such as age, PSA, or therapy differences within each grade. In this aim, we will pioneer the integration of tumor genomic data directly into a synthetic image-generating GAN to develop a new 'bio-intelligent' GAN model. This GAN model will be trained selectively on histology images that pass molecular characterization, and will therefore generate customized histology image sets based on tumor genomes, giving AI model training greater depth and complexity.

Aim 2: Develop and Validate Enhanced AI Diagnostics for Cancer Using Synthetic Image Quantification: We have developed a quantification model that can identify and differentiate between features specific to each grade of cancer (so far, we have tested it only on prostate cancer). Here, we aim to use the SHRQA models to assist in preselecting high-quality synthetic images. The synthetic images will then be used to enhance the efficiency of the CNN model in grading, and its performance will be compared with that of a CNN trained only on original image patches that were not subjected to spatial quantification. We will perform rigorous cross-validation on unseen images from diverse sources such as the TCGA, UMiami, Radboud, and Karolinska cohorts to ensure broad relevance and clinical robustness. The outcome of this aim will establish the AI model's sensitivity and specificity in complex pathology scenarios.
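
For reference, sensitivity and specificity for a multi-grade classifier are commonly reported one grade at a time, treating that grade as the positive class and all other grades as negative. The sketch below shows that calculation; it is an illustrative assumption about how the metrics might be computed, not the study's evaluation code, and the function name and the integer grade labels are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for a single grade.

    y_true / y_pred: sequences of grade labels of equal length.
    positive: the grade treated as the positive class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Undefined ratios (no positives or no negatives) are reported as NaN.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy labels for illustration only; not study data.
    true_grades = [3, 3, 4, 4, 5]
    predicted_grades = [3, 4, 4, 4, 5]
    print(sensitivity_specificity(true_grades, predicted_grades, positive=4))
```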

Aim 3: Map disease trajectory using advanced GAN tools. Here, we will leverage high-quality synthetic image generation to predict the progression or regression of cancer. First, data from the 1st- and 2nd-year follow-up of MAST trial patients (an active surveillance trial for prostate cancer) will be categorized into progressor and non-progressor groups. The data from these groups will be used to train the bio-intelligent GAN to generate synthetic images. These synthetic images, after filtering with SHRQA quantification, will then be used to train the CNN model. The CNN model will be used to predict cancer progression or regression on the 3rd-year follow-up data (needle biopsies). It will also be queried to predict the 4th- and 5th-year follow-up biopsy images sequentially. These results will establish the extended capability of the AI model to predict progression or regression while assigning Gleason scores. As proposed for the MAST trial data, we will use data from PLCO to build and test applications for other cancer sites.
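
The temporal design described in this aim (train on earlier follow-up years, evaluate on a later one) can be sketched as a simple split by follow-up year. This is a minimal illustration under stated assumptions: the `Biopsy` record, its field names, and both helper functions are hypothetical and do not reflect the actual MAST or PLCO data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Biopsy:
    """Hypothetical record for one follow-up biopsy."""
    patient_id: str
    followup_year: int
    progressed: bool


def split_by_followup_year(biopsies, train_years=(1, 2), test_year=3):
    """Split records into training years and a held-out later year."""
    train = [b for b in biopsies if b.followup_year in train_years]
    test = [b for b in biopsies if b.followup_year == test_year]
    return train, test


def group_by_outcome(biopsies):
    """Group records into progressor and non-progressor lists."""
    groups = {"progressor": [], "non_progressor": []}
    for b in biopsies:
        key = "progressor" if b.progressed else "non_progressor"
        groups[key].append(b)
    return groups


if __name__ == "__main__":
    # Toy records for illustration only; not study data.
    records = [
        Biopsy("p1", 1, False), Biopsy("p1", 2, True), Biopsy("p1", 3, True),
        Biopsy("p2", 1, False), Biopsy("p2", 2, False), Biopsy("p2", 3, False),
    ]
    train, test = split_by_followup_year(records)
    groups = group_by_outcome(train)
    print(len(train), len(test), len(groups["progressor"]))
```

Splitting by follow-up year keeps the evaluation data strictly later in time than the training data, which matches the sequential prediction of later-year biopsies described above.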

Impact: The proposed study aims to enhance early cancer detection through improved AI sensitivity to early cancer signs, comprehensive omics profiling, and predictive analytics for timely intervention. Overall, it aims to reduce healthcare costs by streamlining cancer diagnosis.

Collaborators

1. Himanshu Arora, PhD, Assistant Professor, University of Miami (UM), Principal Investigator (PI).

2. Sanoj Punnen, MD, Associate Professor of Urologic Oncology, University of Miami (UM), Key collaborator. He leads the active surveillance trial and will provide access to its data for the proposed aims.

3. Oleksandr N Kryvenko, MD, Associate Professor of Pathology and Urology, Miller School of Medicine, UM, Key collaborator. He will coordinate the digitization and grading of pathology sections for the proposed study.

4. Derek Van Booven, BS, Head of Data Science and Bioinformatics, Department of Human Genetics, UM, Data Scientist. He will provide input on editing the AI models and on interface development.

5. Sunwoo Han, PhD, Assistant Scientist, Biostatistical Core, UM, Biostatistician. He will support the study by designing experiments, developing data collection methods, and ensuring data integrity.

6. Andrew Hung, MD, Associate Professor, Keck School of Medicine of USC, Collaborator. He is an NIH-funded investigator with expertise in machine learning research and in conducting investigator-initiated clinical trials. He will provide extensive feedback and support in planning future clinical trials based on the outcomes of the proposed study.

7. Mingzhe Chen, PhD, Assistant Professor, University of Miami (UM), Key collaborator. He will contribute his expertise in optimization of the proposed 'bio-intelligent' GAN models, mathematical modeling, and machine learning.

8. Cheng Ben Chen, PhD, Assistant Professor, University of Miami (UM), Key collaborator. He will contribute his expertise in complex system characterization, mathematical modeling, and machine learning.