Principal Investigator
Name
Mohammad Soltanieh-ha
Degrees
Ph.D.
Institution
Boston University
Position Title
Clinical Assistant Professor
Email
About this CDAS Project
Study
NLST (Learn more about this study)
Project ID
NLST-692
Initial CDAS Request Approval
Jul 13, 2020
Title
Deep learning-based cross-classifications reveal conserved spatial behaviors within tumor histological images
Summary
Histopathological images are a rich but incompletely explored data type for studying cancer. Manual inspection is time-consuming, making these images challenging to use for data mining. Here we show that convolutional neural networks (CNNs) can be systematically applied across cancer types, enabling comparisons that reveal shared spatial behaviors. We develop CNN architectures to analyze 27,815 hematoxylin and eosin slides from The Cancer Genome Atlas for tumor/normal, cancer subtype, and mutation classification. Our CNNs classify tumor/normal status of whole slide images (WSIs) in 19 cancer types with consistently high AUCs (0.995±0.008), as well as subtypes with lower but significant accuracy (AUC 0.87±0.1). Remarkably, tumor/normal CNNs trained on one tissue are effective in others (AUC 0.88±0.11), with classifier relationships also recapitulating known adenocarcinoma, carcinoma, and developmental biology. Moreover, classifier comparisons reveal intra-slide spatial similarities, with an average tile-level correlation of 0.45±0.16 between classifier pairs. Breast, bladder, and uterine cancers have spatial patterns that are particularly easy to detect, suggesting these cancers can serve as canonical types for image analysis. Patterns for TP53 mutations can also be detected, with WSI self- and cross-tissue AUCs ranging from 0.65 to 0.80. Finally, we comparatively evaluate CNNs on 170 breast and colon cancer images with pathologist-annotated nuclei, finding that both cellular and intercellular regions contribute to CNN accuracy. These results demonstrate the power of CNNs not only for histopathological classification but also for cross-comparisons that reveal conserved spatial biology.

The NLST dataset will be used as an independent, external data source to cross-validate the TCGA-trained models.
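
External cross-validation of this kind amounts to scoring held-out NLST slides with a TCGA-trained classifier and computing an AUC, the metric reported throughout the summary above. As a minimal sketch (the labels and scores below are hypothetical placeholders, not project data), the AUC can be computed directly from its rank-based Mann-Whitney definition:

```python
def auc(labels, scores):
    """Rank-based AUC: the probability a positive example outscores a
    negative one, counting ties as half. Equivalent to the area under
    the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical slide labels (1 = tumor, 0 = normal) and model scores
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(auc(labels, scores))  # → 0.75
```

In practice a library routine such as scikit-learn's `roc_auc_score` would be used, but the hand-rolled version makes the metric explicit.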
Aims

- The NLST dataset will be used as an independent, external data source to cross-validate the TCGA-trained models.
- After the initial pre-processing steps, we will tile the H&E images and feed the tiles into our models for prediction.
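
The tiling step described above can be sketched as follows. This is a simplified illustration, not the project's actual pipeline: the 512-pixel tile size and the synthetic array standing in for a pre-processed slide region are assumptions, and a real workflow would read whole-slide image formats with a library such as OpenSlide:

```python
import numpy as np

TILE_SIZE = 512  # illustrative; the tile size actually used is a study parameter

def tile_image(arr, tile_size=TILE_SIZE):
    """Split an H x W x 3 image array into non-overlapping square tiles,
    dropping any partial tiles at the right and bottom edges."""
    h, w = arr.shape[:2]
    tiles = []
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            tiles.append(arr[y:y + tile_size, x:x + tile_size])
    return tiles

# Synthetic white array standing in for a pre-processed H&E slide region
demo = np.full((1024, 1124, 3), 255, dtype=np.uint8)
tiles = tile_image(demo)
print(len(tiles))  # → 4 (2 rows x 2 columns; edge remainders dropped)
```

Each tile would then be passed through the trained CNN, and tile-level predictions aggregated to a slide-level score.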

Collaborators

Jeffrey H. Chuang, The Jackson Laboratory for Genomic Medicine
Kourosh Zarringhalam, University of Massachusetts-Boston
Ali Foroughi pour, The Jackson Laboratory for Genomic Medicine
Javad Noorbakhsh, Broad Institute of MIT and Harvard
Lingyi Xu, Boston University
Saman Farahmand, University of Massachusetts-Boston
Alex, University of Massachusetts-Boston
Dennis Caruana, Yale University School of Medicine
David Rimm, Yale University School of Medicine
Sandeep Namburi, The Jackson Laboratory for Genomic Medicine