Deep learning-based prediction of colorectal tumor features and clinical characteristics from histology images
Artificial intelligence (AI), through deep learning approaches such as artificial neural networks, is increasingly used to process digital images, recognize patterns in them, and relate those patterns to other known characteristics. Applying AI to colorectal tumor histology images to detect molecular features such as microsatellite instability/mismatch repair deficiency (MSI/dMMR), gene mutation status, CpG island methylator phenotype (CIMP) status, the presence of pathogenic bacteria (e.g., Fusobacterium nucleatum), and immune cell types (e.g., eosinophils, neutrophils, and lymphocytes) offers great potential for efficiently assessing known and potentially novel features of colorectal tumors.
Robust classifiers that predict tumor molecular features from routine images must be trained and validated at scale in molecularly annotated study populations. The Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) is a large collaboration of studies, including PLCO, that focuses on identifying and characterizing genetic risk factors and gene-environment interactions for colorectal cancer. Thousands of H&E-stained colorectal slides from formalin-fixed paraffin-embedded (FFPE) and fresh-frozen tissue either already exist within consortium studies or have been generated as part of GECCO-related projects, and several studies already have digital images. In addition, consortium collaborators have the experience and expertise needed to develop and apply prediction algorithms that characterize tumor and microenvironment features (5,6). We therefore propose to use the large datasets and H&E slide image resources within GECCO, including those within PLCO, to automatically detect colorectal tumor tissue and subsequently predict its molecular features.
These detailed predictions of tumor molecular features will allow us to better define tumor subtypes. The relationships of tumor subtypes with etiologic factors for colorectal cancer (CRC), such as germline genetic, lifestyle, and environmental risk factors, and with survival have not been comprehensively studied. To improve our understanding of the etiologic pathways that give rise to different molecular subtypes of CRC, it is critical to comprehensively evaluate the molecular profiles of a large number of CRC cases in relation to genetic and environmental risk factors. A large number of cases and controls from multiple studies will provide statistical precision and built-in replication. Understanding the associations between risk factors and the molecular pathology of tumors, and the impact of these tumor features on survival, will provide unique insights into carcinogenic mechanisms and holds the promise of improving public health by strengthening the biological basis for CRC prevention and treatment.
Aim 1: Use artificial intelligence approaches to quantify histopathological patterns in H&E-stained slide images of colorectal tumors (from whole sections and tumor microarrays) and correlate these patterns with matched molecular features from previously measured histopathological, genomic, and transcriptomic data. This includes developing feature-prediction models that identify, quantify, and spatially map tissue-specific features, and using cross-validation to assess classification performance.
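The cross-validation step in Aim 1 can be illustrated with a minimal sketch, assuming binary slide-level labels (for example, MSI-high vs. microsatellite stable) and a single numeric image-derived feature per slide. The functions patient_folds, auroc, fit_centroid, and cross_validate are hypothetical, and the nearest-centroid scorer is a toy stand-in for a deep learning model, not the proposed method. Folds are formed at the patient level so that slides from the same patient never appear in both the training and test sets, which would otherwise inflate the performance estimate.

```python
# Hypothetical sketch: patient-level k-fold cross-validation for a binary
# slide classifier (e.g., MSI vs. MSS). The scalar "feature", the toy
# nearest-centroid model, and all function names are illustrative only.
import math
import random
from typing import Callable, List, Sequence

Scorer = Callable[[float], float]


def patient_folds(patient_ids: Sequence[str], k: int, seed: int = 0) -> List[List[str]]:
    """Assign each unique patient to exactly one of k disjoint folds."""
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)
    return [unique[i::k] for i in range(k)]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney statistic; ties count as half a win.
    Returns NaN when the evaluated set contains only one class."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return math.nan
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid(xs: List[float], ys: List[int]) -> Scorer:
    """Toy classifier: score = distance to negative mean minus distance to positive mean."""
    mu_pos = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    mu_neg = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    return lambda x: abs(x - mu_neg) - abs(x - mu_pos)


def cross_validate(patients: Sequence[str], xs: Sequence[float], ys: Sequence[int],
                   fit: Callable[[List[float], List[int]], Scorer],
                   k: int = 5, seed: int = 0) -> List[float]:
    """Train on slides from k-1 patient folds; score slides of held-out patients."""
    results = []
    for held_out in patient_folds(patients, k, seed):
        test = set(held_out)
        train_idx = [i for i, p in enumerate(patients) if p not in test]
        test_idx = [i for i, p in enumerate(patients) if p in test]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        results.append(auroc([ys[i] for i in test_idx], [model(xs[i]) for i in test_idx]))
    return results


if __name__ == "__main__":
    # Ten hypothetical patients with one slide each; label 1 = MSI-high.
    pids = [f"p{i}" for i in range(10)]
    ys = [i % 2 for i in range(10)]
    xs = [float(y) + 0.01 * i for i, y in enumerate(ys)]
    print(cross_validate(pids, xs, ys, fit_centroid, k=5))
```

In practice, folds would usually also be stratified by label so that every held-out fold contains both classes; this sketch instead reports NaN for a fold that lacks one class.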
Aim 2: Correlate computationally learned histopathological features with lifestyle factors (e.g., smoking, alcohol, or diet), host genetic risk factors, and survival outcomes. This includes computationally processing digital images of H&E-stained slides (whole sections and tumor microarrays) to identify, quantify, and map features of the tumor and its microenvironment, and relating these features to host and tumor characteristics and to clinical outcomes (e.g., CRC-specific mortality and recurrence).
1. Arnold, M., Sierra, M.S., Laversanne, M., Soerjomataram, I., Jemal, A., and Bray, F. (2017). Global patterns and trends in colorectal cancer incidence and mortality. Gut 66, 683–691.
2. Jass, J.R. (2007). Molecular heterogeneity of colorectal cancer: Implications for cancer control. Surg. Oncol. 16 Suppl 1, S7–S9.
3. Jass, J.R. (2007). Classification of colorectal cancer based on correlation of clinical, morphological and molecular features. Histopathology 50, 113–130.
4. Leman, J.K.H., Munoz-Erazo, L., and Kemp, R.A. (2020). The intestinal tumour microenvironment. Adv. Exp. Med. Biol. 1226, 1–22.
5. Kather, J.N., Pearson, A.T., Halama, N., Jäger, D., Krause, J., Loosen, S.H., Marx, A., Boor, P., Tacke, F., Neumann, U.P., et al. (2019). Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056.
6. Väyrynen, J.P., Lau, M.C., Haruki, K., Väyrynen, S.A., Dias Costa, A., Borowsky, J., Zhao, M., Fujiyoshi, K., Arima, K., Twombly, T.S., et al. (2020). Prognostic significance of immune cell populations identified by machine learning in colorectal cancer using routine hematoxylin & eosin stained sections. Clin. Cancer Res. clincanres.0071.2020.
Ulrike Peters, Fred Hutchinson Cancer Research Center
Tabitha Harrison, Fred Hutchinson Cancer Research Center
Amanda Phipps, University of Washington and Fred Hutchinson Cancer Research Center
Shuji Ogino, Dana-Farber Cancer Institute
Jonathan Nowak, Brigham and Women’s Hospital
Rish Pai, Mayo Clinic
Polly Newcomb, Fred Hutchinson Cancer Research Center
Daniel Buchanan, University of Melbourne
Peter Campbell, American Cancer Society
Wen-Yi Huang, National Cancer Institute Division of Cancer Epidemiology & Genetics
Sonja Berndt, National Cancer Institute Division of Cancer Epidemiology & Genetics
Jakob Kather, University Hospital RWTH Aachen