Exascale Deep Learning for screening based predictive modeling for lung cancer
Principal Investigator
Name
Greeshma Agasthya
Degrees
Ph.D.
Institution
Oak Ridge National Laboratory
Position Title
Staff scientist
About this CDAS Project
Study
PLCO
Project ID
PLCOI-544
Initial CDAS Request Approval
Nov 4, 2019
Title
Exascale Deep Learning for screening based predictive modeling for lung cancer
Summary
Introduction: Lung cancer prediction from 2D imaging has been studied by several groups. The PLCO lung imaging dataset provides both 2D chest x-rays and pathology images.
In this project, we propose to develop cancer detection and prediction models using (1) scalable deep learning algorithms applied to full-resolution chest x-ray images, (2) the corresponding patient data, and (3) biomarkers extracted from pathology images with AI algorithms developed for this purpose. (4) We will also use statistically matched 3D images from the National Lung Screening Trial (NLST) to predict cancer through transfer learning, mapping from 2D (chest x-ray) to 3D (low-dose CT, LDCT) imaging. Finally, we will fuse the PLCO lung dataset with other large datasets such as LIDC-IDRI, MIMIC-CXR, and BACH histopathology to improve the accuracy of cancer prediction and prognosis models.
Resources: Oak Ridge National Laboratory (ORNL) hosts the Oak Ridge Leadership Computing Facility (OLCF), whose Summit supercomputer will be used in this project. Summit is a 4,608-node IBM AC922 system with 9,216 IBM Power9 processors and 27,648 NVIDIA Volta graphics processing units (GPUs). Each Volta GPU has a theoretical peak performance of about 7 teraflops (TF), bringing Summit's total peak performance to about 200 petaflops (PF). With 10 petabytes (PB) of total system memory and a Red Hat Enterprise Linux operating system, Summit enables unprecedented capabilities for machine learning and artificial intelligence. Summit became the most powerful supercomputer in the world when it ranked #1 on the June 2018 TOP500 list.
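The peak-performance figures for Summit can be checked with simple arithmetic. The sketch below assumes roughly 7 teraflops of double-precision peak per Volta GPU; the per-GPU unit has to be teraflops rather than petaflops for the total to land near 200 PF. It counts the GPUs only and ignores the Power9 CPUs. The helper name `aggregate_peak_pf` is hypothetical.

```python
# Back-of-the-envelope check of Summit's aggregate GPU peak performance.
# Assumption: about 7 TF (FP64) theoretical peak per Volta GPU; CPUs are ignored.
NUM_GPUS = 27_648
PEAK_PER_GPU_TF = 7.0  # approximate teraflops per GPU


def aggregate_peak_pf(num_gpus: int, peak_per_gpu_tf: float) -> float:
    """Return the aggregate theoretical peak in petaflops (1 PF = 1,000 TF)."""
    return num_gpus * peak_per_gpu_tf / 1_000.0


total_pf = aggregate_peak_pf(NUM_GPUS, PEAK_PER_GPU_TF)
print(f"GPU peak: {total_pf:.1f} PF")  # GPU peak: 193.5 PF
```

The result, about 193.5 PF, is consistent with the quoted total of roughly 200 PF once the rounding of the per-GPU figure and the CPUs' share are taken into account.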
Preliminary work: We have extensive experience in population-scale medical image analysis, high-performance computing, and deep learning. As part of ORNL's AI Initiative (https://ai.ornl.gov), we have performed large-scale image registration and classification studies using neuroimaging data from the OASIS project (https://oasis-brains.org). Additionally, as part of an ongoing collaboration between the US Department of Energy (DOE) and the National Cancer Institute (NCI) (https://datascience.cancer.gov/collaborations/joint-design-advanced-computing), we have developed novel predictive models and performed large-scale hyperparameter optimization on electronic health records from hundreds of thousands of patients in the NCI SEER program. Our analytical expertise and our data and compute infrastructure are well tested and ready to apply modern, large-scale deep learning methods to promising datasets such as NLST and the PLCO lung dataset.
Aims
Aim 1: Predict lung nodules on full-resolution 2D images only (using the PLCO lung dataset)
Aim 2: Predict lung nodules on full-resolution 3D images (from NLST) through transfer learning from 2D images (PLCO)
Aim 3: Develop AI algorithms to extract biomarkers from pathology images, and fuse data from pathology, 2D, and 3D images to predict cancer and cancer mortality.
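The summary does not specify how the 2D-to-3D transfer in Aim 2 will be carried out. As an illustration only, one widely used option (popularized by the I3D video-recognition model, and not necessarily the method this project will use) is to "inflate" each pretrained 2D convolution kernel into a 3D kernel by repeating it along the slice axis and dividing by the number of slices. The NumPy sketch below assumes PyTorch-style weight layouts, (out_ch, in_ch, kH, kW) for 2D and (out_ch, in_ch, depth, kH, kW) for 3D; the function name `inflate_conv2d_weights` and the random weights are hypothetical stand-ins, not trained PLCO models.

```python
import numpy as np


def inflate_conv2d_weights(w2d: np.ndarray, depth: int) -> np.ndarray:
    """Inflate 2D conv weights (out_ch, in_ch, kH, kW) into 3D weights
    (out_ch, in_ch, depth, kH, kW).

    The 2D kernel is repeated `depth` times along a new slice axis and
    divided by `depth`, so a volume whose slices are all identical gives
    the same response as the original 2D kernel on a single slice.
    """
    if w2d.ndim != 4:
        raise ValueError("expected weights shaped (out_ch, in_ch, kH, kW)")
    if depth < 1:
        raise ValueError("depth must be >= 1")
    stacked = np.repeat(w2d[:, :, np.newaxis, :, :], depth, axis=2)
    return stacked / depth


# Usage with random stand-ins for pretrained 2D filters (hypothetical data).
rng = np.random.default_rng(0)
w2d = rng.standard_normal((8, 1, 3, 3))      # 8 filters, 1 input channel, 3x3
w3d = inflate_conv2d_weights(w2d, depth=3)   # shape (8, 1, 3, 3, 3)

# One 2D patch, and a 3D patch made of 3 identical copies of it.
patch2d = rng.standard_normal((1, 3, 3))
patch3d = np.repeat(patch2d[:, np.newaxis], 3, axis=1)

# Filter responses at a single location (one dot product per output channel).
resp2d = np.einsum("oikl,ikl->o", w2d, patch2d)
resp3d = np.einsum("oidkl,idkl->o", w3d, patch3d)
print(np.allclose(resp2d, resp3d))  # True
```

Dividing by the depth keeps activations on the same scale as in the 2D model, so a 3D network initialized this way starts from the 2D model's behavior and can then be fine-tuned on volumetric LDCT data.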
Collaborators
Jacob Hinkle, Oak Ridge National Laboratory
Georgia Tourassi, Oak Ridge National Laboratory
Joe Lake, Oak Ridge National Laboratory