Exascale Deep Learning for Predictive Modeling of Lung Cancer
Principal Investigator
Name
Greeshma Agasthya
Degrees
Ph.D.
Institution
Oak Ridge National Laboratory
Position Title
Staff scientist
Email
About this CDAS Project
Study
NLST
Project ID
NLST-586
Initial CDAS Request Approval
Oct 18, 2019
Title
Exascale Deep Learning for Predictive Modeling of Lung Cancer
Summary
Introduction: Several groups have studied low-dose computed tomography (LDCT) as a tool for lung cancer screening in high-risk individuals (e.g., ever-smokers and people with asbestos exposure). Some groups have published prediction models that identify individuals at high risk of lung cancer from baseline measurements, as well as imaging models for early detection of lung cancer.
In this project we propose to (1) use scalable deep learning algorithms with full-resolution LDCT images, (2) incorporate corresponding structured patient data, (3) use biomarkers extracted from pathology images (via AI algorithms we develop) to build cancer detection and prediction models, and (4) fuse these data with other large datasets such as LIDC-IDRI, MIMIC-CXR, and the BACH histopathology dataset to improve the accuracy of cancer prediction and prognosis models.
Resources: The Oak Ridge National Laboratory (ORNL) hosts the Oak Ridge Leadership Computing Facility (OLCF), whose Summit supercomputer will be used as part of this project. Summit is a 4,608-node IBM AC922 system that provides 9,216 IBM Power9 processors and 27,648 Nvidia Volta graphics processing units (GPUs). Each Volta GPU has a theoretical peak performance of roughly 7 teraflops (TF), bringing the total peak performance of Summit to about 200 petaflops (PF). With total system memory of 10 petabytes (PB) and a Red Hat Enterprise Linux operating system, Summit enables unprecedented capabilities for machine learning and artificial intelligence. Summit debuted as the most powerful supercomputer in the world, ranked #1 on the June 2018 Top500 list.
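As a quick sanity check on those figures, the aggregate GPU peak can be reproduced from the per-device numbers (a back-of-the-envelope sketch; the exact per-GPU peak varies by precision):

```python
# Back-of-the-envelope check of Summit's aggregate GPU peak performance.
# Assumes ~7 teraflops of double-precision peak per Volta GPU, as cited above.
num_gpus = 27_648
tflops_per_gpu = 7.0                            # approximate FP64 peak per GPU
total_pf = num_gpus * tflops_per_gpu / 1_000    # convert TF -> PF
print(f"{total_pf:.0f} PF aggregate GPU peak")  # ~194 PF, i.e. roughly 200 PF system peak
```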
Preliminary work: We have extensive experience in population-scale medical image analysis as well as high-performance computing and deep learning. As part of ORNL’s AI Initiative (https://ai.ornl.gov), we have performed large-scale image registration and classification studies using neuroimaging data from the OASIS project (https://oasis-brains.org). Additionally, as part of an ongoing collaboration between the US Department of Energy (DOE) and the National Cancer Institute (NCI) (https://datascience.cancer.gov/collaborations/joint-design-advanced-computing), we have developed novel predictive models and performed large-scale hyperparameter optimization on electronic health records from hundreds of thousands of patients in the NCI SEER project. Our analytical expertise and our data and compute infrastructure are well tested and ready to apply modern deep learning methodologies at scale to promising large datasets like NLST.
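The abstract does not specify which hyperparameter optimization method was used; as one common approach, a minimal random-search sketch over a toy objective (the objective function and parameter ranges here are illustrative stand-ins, not the project's actual configuration):

```python
import random

# Illustrative hyperparameter random search. toy_validation_loss is a stand-in
# for training and validating a model; a real run would train on patient data.
def toy_validation_loss(lr, dropout):
    # Pretend the best settings are lr=0.01, dropout=0.3.
    return (lr - 0.01) ** 2 + (dropout - 0.3) ** 2

random.seed(0)
trials = [
    {"lr": 10 ** random.uniform(-4, -1),        # log-uniform learning rate
     "dropout": random.uniform(0.0, 0.6)}
    for _ in range(200)
]
best = min(trials, key=lambda h: toy_validation_loss(h["lr"], h["dropout"]))
print(best)
```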
Aims
Aim 1: Develop scalable deep learning algorithms that analyze full-resolution LDCT images to predict cancer and cancer mortality.
Aim 2: Incorporate structured patient data alongside LDCT images to predict cancer and cancer mortality.
Aim 3: Develop AI algorithms to extract biomarkers from pathology images and use these data to predict cancer and cancer mortality.
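Aims 1 and 2 imply a multimodal model that combines image-derived features with structured patient data. A minimal late-fusion sketch with synthetic inputs (illustrative only; the feature shapes, variable names, and logistic scoring layer are assumptions, not the project's actual architecture):

```python
import numpy as np

# Illustrative late fusion: concatenate image-derived features with structured
# patient data and score per-patient risk with a logistic layer. All inputs are
# synthetic; the real project would use CNN embeddings from LDCT scans.
rng = np.random.default_rng(42)

n_patients = 8
image_features = rng.normal(size=(n_patients, 16))  # stand-in for CNN embeddings
tabular = rng.normal(size=(n_patients, 4))          # e.g., age, pack-years, BMI, sex

fused = np.concatenate([image_features, tabular], axis=1)   # shape (8, 20)
weights = rng.normal(size=fused.shape[1])                   # untrained toy weights
bias = 0.0

risk = 1.0 / (1.0 + np.exp(-(fused @ weights + bias)))      # per-patient risk in (0, 1)
print(risk.shape, float(risk.min()), float(risk.max()))
```

In a trained system the weights would come from fitting on labeled outcomes; the fusion step itself is just feature concatenation before the final scoring layer.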
Collaborators
Jacob Hinkle, Oak Ridge National Laboratory
Georgia Tourassi, Oak Ridge National Laboratory