Principal Investigator
Pratik Shah
Massachusetts Institute of Technology
Position Title
Principal Research Scientist
About this CDAS Project
NLST
Project ID
Initial CDAS Request Approval
Nov 13, 2019
Using Region based Convolutional Neural Networks (RCNNs) and its variants to detect, classify, segment and predict lung cancer signatures from NLST data
Lung cancer is one of the leading causes of death in the United States. Early diagnosis enables successful treatment in many patients; treatment options include immunotherapy, chemotherapy and other systemic anti-cancer therapies. Successful diagnosis, however, depends on timely identification by medical experts. Histopathological classification of lung cancer has been paramount in guiding treatment. Historically, such classification has been done by human experts, which is time-consuming and prone to human error. Masking and segmenting cancer signatures from pathology and CT images has also been a challenging task. We seek to automate this process by developing a Deep Learning model that takes a CT, X-ray or pathology image, or a combination of these, as input; detects and identifies subtle disease signatures and tumor locations; and classifies those signatures within the input image. The model will also provide precise segmentation masks that highlight regions of interest (ROIs) within each image type to better explain the model's predictions and outcomes.

The overall goal of the proposed study is to develop a multi-modal Deep Learning model that uses pathology, X-ray and CT images and clinical data from NLST to predict and classify tumor regions and to provide precise masks on target regions of interest within an image modality. We aim to achieve the following goals with the NLST data set:

1. Integrate multi-modal data into a single algorithm that can interpret pathology, X-ray and CT image data. We aim to develop a fully automated computational approach based on training a Deep Learning classification model to make high-level classifications. This model will be capable of classifying cell types from tumor pathology images and of classifying CT and X-ray images based on the presence or absence of tumors in the corresponding images. The accuracy of this model will be validated visually by trained and experienced pathologists. Further analytical and statistical verification will also be carried out to make the model robust and trustworthy.

2. We will extend our classification model to an RCNN (Region-based CNN) model that will enable segmentation of ROIs (regions of interest) from a pathology or CT image. The ground truth for the segmentation masks will be obtained from multiple domain experts, who will examine and hand-label the images. We will use only those image annotations that have high consensus among the labelers, which will also give us high-quality labels for the NLST image data. These labels will then be used to train a custom model based on the popular Mask-RCNN method that aims to accurately mask out ROIs while also classifying the ROIs it finds in an input image.

3. Develop a clinical prediction model based on the NLST data and provide automated diagnosis and treatment schemes based on stages of cancer development.

4. Develop a robust, well-tested model for predicting cancer and/or tumor developmental stages and outcomes such as remission and recurrence.
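
The consensus step in goal 2 (keeping only annotations on which the labelers largely agree) could be sketched roughly as follows. This is an illustrative sketch, not the project's method: the abstract does not say how consensus is measured, so the use of mean pairwise intersection-over-union (IoU), the majority-vote merge, the 0.7 threshold, and all function names here are assumptions made for the example.

```python
from itertools import combinations


def mask_iou(a, b):
    """Intersection over union of two binary masks (equal-sized 2D lists of 0/1)."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    # Two empty masks are treated as full agreement (both say "no ROI").
    return 1.0 if union == 0 else inter / union


def mean_pairwise_iou(masks):
    """Average IoU over every pair of labelers' masks for one image."""
    pairs = list(combinations(masks, 2))
    if not pairs:
        raise ValueError("need masks from at least two labelers")
    return sum(mask_iou(a, b) for a, b in pairs) / len(pairs)


def majority_vote(masks):
    """Pixel-wise majority vote: a pixel is ROI if more than half the labelers marked it."""
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]


def consensus_label(masks, min_agreement=0.7):
    """Return a merged mask if agreement is high enough, else None.

    min_agreement is an assumed, illustrative threshold, not a value from the proposal.
    """
    if mean_pairwise_iou(masks) < min_agreement:
        return None  # low consensus: exclude this image from the training labels
    return majority_vote(masks)


if __name__ == "__main__":
    labeler_1 = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
    labeler_2 = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
    labeler_3 = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
    print(consensus_label([labeler_1, labeler_2, labeler_3]))
```

Images for which `consensus_label` returns `None` would be left out of the training set, and the merged masks for the remaining images could serve as ground truth for a Mask-RCNN-style model. Other agreement measures (for example the Dice coefficient) or other merge rules would fit the same structure.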