Principal Investigator
Name
Hormuzd Katki
Degrees
PhD
Institution
NCI
Position Title
Senior Investigator
Email
About this CDAS Project
Study
NLST
Project ID
NLST-537
Initial CDAS Request Approval
Jul 11, 2019
Title
Extracting Image Features with Deep Learning to Improve Cancer Risk Calculation
Summary
We will analyze computed tomography (CT) data from the National Lung Screening Trial (NLST) to evaluate the utility of radiographic image data in lung cancer risk prediction.
To analyze image features quantitatively and more robustly, we will model the scans with a three-dimensional convolutional neural network (3D CNN). The 3D CNN uses multiple layers of convolutions to detect and weight features at all scales throughout the lung image. This approach requires significantly more computational power, but most of that cost is incurred during training rather than deployment.
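As a rough illustration of how stacked convolutions come to cover features at increasing scales, the sketch below tracks the receptive field (the input extent, in voxels along one axis, that influences a single output unit) as blocks are stacked. It assumes one stride-1 3x3x3 convolution followed by one max-pooling step per block; the pooling size of 2 and the function name are assumptions made for illustration, not details stated in this project description.

```python
def receptive_field_per_block(num_blocks=5, conv_kernel=3, pool_size=2):
    """Return the receptive field (voxels per axis) after each conv + pool block.

    Assumes a stride-1 convolution and a pooling stride equal to pool_size.
    """
    rf = 1    # receptive field of a single input voxel
    jump = 1  # spacing, in input voxels, between adjacent units at this depth
    sizes = []
    for _ in range(num_blocks):
        rf += (conv_kernel - 1) * jump  # stride-1 convolution widens the field
        rf += (pool_size - 1) * jump    # pooling window widens it further
        jump *= pool_size               # pooling stride spreads units apart
        sizes.append(rf)
    return sizes


if __name__ == "__main__":
    print(receptive_field_per_block())
```

Under these assumptions the receptive field roughly doubles with each block, so early blocks respond to fine local texture while later blocks respond to structures spanning much of the lung.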
The chest CT scans used in this study will be taken from all three years of the NLST, from both the LSS branch and the ACRIN branch. Each CT scan was labeled for lung CT abnormalities by radiologists at the screening site.

The scans will first be converted to NIfTI-1 format, then cropped to a bounding box around the lung using the Progressive Holistically-Nested Network (P-HNN) lung segmenter, normalized in three different lung windows of (-1000, 200), (-160, 240), and (-1000, -775), and rescaled to a standard size of 128x128x128. These lung windows were chosen because the P-HNN segmenter uses them. The resulting 3-channel image will then be fed into a standard 3D convolutional neural network. The network will consist of five 4-layer blocks, each comprising a 3x3x3 convolution, batch normalization, ReLU activation, and 3D max pooling; then a convolution group of a 2x2x2 convolution, batch normalization, ReLU activation, and 50% dropout; and finally a fully connected group of a 1x1x1 convolution, 50% dropout, a 1x1x1 convolution, 50% dropout, a flattening layer, and a dense layer with 2 class outputs. To compensate for the asymmetry of the labels (we expect many more non-emphysematous cases than emphysematous cases), positive labels will be weighted 3 times as much as negative labels during training.
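The window normalization step can be sketched as follows. This is a minimal illustration, assuming the six numbers above form three (low, high) pairs and that each window is applied by clipping intensities to the window and rescaling linearly to [0, 1]; the exact rescaling used in the project is not stated. The function names are hypothetical, and plain Python lists stand in for the 3D volumes, which in practice would be array data.

```python
# Assumed (low, high) intensity windows, read as pairs from the description.
WINDOWS = [(-1000.0, 200.0), (-160.0, 240.0), (-1000.0, -775.0)]


def window_normalize(value, low, high):
    """Clip one intensity to [low, high] and rescale it linearly to [0, 1]."""
    clipped = min(max(value, low), high)
    return (clipped - low) / (high - low)


def to_channels(voxels, windows=WINDOWS):
    """Map a flat sequence of intensities to one normalized list per window.

    The result has len(windows) channels, mirroring the 3-channel input image.
    """
    return [
        [window_normalize(v, low, high) for v in voxels]
        for low, high in windows
    ]


if __name__ == "__main__":
    # Two sample intensities: one at the floor of the widest window, one at its ceiling.
    for channel in to_channels([-1000.0, 200.0]):
        print(channel)
```

Each window stretches a different intensity range across the full [0, 1] scale, so the three channels emphasize different tissue contrasts in the same scan.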

Three neural networks will be trained for this experiment: one for T0 scans only, another for T0 and T1 scans, and a final model for all T0, T1, and T2 scans. Each neural network will be trained concurrently on 4 NVIDIA Titan X graphics cards using Python 2.7 and the Keras bindings for TensorFlow[@keras15]. Most of the training time will be spent preprocessing the CT images into the format our model requires.
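The three cumulative training sets, together with the 3-to-1 weighting of positive labels described earlier, could be organized along these lines. This is a sketch under stated assumptions: the `Scan` record, its field names, and the helper functions are hypothetical and do not correspond to actual NLST column names or to the project's code.

```python
from collections import namedtuple

# Hypothetical record type; field names are illustrative only.
Scan = namedtuple("Scan", ["scan_id", "timepoint", "label"])

# Screening rounds included in each of the three models.
MODEL_TIMEPOINTS = {
    "T0": {"T0"},
    "T0+T1": {"T0", "T1"},
    "T0+T1+T2": {"T0", "T1", "T2"},
}

# Positive labels (1) count 3 times as much as negative labels (0).
CLASS_WEIGHT = {0: 1.0, 1: 3.0}


def training_subset(scans, model_name):
    """Keep only the scans whose screening round belongs to the given model."""
    allowed = MODEL_TIMEPOINTS[model_name]
    return [s for s in scans if s.timepoint in allowed]


def sample_weights(scans):
    """Return one loss weight per scan, based on its label."""
    return [CLASS_WEIGHT[s.label] for s in scans]


if __name__ == "__main__":
    scans = [
        Scan("a", "T0", 0),
        Scan("b", "T1", 1),
        Scan("c", "T2", 0),
    ]
    for name in MODEL_TIMEPOINTS:
        subset = training_subset(scans, name)
        print(name, [s.scan_id for s in subset], sample_weights(subset))
```

In Keras, a dictionary of this shape can be passed as the `class_weight` argument of `Model.fit`, which scales each sample's contribution to the loss by the weight of its class.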
Aims

- Obtain and model chest computed tomography (CT) data from the National Lung Screening Trial (NLST).
- Build an image analysis pipeline that trains neural networks to distill lung CT scans into variables for lung cancer risk prediction models.
- Identify image features that may represent lung cancer risk factors or features associated with conditions such as emphysema, coronary artery calcification, and chronic obstructive pulmonary disease (COPD).
- Combine image features with prescreening risk factors.
- Develop a lung cancer risk calculator.

Collaborators

- Hormuzd Katki, PhD (Principal Investigator), NIH
- Christine Berg, MD, NIH
- Anil Chaturvedi, PhD, NIH
- Wes Caldwell, PhD, NIH
- Rebecca Landy, PhD, NIH
- Ronald M. Summers, MD, PhD, NIH
- Li Cheung, PhD, NIH