Principal Investigator
Name
Sendhil Mullainathan
Degrees
PhD
Institution
University of Chicago
Position Title
Roman Family University Professor of Computation and Behavioral Science
Email
About this CDAS Project
Study
NLST
Project ID
NLST-560
Initial CDAS Request Approval
Sep 6, 2019
Title
Predicting Progression of Lung Lesions on CT scans
Summary
We are interested in the progression of lung lesions on CT scans in the National Lung Screening Trial (NLST).
Using time series data and machine learning techniques, we will try to predict from prior images how a lesion will appear in subsequent images. We are particularly focused on lesions that might be cancer and those that might be false positives for cancer. Our goal in studying this progression is to understand how these lesions evolve over time: which ones are dangerous and which are not. Applying machine learning to these data offers a way to characterize such trajectories, and may help accelerate detection of the patterns and characteristics associated with dangerous lesions.

Specifically, our team has experience in unsupervised and generative learning models. We plan to train an algorithm to encode CT images: the model’s neural network will translate each image into a compact set of variables, and then translate those variables back into a reconstruction of the image. Because we will be using a generative model, it is important for us to have as many positive cancer instances as possible, since positive examples are effective at training such models. Given the need for a large number of positive examples, we are requesting 15,000 patient images. While we recognize that this is the limit of data available per project, we believe this amount of data will add maximum value to our project. An abundance of samples will also help us eventually develop a supervised model that can be trained on other medical datasets beyond cancer.
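
The encode-then-decode idea described above can be illustrated with a minimal sketch. This is not the project's model: it is a plain linear autoencoder written in standard-library Python, trained on synthetic 8-value "patches" rather than CT data, and it is deterministic rather than generative. All names and settings (`make_toy_patches`, `LATENT_DIM`, `PATCH_DIM`, the learning rate, the epoch count) are assumptions chosen for illustration.

```python
import random

LATENT_DIM = 2   # number of latent "variables" per patch (illustrative choice)
PATCH_DIM = 8    # length of each flattened toy patch (illustrative choice)

def make_toy_patches(n, rng):
    """Synthetic stand-ins for flattened image patches; NOT real CT data."""
    blob = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]  # lesion-like bump
    background = [0.5] * PATCH_DIM                    # uniform intensity level
    patches = []
    for _ in range(n):
        a, b = rng.random(), rng.random()             # random mixing weights
        patches.append([a * s + b * t for s, t in zip(blob, background)])
    return patches

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def encode(w_enc, x):
    """Image -> latent variables."""
    return matvec(w_enc, x)

def decode(w_dec, z):
    """Latent variables -> reconstructed image."""
    return matvec(w_dec, z)

def init_weights(seed=0):
    rng = random.Random(seed)
    w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(PATCH_DIM)]
             for _ in range(LATENT_DIM)]
    w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(LATENT_DIM)]
             for _ in range(PATCH_DIM)]
    return w_enc, w_dec

def mean_loss(w_enc, w_dec, data):
    """Mean squared reconstruction error, summed over pixels of each patch."""
    total = 0.0
    for x in data:
        x_hat = decode(w_dec, encode(w_enc, x))
        total += sum((p - q) ** 2 for p, q in zip(x_hat, x))
    return total / len(data)

def train(w_enc, w_dec, data, epochs=200, lr=0.01):
    """Per-sample gradient descent on the reconstruction error (in place)."""
    for _ in range(epochs):
        for x in data:
            z = encode(w_enc, x)
            x_hat = decode(w_dec, z)
            err = [2.0 * (p - q) for p, q in zip(x_hat, x)]  # dLoss/dx_hat
            # Gradient w.r.t. z, computed before the decoder is updated.
            grad_z = [sum(err[i] * w_dec[i][j] for i in range(PATCH_DIM))
                      for j in range(LATENT_DIM)]
            for i in range(PATCH_DIM):
                for j in range(LATENT_DIM):
                    w_dec[i][j] -= lr * err[i] * z[j]
            for j in range(LATENT_DIM):
                for m in range(PATCH_DIM):
                    w_enc[j][m] -= lr * grad_z[j] * x[m]

data = make_toy_patches(64, random.Random(42))
w_enc, w_dec = init_weights(seed=0)
loss_before = mean_loss(w_enc, w_dec, data)
train(w_enc, w_dec, data)
loss_after = mean_loss(w_enc, w_dec, data)
print(f"reconstruction loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Here each 8-value patch is squeezed into 2 latent variables and expanded again; training drives the reconstruction toward the input. A model for real CT volumes would presumably use a deep, nonlinear (and, for generation, probabilistic) architecture, which this sketch does not attempt.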

For all of the data we receive, we would like all available associated outcome data.
Aims

-Using 15,000 patient images, train a generative model to predict how lung lesions will progress
-Understand the evolution, patterns, and characteristics of dangerous and benign lung lesions
-Using these patient samples, begin developing a supervised model that can be trained on a broader range of medical datasets

Collaborators

Dr. Aytek Oto, University of Chicago Department of Radiology