Principal Investigator
Sendhil Mullainathan
University of Chicago
Position Title
Roman Family University Professor of Computation and Behavioral Science
About this CDAS Project
NLST (Learn more about this study)
Project ID
Initial CDAS Request Approval
Sep 6, 2019
Predicting Progression of Lung Lesions on CT scans
We are interested in the progression of lung lesions on CT scans in the National Lung Screening Trial (NLST).
Using time series data and machine learning techniques, we will try to predict from earlier images how a lesion will appear in subsequent images. We are particularly focused on lesions that might be cancer and those that might be false positives for cancer. Our goal in studying this progression is to understand how these lesions evolve over time: which ones are dangerous and which are not. Applying machine learning to these data is a promising way to characterize such trajectories, and may help accelerate detection of the patterns and characteristics that lead to dangerous lesions.

Specifically, our team has experience in unsupervised and generative learning models. We plan to train an algorithm to encode CT images: the model's neural network will translate each image into a set of variables, and then translate those variables into a compressed image. Because we will be using a generative model, it is important for us to have as many positive cancer instances as possible, as positive examples are effective at training such models. Given the need for a large number of positive examples, we are requesting 15,000 patient images. While we recognize that this is the limit of data available per project, we believe this amount of data will add maximum value to our project. An abundance of samples will also help us eventually develop a supervised model that can be trained on other medical datasets beyond cancer.
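
As a rough illustration of the encode-then-decode idea described above, the following minimal sketch trains a small linear autoencoder with NumPy. It is not the project's model: the proposal describes a neural generative model trained on CT images, while this example uses a purely linear encoder and decoder and synthetic random arrays standing in for flattened image patches. The function name `train_autoencoder`, the array sizes, and the learning rate are all assumptions chosen for illustration.

```python
import numpy as np


def train_autoencoder(x, n_latent, lr=0.005, steps=2000, seed=0):
    """Fit a linear encoder/decoder pair by gradient descent.

    Hypothetical helper for illustration only; returns the encoder
    weights, decoder weights, and the loss recorded at each step.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = 0.1 * rng.normal(size=(d, n_latent))  # image -> variables
    w_dec = 0.1 * rng.normal(size=(n_latent, d))  # variables -> image
    losses = []
    for _ in range(steps):
        z = x @ w_enc                 # encode each image into n_latent variables
        err = z @ w_dec - x           # reconstruction error
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        grad_out = 2.0 * err / n      # gradient of the loss w.r.t. the reconstruction
        grad_dec = z.T @ grad_out
        grad_enc = x.T @ (grad_out @ w_dec.T)
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec, losses


# Synthetic stand-in data (NOT NLST images): 200 "images" of 16 pixels,
# generated from 4 hidden factors and standardized per pixel.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 4))
mixing = rng.normal(size=(4, 16))
images = factors @ mixing
images = (images - images.mean(axis=0)) / images.std(axis=0)

w_enc, w_dec, losses = train_autoencoder(images, n_latent=4)
codes = images @ w_enc   # the "set of variables" for each image
recon = codes @ w_dec    # image rebuilt from those variables

print("initial loss:", losses[0])
print("final loss:  ", losses[-1])
```

Here `codes` plays the role of the set of variables and `recon` the image rebuilt from them. A model for real CT volumes would presumably use nonlinear, convolutional layers and a generative training objective, none of which this sketch attempts.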

For all of the data we receive, we would like all available associated outcomes.

-Using 15,000 patient images, train a generative model to predict how lung lesions will progress
-Understand evolution, patterns, and characteristics of dangerous and benign lung lesions
-Using these patient samples, begin developing a supervised model that can be trained on a broader range of medical datasets


Dr. Aytek Oto, University of Chicago Department of Radiology