Principal Investigator
Michael Roberts
Ph.D., MMath
Position Title
Postdoctoral fellow
About this CDAS Project
NLST
Project ID
Initial CDAS Request Approval
Mar 10, 2020
Autoencoder for CT images and automatic clustering of patients
We have developed an autoencoder that encodes CT images into a feature vector. The feature vector has a far lower dimension (a vector of length ~4,000) than a CT image (~200 million voxels), a reduction of roughly 50,000-fold. We would like to utilise the National Lung Screening Trial (NLST) data to improve the training of the autoencoder, as this is the largest resource of its kind available and covers a wide variety of CT scanners. We expect the resulting model to generalise well and to be of broad use to the research community, allowing entire datasets of CT images to be encoded as small feature vectors for (1) machine learning analysis to classify patients by outcome and (2) automated clustering of patients by features. Analyses based on the encoded features should be of clinical utility, for example predicting adverse events from the encoded features. Nothing like this currently exists, and we have the opportunity to create it and release the model publicly for researchers.
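Once each scan has been reduced to a feature vector, automated clustering can operate on those vectors directly. The sketch below is a minimal k-means (Lloyd's algorithm) in plain Python, offered only as an illustration and not as the project's method. It assumes the encoder has already produced one fixed-length vector per patient, and it uses tiny hypothetical 3-D vectors in place of the ~4,000-length encodings; the function names `kmeans`, `squared_distance` and `mean_vector` are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(features: List[Vector], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Cluster encoded feature vectors with Lloyd's k-means algorithm.

    Returns one cluster label per input vector and the final centroids.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct patients chosen at random.
    centroids = [list(v) for v in rng.sample(features, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each patient joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in features
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(features, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return labels, centroids


# Hypothetical encoded features for six patients (3-D stand-ins for ~4,000-D).
features = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.0, 0.1],
    [5.0, 5.1, 5.0], [5.1, 5.0, 4.9], [4.9, 5.0, 5.1],
]
labels, centroids = kmeans(features, k=2)
print(labels)
```

k-means needs the number of clusters chosen in advance; on real encodings one would typically compare several values of k rather than fix it as this toy example does.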

Aims

> Train an autoencoder on the NLST data that generalises well to new CT images.

> Apply the trained autoencoder to new datasets to show that the encoded features can be used to classify patients into different outcome groups and to predict adverse events.
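The classification aim can be illustrated with a deliberately simple classifier applied to encoded features. The sketch below is a nearest-centroid classifier in plain Python, chosen here only for brevity; the project description does not say which classifier will be used. It assumes each patient has an encoded feature vector and a known outcome label, and the outcome labels `"adverse_event"` and `"no_event"`, the toy 3-D vectors and the function names are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def fit_centroids(features: List[Vector], outcomes: List[str]) -> Dict[str, List[float]]:
    """Compute the mean encoded feature vector for each outcome group."""
    groups: Dict[str, List[Vector]] = {}
    for vector, outcome in zip(features, outcomes):
        groups.setdefault(outcome, []).append(vector)
    return {
        outcome: [sum(column) / len(members) for column in zip(*members)]
        for outcome, members in groups.items()
    }


def predict(centroids: Dict[str, List[float]], vector: Vector) -> str:
    """Assign a new patient to the outcome group whose centroid is nearest."""
    return min(
        centroids,
        key=lambda outcome: sum((a - b) ** 2 for a, b in zip(vector, centroids[outcome])),
    )


# Hypothetical training data: encoded features with known outcomes.
train_features = [
    [0.1, 0.0, 0.2], [0.0, 0.2, 0.1], [0.2, 0.1, 0.0],
    [1.0, 0.9, 1.1], [0.9, 1.1, 1.0], [1.1, 1.0, 0.9],
]
train_outcomes = ["no_event"] * 3 + ["adverse_event"] * 3

centroids = fit_centroids(train_features, train_outcomes)
print(predict(centroids, [0.95, 1.0, 0.85]))  # prints "adverse_event"
```

A nearest-centroid rule is easy to inspect, but it implicitly assumes each outcome group forms a compact, roughly spherical region in feature space; a more flexible classifier could be substituted without changing the encoding step.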

Collaborators
Andrew Reynolds, AstraZeneca
Mishal Patel, AstraZeneca
Tom White, AstraZeneca