Principal Investigator
Name
Isaac Slavitt
Degrees
S.M. Computational Science
Institution
DrivenData
Position Title
Data Scientist
Email
About this CDAS Project
Study
NLST
Project ID
NLST-317
Initial CDAS Request Approval
Jun 20, 2017
Title
Lung Cancer Early Detection Challenge: Concept to Clinic
Summary
Recent research on the role of artificial intelligence in lung cancer diagnosis has shown promising results. These techniques were recently featured in the Data Science Bowl data challenge to produce open-source algorithms for detecting cancer risk in CT scans. In the hands of radiologists, these algorithms can significantly reduce the false positive rate that currently plagues early screening and ultimately improve our ability to effectively diagnose lung cancer at earlier stages.

However, there is still a wide gap between research and practice. To make this research usable, we’ll be running an online software development challenge that aims to translate the early-stage algorithms into an open-source software tool that clinicians can actually use to evaluate patients. The Addario Lung Cancer Foundation is putting up $100,000 in prizes for top contributors and partnering with a team of data scientists from DrivenData to design and run the challenge starting this summer.

We would like to use the NLST data as training data for the machine learning code in this project, just as it was used for the Data Science Bowl. We would like to make this data available to contributors working on training models for the duration of the challenge.
Aims

This project aims to bring experimental code for detecting lung cancer from CT scans closer to clinical application. The targeted output will be a well-designed, open-source, modern software package designed to complement and enhance the tools that clinicians already use. At a minimum, this should include:

- Consuming imagery produced in the normal course of radiology, e.g. DICOM-formatted CT images;
- Processing the images with a tuned and productionized version of one or many of the best predictive algorithms, in order to get predictions for the presence of abnormal tissue, and potentially flag certain subregions of the image for further review if enabled by the predictive algorithms;
- A network-based service that, given input images, will output results from the predictive algorithms in a machine-readable format;
- An accessible graphical user interface, likely web-based, that will display the results in a user-friendly and interpretable way for radiologists to use in assessing the presence of lung cancer.
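
As an illustration of the "machine-readable format" mentioned in the list above, the following is a minimal sketch of how a prediction result might be serialized to JSON. The class names, field names, and example values are all hypothetical and are not taken from the project; the actual output schema would be defined by challenge contributors.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FlaggedRegion:
    # Hypothetical: voxel coordinates of a subregion flagged for review,
    # with the model's confidence that the region is concerning.
    x: int
    y: int
    z: int
    probability: float


@dataclass
class ScanPrediction:
    # Hypothetical: one result record per input CT scan.
    scan_id: str
    cancer_probability: float
    flagged_regions: List[FlaggedRegion] = field(default_factory=list)


def to_json(prediction: ScanPrediction) -> str:
    """Serialize a prediction to a JSON string, rejecting invalid probabilities."""
    if not 0.0 <= prediction.cancer_probability <= 1.0:
        raise ValueError("cancer_probability must be between 0 and 1")
    # asdict converts nested dataclasses (including those in lists) to dicts.
    return json.dumps(asdict(prediction), sort_keys=True)


# Illustrative values only; these are not real results.
example = ScanPrediction(
    scan_id="scan-001",
    cancer_probability=0.12,
    flagged_regions=[FlaggedRegion(x=120, y=88, z=42, probability=0.67)],
)
payload = to_json(example)
print(payload)
```

A structured record like this could be consumed both by the network service's clients and by the graphical interface, which is one reason to settle on a shared schema early.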

Ultimately, the goal is to change the way that humans and machines work together for lung cancer diagnosis in practice, and to accelerate the detection of concerning lesions with a dramatically lower false positive rate so that patients can manage their cancer sooner and live longer, healthier lives.

Collaborators

Greg Lipstein, DrivenData
Peter Bull, DrivenData
Charles Hornbaker, DrivenData
David LeDuc, Bonnie J. Addario Lung Cancer Foundation
Andrea Parks, Bonnie J. Addario Lung Cancer Foundation
Samantha Cummis, Bonnie J. Addario Lung Cancer Foundation