Lung Cancer Early Detection Challenge: Concept to Clinic

Principal Investigator

Name
Isaac Slavitt

Degrees
S.M. Computational Science

Institution
DrivenData

Position Title
Data Scientist

Email
isaac@drivendata.org

About this CDAS Project

Study
NLST

Project ID
NLST-317

Initial CDAS Request Approval
Jun 20, 2017

Title
Lung Cancer Early Detection Challenge: Concept to Clinic

Summary
Recent research on the role of artificial intelligence in lung cancer diagnosis has shown promising results. These techniques were recently featured in the Data Science Bowl, a data challenge that produced open source algorithms for detecting cancer risk in CT scans. In the hands of radiologists, these algorithms can significantly reduce the false positive rate that currently plagues early screening and ultimately improve our ability to diagnose lung cancer at earlier stages.

However, there is still a wide gap between research and practice. To make this research usable, we’ll be running an online software development challenge that aims to translate the early-stage algorithms into an open source software tool that clinicians can actually use to evaluate patients. The Bonnie J. Addario Lung Cancer Foundation is putting up $100,000 in prizes for top contributors and partnering with a team of data scientists from DrivenData to design and run the challenge starting this summer.

We would like to use the NLST data as training data for the machine learning code in this project, just as it was used for the Data Science Bowl. We would like to make this data available to contributors working on training models for the duration of the challenge.

Aims

This project aims to bring experimental code for detecting lung cancer from CT scans closer to clinical application. The targeted output will be a well-designed, open-source, modern software package that complements and enhances the tools clinicians already use. At a minimum, this should include:

- Consuming imagery produced in the normal course of radiology, e.g. DICOM formatted CT images;
- Processing the images with a tuned and productionized version of one or many of the best predictive algorithms, in order to get predictions for the presence of abnormal tissue, and potentially flag certain subregions of the image for further review if enabled by the predictive algorithms;
- A network-based service that, given input images, will output results from the predictive algorithms in a machine-readable format;
- An accessible graphical user interface, likely web-based, that will display the results in a user-friendly and interpretable way for radiologists to use in assessing the presence of lung cancer.
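
As a rough illustration of the machine-readable output described in the list above, the following Python sketch shows one way a service might package predictions as JSON. Everything here is an assumption made for illustration: the names `FlaggedRegion`, `ScanResult`, `summarize_scan`, and `to_json`, the field layout, and the 0.5 flagging threshold are hypothetical, and the predictive model is passed in as a placeholder callable rather than any real algorithm from the challenge. Reading DICOM files is left out; the sketch assumes the image has already been loaded into some voxel structure.

```python
import json
from dataclasses import dataclass, asdict
from typing import Any, Callable, List


@dataclass
class FlaggedRegion:
    """Hypothetical subregion flagged for review, in voxel coordinates."""
    x: int
    y: int
    z: int
    probability: float


@dataclass
class ScanResult:
    """Hypothetical machine-readable result for a single CT series."""
    series_id: str
    abnormal_tissue_probability: float
    flagged_regions: List[FlaggedRegion]


def summarize_scan(
    series_id: str,
    voxels: Any,
    model: Callable[[Any], List[FlaggedRegion]],
    flag_threshold: float = 0.5,  # assumed cutoff, not from the source
) -> ScanResult:
    """Run a placeholder model and keep the regions at or above the threshold."""
    candidates = model(voxels)
    flagged = [r for r in candidates if r.probability >= flag_threshold]
    # Assumption: the scan-level score is the highest candidate probability.
    overall = max((r.probability for r in candidates), default=0.0)
    return ScanResult(series_id, overall, flagged)


def to_json(result: ScanResult) -> str:
    """Serialize the result so that other tools, such as a web UI, can read it."""
    return json.dumps(asdict(result), sort_keys=True)
```

A network service could return the string from `to_json` as its response body, and a graphical interface could read the same structure to highlight flagged regions for the radiologist.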

Ultimately, the goal is to change the way that humans and machines work together for lung cancer diagnosis in practice, and to accelerate the detection of concerning lesions with a dramatically lower false positive rate so that patients can manage their cancer sooner and live longer, healthier lives.

Collaborators

Greg Lipstein, DrivenData
Peter Bull, DrivenData
Charles Hornbaker, DrivenData
David LeDuc, Bonnie J. Addario Lung Cancer Foundation
Andrea Parks, Bonnie J. Addario Lung Cancer Foundation
Samantha Cummis, Bonnie J. Addario Lung Cancer Foundation