Predicting lung cancer recurrence with machine learning
Principal Investigator
Name
Flavio Calmon
Degrees
PhD
Institution
Harvard University
Position Title
Assistant Professor of Electrical Engineering
Email
About this CDAS Project
Study
NLST
Project ID
NLST-567
Initial CDAS Request Approval
Sep 10, 2019
Title
Predicting lung cancer recurrence with machine learning
Summary
My project will focus on developing a machine learning model to predict incidents of recurrence in lung cancer patients following tumor resection. The ultimate goal is to develop a model that can help evaluate whether or not to administer adjuvant therapy, guide levels of imaging surveillance, and advise any other post-operative treatment decisions.
The methodology will consist of two key components: image processing and a sociodemographic cohort analysis. With the recent proliferation of open source image processing libraries, we will first create an automated process to extract tumor features from CT, X-ray and H&E images. While several studies have attempted a similar approach, most have been done on limited datasets and have lacked proper validation. Moreover, there has been minimal work in this area that attempts to integrate non-imaging data. Therefore, we will then use sociodemographic and clinical data to do a cohort analysis on different subsets of patients. We hope to gain insight into whether the model's predictions vary by sample population, and whether the model's accuracy is affected by sample population. From there, some of the sociodemographic and clinical features will be selected to add to the final model.
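As a rough illustration of the cohort analysis described above, the following minimal sketch compares a model's accuracy and its rate of predicted recurrence across patient subsets. The function name, the cohort labels and the toy data are all hypothetical and are not drawn from the project or from NLST; a real analysis would use the study's own cohort definitions and validated metrics.

```python
from collections import defaultdict


def metrics_by_cohort(cohorts, y_true, y_pred):
    """Return per-cohort accuracy and predicted-positive rate.

    cohorts: cohort label for each patient (hypothetical labels)
    y_true:  observed outcome per patient (1 = recurrence, 0 = none)
    y_pred:  model prediction per patient (1 = recurrence, 0 = none)
    """
    if not (len(cohorts) == len(y_true) == len(y_pred)):
        raise ValueError("all inputs must have the same length")

    total = defaultdict(int)
    correct = defaultdict(int)
    predicted_positive = defaultdict(int)

    for cohort, truth, pred in zip(cohorts, y_true, y_pred):
        total[cohort] += 1
        correct[cohort] += int(truth == pred)
        predicted_positive[cohort] += int(pred == 1)

    return {
        cohort: {
            "n": total[cohort],
            "accuracy": correct[cohort] / total[cohort],
            "predicted_positive_rate": predicted_positive[cohort] / total[cohort],
        }
        for cohort in total
    }


# Made-up toy data for illustration only.
cohorts = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1]

result = metrics_by_cohort(cohorts, y_true, y_pred)
for cohort, stats in sorted(result.items()):
    print(cohort, stats)
```

A gap in accuracy between cohorts would suggest the model performs unevenly across populations, while a gap in predicted-positive rate would suggest its predictions themselves differ by population.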
The final deliverable of the project will be a thesis paper finished in March 2020. Publications will be sought after completion.
Aims
The following is a list of specific aims:
1. A robust and well-tested model that can reliably predict cancer recurrence in lung cancer patients and inform post-operative treatment decisions
2. Insights into patterns of recurrence and model performance by cohort
3. Publication of the completed thesis
Collaborators
Flavio P. Calmon, PhD, Harvard University