Predicting lung cancer recurrence with machine learning
The methodology will consist of two key components: image processing and a sociodemographic cohort analysis. Drawing on the recent proliferation of open-source image processing libraries, we will first create an automated process to extract tumor features from CT, X-ray, and H&E images. Several studies have attempted a similar approach, but most were conducted on limited datasets and lacked proper validation. Moreover, there has been minimal work in this area that attempts to integrate non-imaging data. We will therefore use sociodemographic and clinical data to perform a cohort analysis on different subsets of patients. We hope to learn whether the model's predictions vary by sample population and whether the model's accuracy is affected by sample population. Based on this analysis, some of the sociodemographic and clinical features will be selected for inclusion in the final model.
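As an illustration of the cohort analysis described above, the following is a minimal sketch, not the project's actual pipeline. It assumes each patient record carries a binary model prediction and an observed recurrence outcome; the field names (`age_group`, `predicted`, `recurred`), the function name, and the toy records are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def summarize_by_cohort(records, cohort_key):
    """Group records by one cohort variable and report, for each group,
    the number of patients, the accuracy, and the predicted recurrence rate."""
    # cohort value -> [n_correct, n_predicted_positive, n_total]
    tallies = defaultdict(lambda: [0, 0, 0])
    for rec in records:
        tally = tallies[rec[cohort_key]]
        tally[0] += int(rec["predicted"] == rec["recurred"])
        tally[1] += int(rec["predicted"] == 1)
        tally[2] += 1
    return {
        cohort: {
            "n": total,
            "accuracy": correct / total,
            "predicted_rate": positive / total,
        }
        for cohort, (correct, positive, total) in tallies.items()
    }


# Toy records (hypothetical values, for illustration only).
records = [
    {"age_group": "under_65", "predicted": 1, "recurred": 1},
    {"age_group": "under_65", "predicted": 0, "recurred": 0},
    {"age_group": "under_65", "predicted": 1, "recurred": 0},
    {"age_group": "65_plus", "predicted": 0, "recurred": 1},
    {"age_group": "65_plus", "predicted": 1, "recurred": 1},
]

summary = summarize_by_cohort(records, "age_group")
for cohort in sorted(summary):
    stats = summary[cohort]
    print(f"{cohort}: n={stats['n']}, "
          f"accuracy={stats['accuracy']:.2f}, "
          f"predicted_rate={stats['predicted_rate']:.2f}")
```

Comparing `predicted_rate` across groups addresses whether predictions vary by population, while comparing `accuracy` addresses whether performance does. With real data, small cohorts would call for confidence intervals before drawing conclusions.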
The final deliverable of the project will be a thesis completed in March 2020. Publication will be sought after completion.
The following is a list of specific aims:
1. A robust, well-tested model that reliably predicts recurrence in lung cancer patients and informs post-operative treatment decisions
2. Insights into patterns of recurrence and model performance by cohort
3. Publication of the completed thesis
Flavio P. Calmon, PhD, Harvard University