Implementation of new models for predicting lung cancer screening selection based on CT scan and pathology image data
Different risk models select different populations for screening, and model performance varies with the selection criteria. Building on existing models, we propose a new lung cancer screening model for ever-smokers, based on machine learning, to better support health workers and reduce the lung cancer mortality rate. Our model takes CT scan and pathology image data from the NLST and PLCO databases. With enhanced feature selection and feature engineering, we expect it to outperform existing models.
1. Data exploration and existing model validation
We will explore the PLCO database across features such as gender, age, and ethnicity. Initial data visualization will be performed in R and Python. The performance of existing models on this dataset will also be evaluated.
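As a rough illustration of this exploration step, the sketch below groups a cohort by a feature and reports the group size, number of cases, and case rate. It is only a minimal sketch: the records are made-up toy values, and the field names (sex, age, lung_cancer) and the helpers age_band and summarize_by are hypothetical, not PLCO variable names or part of any existing tool.

```python
from collections import defaultdict

# Made-up toy records for illustration only; field names are hypothetical
# and do not correspond to actual PLCO variable names.
TOY_COHORT = [
    {"sex": "F", "age": 58, "lung_cancer": 0},
    {"sex": "F", "age": 66, "lung_cancer": 1},
    {"sex": "M", "age": 61, "lung_cancer": 0},
    {"sex": "M", "age": 70, "lung_cancer": 1},
    {"sex": "M", "age": 64, "lung_cancer": 0},
]


def age_band(age, width=10):
    """Map an age to a band label such as '60-69'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def summarize_by(records, key_fn):
    """Return {group: (n, cases, case_rate)} for groups defined by key_fn."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n, cases]
    for rec in records:
        group = key_fn(rec)
        counts[group][0] += 1
        counts[group][1] += rec["lung_cancer"]
    return {g: (n, cases, cases / n) for g, (n, cases) in counts.items()}


by_sex = summarize_by(TOY_COHORT, lambda r: r["sex"])
by_age = summarize_by(TOY_COHORT, lambda r: age_band(r["age"]))

for name, table in (("sex", by_sex), ("age band", by_age)):
    print(f"Summary by {name}:")
    for group in sorted(table):
        n, cases, rate = table[group]
        print(f"  {group}: n={n}, cases={cases}, rate={rate:.2f}")
```

The same grouping function can be reused for any categorical feature by passing a different key function, which keeps the per-feature summaries comparable before any plotting is done.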
2. Implementation of new models based on machine learning
We will propose new models based on machine learning and feature engineering. The aim is to improve prediction accuracy and lower lung cancer mortality after screening.
3. Model evaluation and comparison
Our model will be evaluated using common evaluation metrics. We will plot its performance against that of existing models.
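The proposal does not name specific metrics. As one possible reading, the sketch below assumes the area under the ROC curve (AUC), sensitivity, and specificity, which are widely used for screening risk models, and compares two sets of risk scores on the same outcomes. The labels and scores are made-up toy values, model_a and model_b are placeholder names, and the 0.4 threshold is an arbitrary choice for illustration; none of these are results or settings from this project.

```python
def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are flagged."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up outcomes and risk scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = {
    "model_a": [0.90, 0.40, 0.35, 0.50, 0.30, 0.20, 0.10, 0.05],
    "model_b": [0.90, 0.60, 0.45, 0.50, 0.30, 0.20, 0.10, 0.05],
}

THRESHOLD = 0.4  # arbitrary cut-off chosen for this example

results = {}
for name, scores in toy_scores.items():
    sens, spec = sensitivity_specificity(labels, scores, THRESHOLD)
    results[name] = {"auc": auc(labels, scores), "sens": sens, "spec": spec}
    print(f"{name}: AUC={results[name]['auc']:.3f}, "
          f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Computing all metrics on the same held-out cases for every model keeps the comparison fair; the resulting dictionary can then be passed to whichever plotting tool is used for the comparison figures.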
Ning Zhang, Northwestern University
Chengsheng Mao, Northwestern University
Yiming Li, Northwestern University
Yawei Li, Northwestern University
Saya Dennis, Northwestern University
Garrett Eickelberg, Northwestern University
Meghan Hutch, Northwestern University
Yikuan Li, Northwestern University
Hanyin Wang, Northwestern University