Exploring the Relationship Between Lifestyle Factors and Early-Stage Lung Cancer Using Publicly Available U.S. Data
The methodology draws on publicly available datasets such as TCGA, SEER, and NLST to extract relevant variables, including age, gender, race/ethnicity, smoking history, dietary habits, and physical activity levels. After data cleaning and preprocessing, the project applies a combination of statistical tests and machine learning techniques, including logistic regression and random forest models, to assess correlations and identify lifestyle factors significantly associated with early-stage lung cancer. The findings will inform recommendations for targeted public health interventions and early prevention.
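As an illustration of the kind of statistical test that could be used to assess the association between a single lifestyle factor and early-stage diagnosis, the following minimal sketch computes an odds ratio with a Wald 95% confidence interval from a 2x2 table. The function name `odds_ratio_ci` and all counts are hypothetical and chosen only for illustration; they are not drawn from TCGA, SEER, or NLST, and the project's actual tests may differ.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed, early-stage      b: exposed, comparison group
    c: unexposed, early-stage    d: unexposed, comparison group
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        # The Wald interval is undefined when any cell is zero.
        raise ValueError("all four cells must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical counts for one lifestyle factor (illustration only).
estimate, lower, upper = odds_ratio_ci(a=40, b=60, c=20, d=80)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 suggests an association in the sample, but it does not establish causation and does not adjust for confounders such as age; a multivariable model like logistic regression would be needed for that.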
Key expected contributions include:
- Identification of the top 3-5 lifestyle factors most strongly associated with early-stage lung cancer diagnosis.
- A ranked list of lifestyle factors, including potentially underestimated ones, based on their correlation with early-stage lung cancer.
- Insights into how the importance of different lifestyle factors varies between early-stage and late-stage diagnoses.
- Recommendations for targeted public health interventions based on the most significant lifestyle factors identified.
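One simple way a ranked list of lifestyle factors could be produced is sketched below, assuming each factor has been reduced to a binary exposure and summarized as a 2x2 table. The sketch ranks factors by the absolute phi coefficient, a correlation measure for two binary variables. The factor names, the counts, and the helper `phi_coefficient` are hypothetical placeholders rather than project results.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table.

    a: exposed, early-stage      b: exposed, comparison group
    c: unexposed, early-stage    d: unexposed, comparison group
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return (a * d - b * c) / denom


# Hypothetical 2x2 tables (a, b, c, d) per factor, for illustration only.
tables = {
    "low_fruit_veg_intake": (30, 70, 30, 70),
    "current_smoker": (40, 60, 20, 80),
    "low_physical_activity": (35, 65, 25, 75),
}

# Rank factors from strongest to weakest association.
ranked = sorted(
    tables,
    key=lambda name: abs(phi_coefficient(*tables[name])),
    reverse=True,
)

for name in ranked:
    print(f"{name}: phi = {phi_coefficient(*tables[name]):.3f}")
```

A ranking of this kind reflects unadjusted pairwise correlation only; importance scores from a fitted model, such as a random forest, may order the factors differently.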
Future Directions:
- Multi-Cancer Analysis: Apply the methodology developed in this project to other types of cancer, exploring similarities and differences in lifestyle risk factors across various cancer types.
- Integration with Genetic Data: Combine lifestyle factor analysis with genetic risk factors to create more comprehensive predictive models for early-stage lung cancer.
- Intervention Studies: Design and implement targeted intervention studies based on the identified high-risk lifestyle factors to assess their effectiveness in reducing early-stage lung cancer incidence.
- Machine Learning Enhancements: Explore more advanced machine learning techniques, such as deep learning or ensemble methods beyond random forests, to improve the accuracy of predictive models.
Advisor: Darren Erisman - Minot State University
Nursing Faculty - Minot State University
Hospital Staff - Minot State Trinity