Principal Investigator
Name
Stephen (Jihao) Tao
Degrees
Undergraduate - Senior
Institution
Minot State University
Position Title
Data Science and Finance Major
Email
About this CDAS Project
Study
NLST
Project ID
NLST-1321
Initial CDAS Request Approval
Sep 16, 2024
Title
Exploring the Relationship Between Lifestyle Factors and Early-Stage Lung Cancer Using Publicly Available U.S. Data
Summary
The primary objective of this project is to analyze the relationship between specific lifestyle factors and early-stage lung cancer diagnosis using publicly available datasets in the United States. The project aims to identify and rank the lifestyle factors most strongly associated with early-stage lung cancer, with a focus on factors that may be underestimated or overlooked.
The methodology involves extracting relevant variables, including age, gender, race/ethnicity, smoking history, dietary habits, and physical activity levels, among others, from publicly available datasets such as TCGA, SEER, and NLST. After data cleaning and preprocessing, the project will apply a combination of statistical analyses and machine learning techniques, including logistic regression, random forest models, and statistical tests of association. These analyses will be used to assess correlations, identify the lifestyle factors significantly associated with early-stage lung cancer, and inform recommendations for targeted public health interventions and early prevention.
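As a minimal sketch of the cleaning and statistical-testing steps described above, the Python below drops incomplete records (listwise deletion), builds a 2x2 table for each binary lifestyle factor, and ranks factors by Pearson chi-square statistic alongside an odds ratio. The record layout and field names (`current_smoker`, `low_activity`, `early_stage_case`) are hypothetical, the toy records are made up, and this is an illustration under those assumptions rather than the project's actual pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class FactorResult:
    factor: str
    odds_ratio: float  # with a +0.5 Haldane-Anscombe correction on every cell
    chi2: float        # Pearson chi-square statistic, 1 degree of freedom
    p_value: float


def clean(records, fields):
    """Listwise deletion: drop any record with a missing (None) required field."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


def two_by_two(records, factor, outcome):
    """Count exposed/unexposed cases (a, c) and non-cases (b, d) for one binary factor."""
    a = b = c = d = 0
    for r in records:
        exposed, case = bool(r[factor]), bool(r[outcome])
        if exposed and case:
            a += 1
        elif exposed:
            b += 1
        elif case:
            c += 1
        else:
            d += 1
    return a, b, c, d


def assess_factor(records, factor, outcome):
    a, b, c, d = two_by_two(records, factor, outcome)
    odds_ratio = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return FactorResult(factor, odds_ratio, chi2, p_value)


def rank_factors(records, factors, outcome="early_stage_case"):
    """Clean the records, test each factor, and rank by chi-square (largest first)."""
    usable = clean(records, list(factors) + [outcome])
    results = [assess_factor(usable, f, outcome) for f in factors]
    return sorted(results, key=lambda r: r.chi2, reverse=True)


# Toy, made-up records purely to exercise the code (not NLST, SEER, or TCGA data).
toy_records = [
    {"current_smoker": 1, "low_activity": 1, "early_stage_case": 1},
    {"current_smoker": 1, "low_activity": 0, "early_stage_case": 1},
    {"current_smoker": 1, "low_activity": 1, "early_stage_case": 1},
    {"current_smoker": 1, "low_activity": 0, "early_stage_case": 0},
    {"current_smoker": 0, "low_activity": 0, "early_stage_case": 1},
    {"current_smoker": 0, "low_activity": 1, "early_stage_case": 0},
    {"current_smoker": 0, "low_activity": 1, "early_stage_case": 0},
    {"current_smoker": 0, "low_activity": 0, "early_stage_case": 0},
    {"current_smoker": None, "low_activity": 1, "early_stage_case": 0},  # dropped by clean()
]

ranked = rank_factors(toy_records, ["current_smoker", "low_activity"])
for r in ranked:
    print(f"{r.factor}: OR={r.odds_ratio:.2f}, chi2={r.chi2:.2f}, p={r.p_value:.3f}")
```

On the toy records, `current_smoker` ranks first (chi-square 2.0, p ≈ 0.157) and `low_activity` shows no association (odds ratio 1.0). With counts this small the chi-square approximation is unreliable; Fisher's exact test is the usual choice for sparse tables, and a real analysis would also need to adjust for confounders such as age.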
Aims

Key expected contributions include:
- Identification of the top 3-5 lifestyle factors most strongly associated with early-stage lung cancer diagnosis.
- A ranked list of lifestyle factors, including potentially underestimated ones, based on their correlation with early-stage lung cancer.
- Insights into how the importance of different lifestyle factors varies between early-stage and late-stage diagnoses.
- Recommendations for targeted public health interventions based on the most significant lifestyle factors identified.
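One way to approach the early- versus late-stage comparison in the aims above is to fit the same logistic regression twice, once for early-stage cases against controls and once for late-stage cases against controls, and then compare the resulting odds ratios. The sketch below assumes that case/control design, uses plain batch gradient descent with a small L2 penalty instead of a statistics library, and runs on made-up toy data with hypothetical factor names; it is not the project's stated method.

```python
import math

FACTORS = ["current_smoker", "low_activity"]  # hypothetical binary lifestyle factors


def fit_logistic(X, y, lr=0.5, epochs=3000, l2=0.01):
    """Batch gradient descent for L2-regularized logistic regression.

    The penalty applies to the factor weights only, not the intercept.
    Returns (intercept, weights).
    """
    n, k = len(X), len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return b, w


def rank_by_coefficient(weights, factors=FACTORS):
    """Pair each factor with its odds ratio exp(w), sorted highest first."""
    pairs = [(f, math.exp(w)) for f, w in zip(factors, weights)]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Made-up toy rows of [current_smoker, low_activity]; not real study data.
controls = [[0, 0], [0, 0], [1, 0], [0, 1]]
early_cases = [[0, 1], [1, 1], [0, 1], [0, 0]]
late_cases = [[1, 0], [1, 0], [1, 1], [0, 0]]


def fit_cases_vs_controls(cases):
    """Fit cases (label 1) against the shared control group (label 0)."""
    X = controls + cases
    y = [0] * len(controls) + [1] * len(cases)
    return fit_logistic(X, y)


_, early_w = fit_cases_vs_controls(early_cases)
_, late_w = fit_cases_vs_controls(late_cases)
print("early-stage model:", rank_by_coefficient(early_w))
print("late-stage model: ", rank_by_coefficient(late_w))
```

On this toy data, `low_activity` has the largest odds ratio in the early-stage model and `current_smoker` in the late-stage model, with the other factor's coefficient near zero in each. Coefficients from separately fitted models are only roughly comparable; a single model with stage as a multi-category outcome would be a more principled alternative.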
Future Directions:
- Multi-Cancer Analysis: Apply the methodology developed in this project to other types of cancer, exploring similarities and differences in lifestyle risk factors across various cancer types.
- Integration with Genetic Data: Combine lifestyle factor analysis with genetic risk factors to create more comprehensive predictive models for early-stage lung cancer.
- Intervention Studies: Design and implement targeted intervention studies based on the identified high-risk lifestyle factors to assess their effectiveness in reducing early-stage lung cancer incidence.
- Machine Learning Enhancements: Explore more advanced machine learning techniques, such as deep learning or additional ensemble methods, to improve the accuracy of predictive models.

Collaborators

Advisor: Darren Erisman - Minot State University
Nursing Faculty - Minot State University
Hospital Staff - Minot State Trinity