Comprehensive Analysis of Risk Factors and Predictive Modeling for Different Cancer Types in the PLCO Dataset
Using advanced statistical and machine learning methods, this research aims to uncover patterns and correlations that provide deeper insight into cancer risk and progression. By evaluating sensitivity, specificity, and receiver operating characteristic (ROC) curves adjusted for baseline covariates, the work will inform the design of targeted cancer prevention strategies and help optimize screening protocols.
1. Identify Key Predictors of Cancer Risk
Assess correlations between clinical/epidemiological variables and the occurrence of cancer.
Determine the most influential predictors of cancer at different diagnostic time horizons (e.g., within 1 year or within 2 years).
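As an illustration of how a diagnostic time horizon could be turned into a binary outcome, the following is a minimal Python sketch. The function name and its inputs (days_to_diagnosis, followup_days) are hypothetical and are not actual PLCO variable names; the handling of participants censored before the horizon is one possible convention, not a decision taken in this proposal.

```python
from typing import Optional

DAYS_PER_YEAR = 365.25


def label_within_horizon(
    days_to_diagnosis: Optional[float],
    followup_days: float,
    horizon_years: float,
) -> Optional[int]:
    """Binary outcome for one participant at a given horizon.

    Returns 1 if cancer was diagnosed within the horizon, 0 if the
    participant was followed through the whole horizon without a
    diagnosis inside it, and None if follow-up ended before the
    horizon (outcome unknown, so the record is left unlabeled).
    """
    horizon_days = horizon_years * DAYS_PER_YEAR
    if days_to_diagnosis is not None and days_to_diagnosis <= horizon_days:
        return 1
    if followup_days >= horizon_days:
        return 0
    return None


# Hypothetical participants: (days to diagnosis or None, days of follow-up)
participants = [(200, 200), (500, 500), (None, 1000), (None, 300)]
labels_1y = [label_within_horizon(d, f, 1) for d, f in participants]
labels_2y = [label_within_horizon(d, f, 2) for d, f in participants]
```

The same participant can be a non-case at the 1-year horizon and a case at the 2-year horizon, which is why predictors may rank differently across horizons.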
2. Develop Multivariate Risk Prediction Models
Construct predictive models using logistic regression, random forests, or other machine learning methods.
Quantify model performance using metrics such as AUC, sensitivity, and specificity, ensuring robustness via cross-validation.
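The performance metrics named above can be sketched in plain Python as follows. This is an illustrative example only: the labels and risk scores are made up, the 0.5 threshold is an arbitrary assumption, and the AUC is computed with the pairwise (Mann-Whitney) definition rather than any specific method chosen for this study.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random non-case.

    Ties between a case and a non-case count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up example: 1 = cancer diagnosed, 0 = no diagnosis
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
area = auc(labels, scores)
```

In a cross-validated setting, these functions would be applied to the held-out fold of each split and the results averaged across folds. The quadratic pairwise AUC is adequate for small examples; on a cohort-sized dataset a rank-based implementation from an established library would be the more practical choice.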
3. Evaluate the Performance of Screening Strategies
Estimate metrics such as lead time, sojourn time, and over-diagnosis rates for specific cancer types.
Analyze screening effectiveness across demographic subgroups, providing insights into disparities in outcomes.
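As a rough illustration of how sojourn time might be approximated, the sketch below uses the steady-state prevalence/incidence relationship: the prevalence of screen-detected cancer at a first screen is approximately test sensitivity times the clinical incidence rate times the mean sojourn time. This is a simplifying assumption, not the estimation method of this study, and the input values are invented for the example.

```python
def mean_sojourn_time_years(prevalence_at_first_screen, annual_incidence, sensitivity):
    """Approximate mean sojourn time (years) from the relationship
    prevalence ~= sensitivity * incidence * mean sojourn time.

    prevalence_at_first_screen: proportion screen-detected at the first screen
    annual_incidence: clinical incidence per person-year without screening
    sensitivity: probability the test detects a preclinical cancer
    """
    if not 0 < sensitivity <= 1:
        raise ValueError("sensitivity must be in (0, 1]")
    if annual_incidence <= 0:
        raise ValueError("annual_incidence must be positive")
    return prevalence_at_first_screen / (sensitivity * annual_incidence)


# Invented inputs: 6 per 1,000 detected at first screen,
# 2 per 1,000 person-years clinical incidence, 80% test sensitivity
mst = mean_sojourn_time_years(0.006, 0.002, 0.8)
```

With these invented inputs the approximation gives a mean sojourn time of about 3.75 years. If sojourn times are additionally assumed to be exponentially distributed, the memoryless property implies that the mean lead time among screen-detected cases equals the mean sojourn time; formal estimates would need a fitted natural-history model and would be computed separately for each cancer type and demographic subgroup.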
4. Inform Future Cancer Prevention and Screening Design
Provide actionable recommendations for optimizing screening and prevention strategies based on study findings.
Dr. Piyush Samant, Data Scientist
Dr. Cheng He, VP R&D