Principal Investigator
Name
Yougen Wu
Degrees
Ph.D.
Institution
The Fifth People's Hospital of Shanghai, Fudan University
Position Title
Scientist
About this CDAS Project
Study
PLCO
Project ID
PLCO-852
Initial CDAS Request Approval
Nov 3, 2021
Title
Machine-Learning and classic statistic models for predicting cancers risk and clinical outcomes
Summary
Cancer is a major public health problem worldwide. Many factors, including lifestyle behaviors (e.g., diet, physical activity, smoking, and stress), environmental factors, and genetic factors, affect cancer incidence and mortality. However, the strength of these associations and the optimal levels of risk factors for cancer risk and clinical outcomes remain unclear. Clinical prediction models for cancer incidence and mortality provide risk estimates for individual patients. Such models, which are increasingly common, combine multiple predictors and can give insight into the relative effect of each predictor. Predicting cancer-related events and stratifying risk are critically important for clinical decisions about cancer prevention and intervention. However, it remains controversial whether machine learning algorithms or conventional modeling achieve better predictive performance, particularly for time-to-event prediction. The primary aim of this proposal is to identify the important predictors of multiple clinical outcomes, such as incidence of any cancer, recurrence, death from cancer, and all-cause mortality. We have developed risk prediction models for cancers based on the UK Biobank resource, but the models lack external validation. We will validate their predictive performance using PLCO data.

We request datasets for 17 cancer sites: prostate, lung, colorectal, ovarian, head and neck, pancreas, upper GI, liver, biliary, melanoma, breast, endometrial, bladder, renal, glioma, thyroid, and hematopoietic. We request all available data for all participants, including the baseline questionnaire (BQ), dietary questionnaire (DQX), diet history questionnaire (DHQ), supplemental questionnaire (SQX), medication use questionnaire (MUQ), brief survey questionnaire (BCQ), screening, diagnostic procedures, cancer diagnosis, treatment, mortality, ancillary studies data (benign ovarian pathology datasets, free PSA datasets, SCU datasets, contamination survey datasets), lab results datasets, ovarian biomarkers datasets, Vitamin D datasets, sex hormones datasets, and multiplex immune marker panel studies datasets. Comprehensive personalized risk prediction models for cancers, based on baseline blood biomarkers, lifestyle behaviors, environmental factors, genetic factors, and routinely available data such as medical history, comorbidities, and medications, are urgently needed.
Aims

Aim 1: Develop and validate a set of risk prediction models for 17 cancer sites using large datasets from UK Biobank and PLCO.
Aim 2: Compare the performance of different prediction models.
Aim 3: Identify risk factors strongly associated with all cancers, explore potential interactions between predictors, and identify high-risk populations.
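
As an illustration of how the model comparison in Aim 2 might be carried out for time-to-event outcomes, the sketch below computes Harrell's concordance index (C-index) for two sets of predicted risk scores. The proposal does not state which performance metrics will be used, so the choice of C-index is an assumption, and the function name `concordance_index`, the toy follow-up data, and the two score lists are all hypothetical.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored data (hypothetical helper).

    times: follow-up time for each participant.
    events: 1 if the outcome was observed, 0 if censored.
    risk_scores: model output, where a higher score means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored participant cannot be the earlier member of a pair
        for j in range(n):
            # A pair is comparable when i had the event strictly before j's time.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # the model ranked the earlier event as higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data, invented for illustration only (not PLCO or UK Biobank data).
times = [5, 8, 12, 3, 10]
events = [1, 0, 1, 1, 0]
scores_model_a = [0.9, 0.4, 0.2, 0.8, 0.3]  # hypothetical output of one model
scores_model_b = [0.2, 0.5, 0.9, 0.1, 0.6]  # hypothetical output of another model

c_a = concordance_index(times, events, scores_model_a)
c_b = concordance_index(times, events, scores_model_b)
print(f"Model A C-index: {c_a:.3f}")
print(f"Model B C-index: {c_b:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of event times. A full comparison would likely also consider calibration and would evaluate each model on the external validation cohort rather than on its training data.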

Collaborators

NA