Principal Investigator
Name
Min Zhang
Degrees
M.D., Ph.D.
Institution
University of California, Irvine
Position Title
PI
About this CDAS Project
Study
PLCO
Project ID
PLCO-1982
Initial CDAS Request Approval
Sep 9, 2025
Title
Enhancing colorectal cancer risk prediction through AI-augmented variable selection and stratification
Summary
Colorectal cancer (CRC) ranks as the third leading cause of cancer death among both men and women in the United States. The American Cancer Society estimates that 154,000 new cases of colorectal cancer will be diagnosed in 2025. Beyond genetic factors, demographic, lifestyle, and other related factors also influence disease risk. Using the patient-level datasets from the PLCO study, we propose to apply our custom AI-powered variable selection method to identify key factors and then build a comprehensive statistical model for CRC risk prediction and stratification. The model's prediction accuracy will be evaluated using XGBoost and other AI techniques.
Aims

To improve our understanding of the relationship between the different stages of CRC and demographic, clinical, and other related variables, we propose two specific aims. Aim 1 focuses on variable selection: we will investigate demographic, clinical, and other variables in the PLCO dataset and select the important ones based on their main effects and pairwise interactions associated with CRC risk, using our in-house AI-driven variable selection method, penalized orthogonal components regression. Aim 2 uses the selected variables to evaluate the model's prediction accuracy with XGBoost and other AI techniques and compares the results with other popular models.
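As a rough illustration of the candidate set that a selection step over main effects and pairwise interactions would search, the sketch below expands a small table of variables into main-effect columns plus all pairwise products. It is not the project's penalized orthogonal components regression method and does not use PLCO data: the function name `expand_with_interactions`, the variable names, and the values are all hypothetical, chosen only to show that p variables yield p + p(p-1)/2 candidate terms.

```python
from itertools import combinations


def expand_with_interactions(rows, names):
    """Return (column_names, expanded_rows) holding the main effects
    followed by every pairwise product of the input variables."""
    pairs = list(combinations(range(len(names)), 2))
    columns = list(names) + [f"{names[i]}*{names[j]}" for i, j in pairs]
    expanded = [
        list(row) + [row[i] * row[j] for i, j in pairs]
        for row in rows
    ]
    return columns, expanded


# Hypothetical variable names and made-up values (not PLCO data).
names = ["age", "bmi", "smoker"]
rows = [
    [62.0, 27.5, 1.0],
    [55.0, 31.0, 0.0],
]

columns, expanded = expand_with_interactions(rows, names)
print(columns)
print(len(columns))  # 3 main effects + 3 pairwise interactions
```

With many variables the number of interaction terms grows quadratically, which is one reason a penalized selection step is typically applied before fitting a prediction model.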

Collaborators

Min Zhang (University of California, Irvine)
Dabao Zhang (University of California, Irvine)
Danni Liu (University of California, Irvine)
Zhongli Jiang (University of California, Irvine)