Principal Investigator
Min Zhang, M.D., Ph.D.
Purdue University
Position Title
About this CDAS Project
PLCO
Project ID
Initial CDAS Request Approval
Dec 2, 2020
Genome-wide Association Study of Colorectal Cancer Using Penalized Orthogonal Components Regression
Colorectal cancer (CRC) is the fourth leading cause of cancer death worldwide, and a large proportion of risk is attributable to genetic factors. Large-scale CRC consortia, such as PLCO, have greatly advanced understanding of the genetic architecture of the disease by identifying genetic loci associated with CRC risk. Nevertheless, such large genotypic datasets present many statistical and computational challenges. For instance, identifying a sparse subset of risk loci among millions of markers requires careful attention to statistical power because of the burden of multiple-testing correction. The underlying structure of genetic data, such as linkage disequilibrium and epistasis, exacerbates this issue. Furthermore, in consortia such as PLCO with many thousands of individuals, the computational complexity of statistical methods is a major barrier. To address these challenges in -omics datasets, we have proposed an efficient variable selection procedure, penalized orthogonal-components regression (POCRE). POCRE uses supervised dimension reduction to test simultaneously for associations between phenotypes and genomic markers across the entire genome. Weighted groups of predictors are constructed based on their strength of association with the phenotype of interest as well as their correlation with other markers. Our previous work on other -omics datasets has demonstrated that POCRE is computationally efficient and can identify markers shared by multiple traits, but there remains a knowledge gap in its application to genetic data. Access to PLCO data will advance our research goal because of its large cohort and its extensive collection of demographic and clinical variables. Using POCRE, we will investigate risk factors for CRC based on the genotype and clinical information in the PLCO data.
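The idea of supervised dimension reduction with sparsity can be illustrated with a small sketch. This is not the POCRE algorithm itself: it is a simplified, single-component stand-in that weights each marker by its covariance with the phenotype and soft-thresholds small weights to zero. The function names, the threshold `lam`, and the simulated genotypes are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(v, lam):
    """Shrink each entry of v toward zero by lam; entries with |v| <= lam become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sparse_supervised_component(X, y, lam):
    """Build one sparse component from the marker-phenotype covariances.

    Simplified illustration only (hypothetical helper, not POCRE):
    weights are the soft-thresholded covariances between each centered
    marker and the centered phenotype, scaled to unit length.
    """
    Xc = X - X.mean(axis=0)          # center each marker
    yc = y - y.mean()                # center the phenotype
    cov = Xc.T @ yc / X.shape[0]     # covariance of each marker with y
    w = soft_threshold(cov, lam)     # weak markers get weight exactly 0
    norm = np.linalg.norm(w)
    if norm > 0:
        w = w / norm
    return w, Xc @ w                 # sparse weights and component scores


# Simulated example (assumed data): 500 individuals, 1000 markers coded 0/1/2,
# with only the first five markers affecting the phenotype.
rng = np.random.default_rng(0)
n, p = 500, 1000
X = rng.integers(0, 3, size=(n, p)).astype(float)
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.normal(size=n)

w, scores = sparse_supervised_component(X, y, lam=0.8)
selected = np.flatnonzero(w)         # indices of markers with nonzero weight
print("markers with nonzero weight:", selected)
```

Because all markers are weighted in a single pass rather than tested one at a time, the sparsity threshold plays the role that a per-marker significance cutoff plays in a conventional scan. How the threshold is tuned, and how further orthogonal components are extracted, is left out of this sketch.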

To enable deeper understanding of the relationship between CRC and genomic information as well as other clinical/demographic variables, we have developed the following specific aims:

Aim 1: To identify the genomic risk factors for CRC using POCRE;
Aim 2: To investigate the effects of gene-environment interactions on CRC;
Aim 3: To evaluate the accuracy of CRC prediction based on the statistical models constructed in the first two aims.
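One common way to quantify prediction accuracy for a binary outcome such as CRC status is the area under the ROC curve (AUC). The project description does not state which metric will be used, so the choice of AUC here is an assumption, and the function name and toy values below are hypothetical.

```python
import numpy as np


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    Returns the fraction of (case, control) pairs in which the case has the
    higher predicted score, counting ties as one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]             # predicted scores for cases
    neg = scores[~labels]            # predicted scores for controls
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one case and one control")
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


# Toy example: predicted risk scores for two controls (0) and two cases (1).
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

To avoid an optimistic estimate, the scores would need to come from individuals who were not used to fit the model, for example a held-out subset or cross-validation folds.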


Dabao Zhang (Associate Professor), Purdue University;
Christopher Bryan (PhD student), Purdue University