Principal Investigator
Ulrike Peters
Fred Hutchinson Cancer Research Center
Position Title
About this Project
PLCO
Project ID
Screen for Rare Alleles by Deep Resequencing of Colorectal Cancer Cases

Please note: for readability and easy access to tables and figures, the entire application is also attached.

Linkage analyses of pedigrees have identified idiosyncratic, high-penetrance mutations predisposing to colorectal cancer (CRC), and more recently genome-wide association studies (GWAS) have identified common risk alleles that confer modest risk of developing disease. However, a significant fraction of the excess familial risk of CRC remains to be explained. The approaches used to date would have been unlikely to detect low-frequency alleles (allele frequency between 0.5% and 5%) associated with moderate disease risk (relative risk of disease in carriers between 2 and 4). To investigate these genetic variants, we propose to use next-generation sequencing technology to screen the protein-coding regions of the whole genome ("the exome") for low-frequency polymorphisms in a panel of 75 CRC cases identified from the Women's Health Initiative (WHI) and the Prostate, Lung, Colon and Ovarian Cancer Screening Trial (PLCO). The identified low-frequency polymorphisms will then be screened for predicted molecular function (such as nonsense, frameshift, missense and splicing polymorphisms), and the most promising candidate functional polymorphisms will be tested for association with disease status by genotyping 394 strong candidates in a total of 3,493 matched case-control pairs from the WHI and PLCO studies. These variants may provide meaningful disease risk prediction in the clinic, and are certain to provide insights into the molecular pathways through which CRC develops. Insights into these molecular pathways may allow for the development of novel therapeutic (and prophylactic) approaches to CRC.


Colorectal cancer (CRC) is the second leading cause of cancer death in the US. It is estimated that up to 35% of CRC is attributable to inherited factors, and identification of associated genetic variants is important to elucidate mechanisms underlying this disease. First results from genome-wide association studies (GWAS) have demonstrated considerable success in identifying genetic variants associated with various common complex diseases, including CRC. Rare syndromes (FAP, Lynch's syndrome, etc.) explain between three and five percent of excess familial disease risk for CRC, while GWAS regions identified to date explain just six percent. It is estimated that even large GWAS using 50,000 to 100,000 individuals will capture at most 10% to 15% of the heritable disease risk of complex diseases, leaving a significant fraction unexplained. This variation has been referred to as "genetic dark matter", and its elucidation is a key next step in understanding genetic susceptibility to CRC.

There are several possibilities that might account for the unexplained heritability of CRC, including gene-gene interactions, gene-environment interactions and copy number variation. Another promising explanation is variants that would not have been captured by either linkage analysis or GWAS; specifically, modest risk variants with allele frequencies in the 0.5 - 5% range. Recent studies examining variation in the coding regions of the genome have identified a number of functional variants in this frequency range contributing to complex disease risk. In contrast to subtle tagSNP associations in GWAS, rare disease-associated variants often have stronger associations, and are more likely to be functionally related to disease. Initial studies of rare variants have focused on the resequencing of candidate genes and regions.
Next-generation sequencing methods and other technological advances have now made it feasible and cost-efficient to resequence much of the coding genome (the "exome") and, thereby, to comprehensively screen for coding variants that can be tested for disease risk. Such an analysis was recently successful in identifying rare variants in PALB2 that are associated with pancreatic cancer. Ultimately, a full understanding of the role of rare alleles in complex disease will require an examination of both coding and non-coding variants, since both are likely to contribute to disease susceptibility. However, an exomic focus is currently advantageous for three reasons: (1) mutations in coding regions that significantly alter the structure of key proteins (e.g. nonsense and frameshift mutations) are likely to have relevant functional effects; (2) in silico tools to predict function are more mature for missense variants and splice variants in coding regions than for regulatory variants in noncoding sequence; and (3) sequencing the exome is presently much cheaper than sequencing the entire genome. Exomic resequencing of a modest panel of high-risk individuals can be paired with in silico analysis to define a panel of coding variants with strong functional predictions. These variants can rapidly and efficiently be screened for associations with disease in much larger case-control populations. This strategy affords an exceptional opportunity to identify rare functional polymorphisms associated with CRC that have clinically relevant odds ratios (≥2-fold allelic OR).

We propose a strategy with the following specific aims:

Aim 1: To identify rare variants (MAF 0.5-5%) in the exome of colorectal cancer cases. We propose to use a combination of hybridization enrichment of the exome with next-generation sequencing to sequence the exome to 25X depth in each of 75 high-risk CRC cases.
Aim 2: To characterize these variants in terms of predicted function and to test the most promising variants for association with CRC. We propose to identify rare variants with strong predictions of functional consequence (nonsense, frameshift, and non-conservative missense variants, as well as variants in candidate genes and pathways), and genotype a panel of 384 candidate functional variants in 3,493 CRC cases and 3,493 matched controls from the WHI and PLCO, to test for associations with CRC risk.

As a secondary aim, we propose to assess the feasibility of resequencing pools of genomes from the same individuals, as a more cost-effective approach to rare variant discovery.

This project brings together a highly qualified multi-disciplinary team of investigators, using cutting-edge next-generation sequencing technology to generate important sequence and association data. Our work will result in methodologic and technological advances, which will accelerate current and future research into the genetic basis of CRC and other complex diseases. This work should lead to the identification of key causal variants, genes, and pathways for CRC, and, ultimately, to improved public health prevention and treatment strategies.
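The design in Aims 1 and 2 can be illustrated with two standard back-of-envelope calculations. The sketch below is not taken from the application: the function names are hypothetical, and it assumes chromosomes are sampled independently at the stated allele frequency, variant calling is perfect (coverage and genotyping error are ignored), and the allelic odds ratio is applied to the control allele frequency. The first function estimates the chance that an allele in the 0.5-5% range is seen at least once among the 150 chromosomes of 75 sequenced individuals. The second gives the allele frequency expected in cases for a given control frequency and allelic OR.

```python
def detection_probability(maf: float, n_individuals: int) -> float:
    """Chance of seeing at least one copy of an allele with frequency `maf`
    among 2 * n_individuals independently sampled chromosomes (assumption)."""
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must be between 0 and 1")
    return 1.0 - (1.0 - maf) ** (2 * n_individuals)


def case_allele_frequency(control_maf: float, allelic_or: float) -> float:
    """Allele frequency in cases implied by a control frequency and an allelic odds ratio."""
    if not 0.0 <= control_maf < 1.0:
        raise ValueError("control_maf must be in [0, 1)")
    case_odds = allelic_or * control_maf / (1.0 - control_maf)
    return case_odds / (1.0 + case_odds)


if __name__ == "__main__":
    n_sequenced = 75   # discovery panel size stated in Aim 1
    n_cases = 3493     # cases (and matched controls) stated in Aim 2
    assumed_or = 2.0   # lower bound of the allelic OR range mentioned in the text

    print("MAF     P(detect in 75)  case freq  exp. copies (cases / controls)")
    for maf in (0.005, 0.01, 0.05):
        p_detect = detection_probability(maf, n_sequenced)
        f_case = case_allele_frequency(maf, assumed_or)
        copies_cases = 2 * n_cases * f_case
        copies_controls = 2 * n_cases * maf
        print(f"{maf:<7} {p_detect:<16.3f} {f_case:<10.4f} "
              f"{copies_cases:.0f} / {copies_controls:.0f}")
```

Because the sequenced individuals are CRC cases, a true risk allele would be enriched among them, so the population-frequency calculation is a conservative estimate of discovery probability under these assumptions.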


Richard Hayes (NYU)
David Craig (Translational Genomic Science)
Ulrike Peters (Fred Hutchinson Cancer Research Center)
John Potter (Fred Hutchinson Cancer Research Center)
Li Hsu (Fred Hutchinson Cancer Research Center)
David Duggan (Translational Genomic Science)
Wen-Yi Huang (NCI)

Approved Addenda
This project has one or more approved addenda.
  • Extend targeted sequencing to adenomas (already in the file under EEMS 2007-0022) and change sequencing lab to CIDR