Single-molecule nanopore-based identification of methylome signatures in cfDNA for early colorectal cancer detection
Malignant tumor cells shed DNA into the bloodstream of cancer patients as cell-free DNA (cfDNA), commonly in the form of nucleosome-sized fragments. Epigenetic modifications such as 5-methylcytosine (5mC) are of particular interest because of their contribution to cancer development and progression. There is broad interest in cfDNA methylation as a cancer biomarker modality, ranging from targeted biomarker panels to whole-genome characterization of cfDNA methylomes. To detect 5mC, cfDNA is currently processed with bisulfite or enzymatic conversion of unmodified cytosines into uracils, and the converted bases are then read out by short-read sequencers. This approach introduces biases such as GC skews, oxidative DNA damage, PCR amplification bias, and alignment artifacts. As a result, characterizing cfDNA methylomes from patients remains challenging, particularly with conventional sequencing approaches.
We recently demonstrated a novel approach for single-molecule methylation analysis of cfDNA that overcomes these issues. We developed a nanopore-based sequencing approach for efficiently characterizing methylation profiles from the cfDNA of cancer patients (Lau, et al. Genome Medicine 2023; in press). The passage of methylated DNA through the nanopore generates an electrical signal distinct from that of unmodified DNA, and this difference can be classified with machine learning algorithms. We generated up to hundreds of millions of reads per cfDNA sample from colorectal cancer patients, starting from single nanograms or less of analyte per patient. We discovered millions of CpG sites that were differentially methylated in cfDNA between patients with colorectal cancer and healthy controls. Extending this work, we have to date sequenced almost 1,000 cfDNA samples as preliminary work for developing classifier models for cancer detection.
In this project, we seek to extend our work to (1) broadly characterize the early-stage colorectal cancer cfDNA methylome landscape, and (2) develop a classification model for early detection of colorectal cancer. Using cfDNA samples from patients at the Stanford Cancer Institute (SCI) and the PLCO clinical trial, we will refine whole-genome cfDNA methylomes that will form the basis of biomarker signatures for early colorectal cancer detection. In Aim 1, we will derive cfDNA methylome profiles of early-stage and pre-diagnostic colorectal cancer. Here, we will build out a whole-genome cfDNA methylome resource from colorectal cancer patients in the SCI and PLCO trial cohorts. In Aim 2, using sequenced cfDNA, we will build a machine learning model that will determine statistically significant changes in cfDNA methylation between cancer patients and healthy controls. The machine learning model will also quantify tumor burden and estimate the likelihood that a sample is indicative of cancer. We will use the SCI and PLCO patients as independent cohorts to perform training and validation.
Specific Aim 1. cfDNA methylome characterization of early-stage and pre-diagnostic colorectal cancer. Applying our single-molecule sequencing assay for epigenetic characterization, we will measure the methylomes of pre-diagnostic and early-stage cfDNA to discover novel biomarkers. We will use two independent cohorts: early-stage colorectal cancer patients from the Stanford Cancer Institute, and patients in the colorectal cancer cohort of the PLCO clinical trial. We will process cfDNA from plasma in both cohorts and perform high-throughput nanopore sequencing, generating comprehensive methylome profiles stratified by cancer stage and subtype. To further refine cancer-specific methylome signatures in cfDNA, we will also determine the methylomes of matched primary tumors and peripheral blood from Stanford Cancer Institute patients, and perform bisulfite sequencing of archived FFPE tumors from the PLCO cohort. Quantitative metrics for the success of this aim include (1) the overall sequencing yield per cfDNA sample, (2) the percentage of CpG sites covered, and (3) the overall concordance of cfDNA methylome landscapes with sequenced tissue and external datasets (e.g., TCGA).
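As an illustration of how success metrics (2) and (3) could be computed, the following is a minimal Python sketch. It assumes CpG sites are keyed by (chromosome, position) tuples and that per-site methylation fractions are already available; the function names, the data layout, and the choice of Pearson correlation as the concordance measure are illustrative assumptions, not the project's actual pipeline.

```python
from math import sqrt


def cpg_coverage_pct(covered_sites, reference_sites):
    """Percentage of reference CpG sites that appear among the covered sites.

    Both arguments are iterables of hypothetical (chromosome, position) keys.
    """
    reference = set(reference_sites)
    if not reference:
        raise ValueError("reference CpG set is empty")
    return 100.0 * len(reference & set(covered_sites)) / len(reference)


def pearson_concordance(cfdna_beta, tissue_beta):
    """Pearson correlation of methylation fractions over shared CpG sites.

    Each argument maps a CpG key to a methylation fraction in [0, 1].
    Pearson correlation is an assumed stand-in for "concordance".
    """
    shared = sorted(set(cfdna_beta) & set(tissue_beta))
    if len(shared) < 2:
        raise ValueError("need at least two shared CpG sites")
    x = [cfdna_beta[site] for site in shared]
    y = [tissue_beta[site] for site in shared]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("methylation values have zero variance")
    return cov / sqrt(var_x * var_y)


# Toy example with made-up sites and values, for illustration only.
reference = [("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr2", 75)]
covered = [("chr1", 100), ("chr2", 50), ("chr3", 10)]
coverage = cpg_coverage_pct(covered, reference)

cfdna = {("chr1", 100): 0.1, ("chr1", 200): 0.5, ("chr2", 50): 0.9}
tissue = {("chr1", 100): 0.2, ("chr1", 200): 0.6, ("chr2", 50): 1.0}
concordance = pearson_concordance(cfdna, tissue)
```

In practice, the concordance would likely be computed per region or per sample against tissue or external references, and a rank-based measure may be preferable when methylation fractions are strongly bimodal.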
Specific Aim 2. Development of a classification model for early detection of colorectal cancer via liquid biopsy. Leveraging biomarkers discovered through cohort sequencing, we will use machine learning approaches to create a classification model for early-stage colorectal cancer detection. The classification models will use collections of individual CpG sites as biomarkers for cancer classification. Because we will be using two independent cohorts (Stanford Cancer Institute and PLCO), each cohort will serve in turn as the training dataset while the other serves as the validation dataset for model assessment. Candidate features will first be prioritized by statistical testing and then used to fit tree-based machine learning models. Through the PLCO cohort, we will also use longitudinal pre-diagnostic cfDNA to determine how long before clinical diagnosis our model can detect cancer. Quantitative metrics for the success of this aim include (1) the sensitivity, specificity, and accuracy of the classification model, and (2) the pre-diagnosis time period at which the classification model can successfully make a cancer determination (PLCO cohort).
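The two success metrics for this aim can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: labels are encoded as 1 (cancer) and 0 (control), the classifier emits a score per sample, and a fixed score threshold defines a positive call. The function names, the threshold, and the toy values are hypothetical and do not describe the project's actual model.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels (1 = cancer)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        # Fraction of true cancers called positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of true controls called negative.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
    }


def max_lead_time_days(samples, threshold=0.5):
    """Longest interval before diagnosis at which a sample was called positive.

    `samples` is an iterable of (days_before_diagnosis, score) pairs for one
    patient; returns None if no sample reaches the assumed threshold.
    """
    detected = [days for days, score in samples if score >= threshold]
    return max(detected) if detected else None


# Toy example with made-up labels and scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)

longitudinal = [(720, 0.2), (365, 0.6), (90, 0.9)]
lead_time = max_lead_time_days(longitudinal, threshold=0.5)
```

With two independent cohorts, these metrics would be reported on the cohort that was held out from training, so that the estimates reflect performance on unseen patients.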
Hanlee Ji (LELAND STANFORD JUNIOR UNIVERSITY, THE)
Billy Lau (LELAND STANFORD JUNIOR UNIVERSITY, THE)