Machine learning identifies cell-free DNA 5-hydroxymethylation biomarkers that detect occult colorectal cancer in PLCO Screening Trial subjects.
BACKGROUND: Colorectal cancer (CRC) is a leading cause of cancer-related mortality, and CRC detection through screening improves survival rates. A promising avenue to improve patient screening compliance is the development of minimally invasive liquid biopsy assays that target CRC biomarkers on circulating cell-free DNA (cfDNA) in peripheral plasma. In this report, using samples from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial, we identify candidate cfDNA biomarker genes bearing the epigenetic mark 5-hydroxymethylcytosine (5hmC) that detect occult CRC up to 36 months prior to clinical diagnosis.
METHODS: Archived PLCO Trial plasma samples containing cfDNA were obtained from the National Cancer Institute (NCI) biorepositories. Study subjects included those who were diagnosed with CRC within 36 months of blood collection (i.e., cases, n = 201) and those who were not diagnosed with any cancer during an average of 16.3 years of follow-up (i.e., controls, n = 402). Following the extraction of 3-8 ng of cfDNA from less than 300 microliters of plasma, we employed the sensitive 5hmC-Seal chemical labeling approach, followed by next-generation sequencing (NGS). We then conducted association studies and machine-learning modeling to analyze the genome-wide 5hmC profiles within training and validation groups, to which subjects were randomly assigned at a 2:1 ratio.
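As an illustration of the 2:1 random assignment described in METHODS, the following is a minimal sketch only. The abstract does not state whether the split was stratified by case/control status; splitting cases and controls separately is an assumption made here, and the subject identifiers, the function name `split_two_to_one`, and the seeds are hypothetical.

```python
import random


def split_two_to_one(subject_ids, seed=0):
    """Randomly partition subjects into training and validation lists at 2:1."""
    ids = list(subject_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * 2 / 3)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers; only the group sizes (201 cases, 402 controls)
# come from the abstract.
cases = [f"case_{i}" for i in range(201)]
controls = [f"control_{i}" for i in range(402)]

# Assumption: split within each group so both sets keep the 1:2 case/control mix.
train_cases, valid_cases = split_two_to_one(cases, seed=1)
train_controls, valid_controls = split_two_to_one(controls, seed=2)

training = train_cases + train_controls
validation = valid_cases + valid_controls
```

Under these assumptions the training set holds 402 subjects and the validation set 201, with no subject appearing in both.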
RESULTS: Despite the technical challenges associated with the PLCO samples (e.g., limited plasma volumes, low cfDNA amounts, and long archival times), robust genome-wide 5hmC profiles were successfully obtained from these samples. Association analyses using Cox proportional hazards models suggested that several epigenetic pathways relevant to CRC development distinguished cases from controls. A weighted Cox model comprising 32 associated gene bodies showed predictive value for detecting CRC as early as 24-36 months prior to overt tumor presentation, and a trend toward increased predictive power was observed for blood samples collected closer to CRC diagnosis. Notably, the 5hmC-based predictive model showed comparable performance regardless of sex and self-reported race/ethnicity, and significantly outperformed risk factors such as age and obesity as defined by body mass index (BMI). Predictive performance improved further when the 5hmC-based model was combined with risk factors for CRC.
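One common way to turn a fitted Cox model into a per-sample score is the linear predictor, i.e. the sum of each feature value multiplied by its fitted coefficient. The sketch below shows that calculation under stated assumptions: the abstract does not give the exact form of the weighted model, and the gene names, coefficients, and 5hmC levels here are hypothetical placeholders, not values from the study.

```python
def weighted_score(levels, weights):
    """Return the sum of weight * 5hmC level over the genes in the model.

    levels  -- mapping of gene name to a normalized 5hmC level for one sample
    weights -- mapping of gene name to a model coefficient
    """
    missing = sorted(set(weights) - set(levels))
    if missing:
        raise KeyError(f"sample is missing 5hmC levels for: {missing}")
    return sum(weights[gene] * levels[gene] for gene in weights)


# Hypothetical three-gene model (the reported model uses 32 gene bodies).
example_weights = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 1.0}
# Hypothetical normalized 5hmC levels for a single plasma sample.
example_levels = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.5}

score = weighted_score(example_levels, example_weights)
```

In this form a higher score corresponds to a higher modeled hazard; any threshold separating predicted cases from controls would have to be chosen on the training set.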
CONCLUSIONS: An assay of 5hmC epigenetic signals on cfDNA revealed candidate biomarkers with the potential to predict CRC occurrence in the absence of clinical symptoms and of other effective predictors. Developing a minimally invasive clinical assay that detects 5hmC-modified biomarkers holds promise for improving early CRC detection and ultimately patient survival through higher screening compliance and earlier intervention. Future investigation to extend this strategy to prospectively collected samples is warranted.