Principal Investigator
Name
Anish Bhandari
Degrees
M.S.
Institution
Southern Methodist University (SMU)
Position Title
Data Science Student
Email
About this CDAS Project
Study
PLCO
Project ID
PLCO-1677
Initial CDAS Request Approval
Sep 30, 2024
Title
Geospatial and Socioeconomic Analysis of Young-Onset Colorectal Cancer in the United States
Summary
We are a team of three Master of Data Science students at Southern Methodist University (SMU) investigating the growing concern of young-onset colorectal cancer (diagnoses in individuals under 50 years of age) in the United States. Although overall colorectal cancer rates are declining, incidence among younger adults is rising at an alarming rate. The phenomenon remains poorly understood because its underlying causes have not yet been identified.

Our project seeks to examine the potential influence of socioeconomic status, environmental factors, food accessibility, and healthcare availability on colorectal cancer incidence among young adults. We will integrate data from multiple sources, including the NIH PLCO dataset, U.S. Census Bureau, USDA Food Environment Atlas, SEER cancer registry, and CDC’s BRFSS, to conduct a comprehensive geospatial and socioeconomic analysis. Using Geographic Information Systems (GIS), we will map cancer incidence across various regions and identify high-risk areas with limited access to healthcare and other vital resources.

The project also aims to build a predictive model that identifies young adults at greater risk of developing colorectal cancer, enabling targeted intervention and screening programs. We will compare machine learning classification techniques such as logistic regression, random forest, and XGBoost to determine which approach best predicts cancer risk. By shedding light on these complex interrelationships, we hope to support public health efforts to improve early detection and prevention, ultimately reducing cancer incidence in young populations. This work could be a key step toward addressing a growing public health crisis that affects thousands of lives each year.
Aims

1: Geospatial Analysis of Colorectal Cancer Incidence in Young Adults
Conduct a comprehensive geospatial analysis to identify geographic regions with high rates of young-onset colorectal cancer. This analysis will use the SEER database and the CDC’s Behavioral Risk Factor Surveillance System (BRFSS) data to relate cancer incidence to specific locations, down to the county or ZIP code level.

2: Socioeconomic and Environmental Factors Linked to Colorectal Cancer Risk
Investigate the potential role of socioeconomic and environmental factors (e.g., income, education, housing, food deserts) in contributing to the rising rates of colorectal cancer among young adults. Data will be sourced from the U.S. Census Bureau, American Community Survey (ACS), USDA Food Environment Atlas, and Feeding America datasets.

3: Health Access and Preventative Care as Predictors of Colorectal Cancer Diagnosis
Assess the availability of healthcare and its impact on the timeliness of cancer diagnosis in young adults. Analyze data from HRSA, Kaiser Family Foundation (KFF), and the CDC to identify areas with limited access to medical care or screening facilities, correlating this with rates of advanced cancer diagnoses.

4: Develop a Predictive Risk Model for Targeted Outreach
Using machine learning classification techniques, develop a predictive model that identifies young adults at a higher risk of late-stage colorectal cancer. This model will leverage the data from previous aims to provide insights for targeted intervention strategies, such as public health campaigns and screening initiatives.
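One basic building block for the county-level mapping described in Aim 1 is a crude incidence rate: new cases divided by the population at risk, scaled to 100,000 people. The sketch below is illustrative only and is not the team's pipeline. The names `CountyCounts`, `crude_rate_per_100k`, and `rank_counties` are hypothetical, and the counts are made-up numbers rather than SEER data.

```python
from dataclasses import dataclass


@dataclass
class CountyCounts:
    """Hypothetical county-level tallies for adults under 50 (illustrative only)."""
    county: str
    cases_under_50: int       # new colorectal cancer diagnoses, ages < 50
    population_under_50: int  # population at risk, ages < 50


def crude_rate_per_100k(cases: int, population: int) -> float:
    """Crude incidence rate: cases / population * 100,000."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


def rank_counties(counts: list[CountyCounts]) -> list[tuple[str, float]]:
    """Return (county, rate) pairs sorted from highest to lowest crude rate."""
    rates = [
        (c.county, crude_rate_per_100k(c.cases_under_50, c.population_under_50))
        for c in counts
    ]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up numbers; not drawn from SEER or any real registry.
    sample = [
        CountyCounts("County A", 12, 80_000),
        CountyCounts("County B", 30, 150_000),
        CountyCounts("County C", 5, 50_000),
    ]
    for county, rate in rank_counties(sample):
        print(f"{county}: {rate:.1f} per 100,000")
```

With these sample counts, County B ranks first at 20.0 per 100,000, followed by County A (15.0) and County C (10.0). Published SEER incidence figures are usually age-adjusted to a standard population, so a crude rate like this is only a starting point for comparing areas with different age structures.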

Collaborators

1) Michael Olheiser
SMU - Data Science
molheiser@mail.smu.edu
2) Shawn Deng
SMU - Data Science
shawnd@mail.smu.edu