Principal Investigator
Carol Hersh
Great Neck South High School
Position Title
About this CDAS Project
PLCO
Project ID
Initial CDAS Request Approval
Jul 21, 2020
Using Machine Learning for the Early Diagnosis of Pancreatic Cancer
In this project, I aim to develop a novel tool that uses machine learning to identify pancreatic cancer at an early stage. I will train Decision Tree, Random Forest, Boosted Trees, Logistic Regression, and Support Vector Machine models on commonly available patient data from the PLCO Pancreatic Cancer Dataset. After training, I aim to embed the resulting model in an easy-to-use mobile app. Patients will enter their information into the app, and it will generate a prediction of whether they are likely to develop pancreatic cancer. Patients predicted to be at risk can then be referred for further screening tests.
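As a minimal illustration of how one of the listed model families can turn entered patient information into a risk estimate, the sketch below fits a logistic regression by gradient descent in plain Python. Everything in it is an assumption made for illustration: the data are randomly generated rather than taken from PLCO, the three features (scaled age, smoker flag, family history flag) are placeholders, and the helper names `train_logistic` and `predict_risk` are hypothetical. A real project would more likely rely on an established machine learning library.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this sample drives the gradient.
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_risk(weights, bias, x):
    """Return the model's estimated probability for one patient."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Synthetic stand-in data (NOT PLCO data): scaled age, smoker flag,
# family history flag. The generating coefficients are arbitrary.
random.seed(0)
rows, labels = [], []
for _ in range(400):
    age = random.uniform(-1.0, 1.0)
    smoker = random.randint(0, 1)
    family = random.randint(0, 1)
    true_p = sigmoid(-2.0 + 1.5 * age + 1.0 * smoker + 1.2 * family)
    rows.append([age, smoker, family])
    labels.append(1 if random.random() < true_p else 0)

weights, bias = train_logistic(rows, labels)
low = predict_risk(weights, bias, [-0.8, 0, 0])
high = predict_risk(weights, bias, [0.8, 1, 1])
print(f"lower-risk profile: {low:.2f}, higher-risk profile: {high:.2f}")
```

The output is a probability rather than a yes/no answer, so an app built on such a model would still need a threshold for deciding when to suggest further screening.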

- Develop and train machine learning models (Decision Tree, Random Forest, Boosted Trees, Logistic Regression, and Support Vector Machine) using commonly available patient data from the PLCO Pancreatic Cancer Dataset
- The Baseline Questionnaire (BQ) dataset will supply basic patient data for the model, including age, race, BMI, smoking history, family history of pancreatic cancer, and history of other diseases
- The Diet History Questionnaire (DHQ) dataset will be used mainly to add alcohol consumption data to the model
- The Supplemental Questionnaire (SQX) dataset will be used to add follow-up smoking data and physical activity data to the model
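
Combining the three questionnaire datasets listed above amounts to joining records on a shared participant identifier. The sketch below shows one way this could look in plain Python. The participant IDs, field names, and values are made up for illustration and are not taken from the PLCO data dictionaries, and `merge_by_participant` is a hypothetical helper.

```python
def merge_by_participant(primary, *others):
    """Left-join extra tables onto the primary table by participant ID.

    Each table maps participant ID -> dict of fields. Participants missing
    from a secondary table get no fields from it, so downstream code must
    handle missing values.
    """
    merged = {}
    for pid, fields in primary.items():
        row = dict(fields)  # copy so the input tables are not modified
        for table in others:
            row.update(table.get(pid, {}))
        merged[pid] = row
    return merged


# Made-up records; IDs, field names, and values are illustrative only.
bq = {
    "A001": {"age": 61, "bmi": 27.4, "ever_smoked": 1, "family_history": 0},
    "A002": {"age": 58, "bmi": 31.0, "ever_smoked": 0, "family_history": 1},
}
dhq = {"A001": {"alcohol_g_per_day": 12.5}}
sqx = {
    "A001": {"active_hours_per_week": 3},
    "A002": {"active_hours_per_week": 1},
}

features = merge_by_participant(bq, dhq, sqx)
fieldnames = [
    "age", "bmi", "ever_smoked", "family_history",
    "alcohol_g_per_day", "active_hours_per_week",
]
for pid, row in features.items():
    # row.get returns None where a questionnaire was not completed.
    print(pid, [row.get(name) for name in fieldnames])
```

A left join keeps every participant in the baseline table even when a later questionnaire is missing, which leaves the choice between imputing and dropping incomplete records to the modeling step.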


Carol Hersh - Great Neck South High School