Principal Investigator
Name
Carol Hersh
Degrees
Ph.D.
Institution
Great Neck South High School
Position Title
Teacher
About this CDAS Project
Study
PLCO
Project ID
PLCO-651
Initial CDAS Request Approval
Jul 21, 2020
Title
Using Machine Learning for the Early Diagnosis of Pancreatic Cancer
Summary
In this project, I aim to develop a novel tool that uses machine learning to identify pancreatic cancer at an early stage. I will train Decision Tree, Random Forest, Boosted Trees, Logistic Regression, and Support Vector Machine models on commonly available patient data from the PLCO Pancreatic Cancer Dataset. After training, I aim to embed the resulting model in a mobile app so that it is easy to use. Patients would enter their information into the app, and the app would generate a prediction of whether they are likely to develop pancreatic cancer. Patients flagged as likely to develop it could then be referred for further screening tests.
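The train-then-predict flow described in the summary can be sketched with one of the listed model types, logistic regression. This is a minimal illustration only, written in plain Python with a hand-rolled gradient descent rather than any particular library, and it is not the project's implementation. The two features (age scaled to 0-1 and a smoker flag), the toy rows, and the function names are all assumptions invented for the example; none of the values come from PLCO.

```python
import math

# Toy rows: ([age scaled to 0-1, smoker flag], label).
# Invented for illustration only; NOT taken from the PLCO dataset.
TOY_ROWS = [
    ([0.2, 0.0], 0),
    ([0.3, 0.0], 0),
    ([0.4, 1.0], 0),
    ([0.6, 0.0], 0),
    ([0.7, 1.0], 1),
    ([0.8, 1.0], 1),
    ([0.9, 0.0], 1),
    ([0.9, 1.0], 1),
]


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(weights, bias, features):
    """Return the model's estimated probability for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train_logistic_regression(rows, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in rows:
            error = predict_risk(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the mean gradient over all rows.
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


weights, bias = train_logistic_regression(TOY_ROWS)

# A user-entered profile would be scored the same way as these two examples.
high = predict_risk(weights, bias, [0.9, 1.0])
low = predict_risk(weights, bias, [0.2, 0.0])
print(f"higher-risk profile: {high:.2f}, lower-risk profile: {low:.2f}")
```

In an app setting, only `predict_risk` and the fitted `weights` and `bias` would need to ship to the device; training would happen beforehand on the real data.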
Aims

- Develop and train machine learning models (Decision Tree, Random Forest, Boosted Trees, Logistic Regression, and Support Vector Machine) using commonly available patient data from the PLCO Pancreatic Cancer Dataset
- The BQ (Baseline Questionnaire) dataset will be used to add basic patient data to the model, including age, race, BMI, smoking history, family history of pancreatic cancer, and history of other diseases
- The DHQ (Diet History Questionnaire) dataset will be used mainly to add alcohol consumption data to the model
- The SQX (Supplemental Questionnaire) dataset will be used to add follow-up smoking data as well as physical activity data to the model
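
Combining the BQ, DHQ, and SQX data into one feature row per participant could look roughly like the sketch below. It assumes each dataset can be keyed by a shared participant ID; the IDs, field names, and values are hypothetical placeholders, not actual PLCO variable names or records.

```python
# Hypothetical stand-ins for the three questionnaire datasets, keyed by a
# made-up participant ID. Field names and values are placeholders only.
bq = {
    "P001": {"age": 62, "bmi": 27.4, "smoker": 1, "family_history_pc": 0},
    "P002": {"age": 58, "bmi": 24.1, "smoker": 0, "family_history_pc": 1},
}
dhq = {
    "P001": {"alcohol_drinks_per_week": 3.0},
}
sqx = {
    "P001": {"followup_smoker": 0, "activity_hours_per_week": 2.5},
    "P002": {"followup_smoker": 0, "activity_hours_per_week": 4.0},
}


def merge_questionnaires(base, *extras):
    """Left-join extra tables onto the baseline table by participant ID.

    Participants missing from an extra table get None for its columns,
    so missing answers stay visible instead of being silently dropped.
    """
    merged = {}
    for pid, row in base.items():
        combined = dict(row)
        for extra in extras:
            columns = sorted({col for r in extra.values() for col in r})
            found = extra.get(pid, {})
            for col in columns:
                combined[col] = found.get(col)  # None marks a missing answer
        merged[pid] = combined
    return merged


features = merge_questionnaires(bq, dhq, sqx)
for pid, row in features.items():
    print(pid, row)
```

Keeping the baseline table as the left side of the join means every participant with baseline data keeps a row, and any gaps in the later questionnaires can be handled explicitly before model training.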

Collaborators

Carol Hersh - Great Neck South High School