Principal Investigator
Name
Bence Lukacsy
Degrees
High School Diploma
Institution
Barrington High School
Position Title
Student
About this CDAS Project
Study
PLCO
Project ID
PLCO-1112
Initial CDAS Request Approval
Dec 5, 2022
Title
Machine Learning Model to Predict Various Types of Cancer
Summary
Using the Python machine learning library scikit-learn, the gradient boosting framework XGBoost, and TensorFlow neural networks, I am creating several prediction models with algorithms such as decision trees (unboosted and boosted), random forests, support vector machines, naive Bayes, regression, and more. This project is intended for the International Science and Engineering Fair 2023 and will, in theory, predict with high accuracy either yes (diagnosed with a certain cancer) or no (not diagnosed with a certain cancer). The models will be flexible, so they are not limited to one type of cancer; lung, pancreatic, ovarian, and other cancers will be included if data for them is obtained. The models will take basic health information as input, such as height, BMI, diet, and smoking history, and output either yes or no as described above, or the estimated likelihood, as a percentage, of developing the target cancer. The end goal is a publicly available website where users can enter basic information and get the prompt they need to seek official testing or screening and, if diagnosed, treatment for the cancer in question.
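
As a rough illustration of the input/output shape described above, the following minimal sketch trains a boosted-tree classifier and returns both a yes/no label and a probability. It is not the project's actual code: the data is synthetic (not PLCO data), the three feature columns and the label rule are invented for the example, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; these columns are hypothetical, not PLCO variables.
rng = np.random.default_rng(0)
n = 500
height_cm = rng.normal(170, 10, n)
bmi = rng.normal(27, 4, n)
pack_years = rng.exponential(10, n)
X = np.column_stack([height_cm, bmi, pack_years])

# Invented label rule so the example has some signal to learn (assumption).
logit = 0.08 * (pack_years - 10) + 0.05 * (bmi - 27)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Hold out a quarter of the rows to estimate accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

labels = model.predict(X_test)            # 0 = no, 1 = yes
risk = model.predict_proba(X_test)[:, 1]  # estimated probability of "yes"
accuracy = model.score(X_test, y_test)    # fraction of correct yes/no labels

print(f"held-out accuracy: {accuracy:.3f}")
print(f"first person: label={labels[0]}, risk={100 * risk[0]:.1f}%")
```

Multiplying the `predict_proba` output by 100 gives the percentage form mentioned in the summary, while `predict` gives the plain yes/no answer.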
Aims

- Acquire data
- Build models
- Test accuracy of models
- Arrive at either a final model for a single cancer or a multi-model, multi-cancer product (limited by the available data)
- Build website allowing users to enter basic health data (non-invasive) and see what the model predicts given those inputs
- Write a research paper, then publish and present it
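
The "build models", "test accuracy", and "arrive at a final model" aims could be approached along the following lines. This is only a sketch under stated assumptions: the data comes from scikit-learn's `make_classification` generator as a placeholder (not PLCO data), the four candidate models are a subset of those named in the summary, and 5-fold cross-validated accuracy is one possible selection criterion among many.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Placeholder binary-classification data; not PLCO data.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, random_state=0
)

# Candidate models to compare (an illustrative subset).
candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive Bayes": GaussianNB(),
    "logistic regression": LogisticRegression(max_iter=1000),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    for name, clf in candidates.items()
}
best_name = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
print(f"selected: {best_name}")
```

For rare outcomes such as a specific cancer diagnosis, plain accuracy can look high even for a model that always answers "no", so a metric such as `scoring="roc_auc"` may be a more informative choice in practice.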

Collaborators

Independent