PLCO-336: A Comparison of Binary Classifiers Performance on Predicting Breast Cancer … - Approved Projects

Principal Investigator

Name

Vanessa Gutierrez

Degrees

B.S.

Institution

Fordham University

Position Title

Student

vgutierrez5@fordham.edu

About this CDAS Project

Study

PLCO

Project ID

PLCO-336

Initial CDAS Request Approval

Jan 4, 2018

Title

A Comparison of Binary Classifiers Performance on Predicting Breast Cancer Diagnoses

Summary

This project, my undergraduate honors thesis in computer science, aims to train four binary classifiers on a subset of the Breast Cancer Dataset and evaluate their performance. Each classifier will predict breast cancer diagnoses from a subset of patient attributes. After the initial classifier evaluation, specific subgroups of attributes will be investigated for their strength as training attributes.

Aims

The project aims to train and evaluate the following binary classifiers: Naive Bayes, Decision Trees, K-Nearest Neighbors, and Support Vector Machines.
The subset of data I aim to analyze consists of the cancer diagnoses and the attributes in the following sections of the Breast Cancer Dataset:
BQ Demographics
BQ Smoking
BQ Family History
BQ Body Type
BQ NSAIDS
BQ Diseases
BQ Female Specific

I aim to evaluate how well each classifier predicts the correct cancer diagnosis given these attributes. To compare classifiers, I will evaluate several hyperparameter settings for each, such as multiple k-neighbor values for kNN and different kernel types for SVM, and I will use k-fold cross-validation. Following the initial performance evaluation, I will build classifiers on smaller subsets of attributes to find attributes or attribute groups that are stronger or weaker indicators for predicting the diagnoses.
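
As a rough illustration of the evaluation loop described above, the following minimal sketch runs k-fold cross-validation for a kNN classifier at several k values and on different attribute subsets. It is not the project's implementation: it covers only kNN rather than all four classifiers, the data is synthetic, and the group names and helper functions (k_fold_indices, knn_predict, cv_accuracy) are hypothetical placeholders that do not correspond to PLCO dataset fields.

```python
import random
from collections import Counter

def k_fold_indices(n, folds, rng):
    """Shuffle indices 0..n-1 and split them into `folds` near-equal parts."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::folds] for i in range(folds)]

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x (squared Euclidean)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cv_accuracy(X, y, k, cols, folds=5, seed=0):
    """Mean k-fold accuracy of kNN restricted to the attribute columns in `cols`."""
    Xs = [[row[c] for c in cols] for row in X]
    parts = k_fold_indices(len(Xs), folds, random.Random(seed))
    scores = []
    for held_out in parts:
        held = set(held_out)
        train = [i for i in range(len(Xs)) if i not in held]
        tX = [Xs[i] for i in train]
        ty = [y[i] for i in train]
        hits = sum(knn_predict(tX, ty, Xs[i], k) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)

# Synthetic stand-in data: columns 0-1 track the label, columns 2-3 are noise.
rng = random.Random(42)
y = [i % 2 for i in range(120)]
X = [[label + rng.gauss(0, 0.4), label + rng.gauss(0, 0.4),
      rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

# Hypothetical attribute groups; the names are placeholders, not PLCO sections.
groups = {"all": [0, 1, 2, 3], "informative": [0, 1], "noise": [2, 3]}

for k in (1, 3, 5, 7):
    for name, cols in groups.items():
        print(f"k={k} group={name}: {cv_accuracy(X, y, k, cols):.3f}")
```

Comparing the per-group accuracies at a fixed k gives a simple signal of which attribute groups carry predictive information. The same loop could wrap any of the other classifiers in place of knn_predict.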

This work is intended to fulfill the undergraduate thesis requirement of the Fordham University College at Rose Hill Honors Program for my degree, a B.S. in Computer Science.

Collaborators

Dr. Gary Weiss, Fordham University, Thesis Mentor