A Comparison of Binary Classifier Performance in Predicting Breast Cancer Diagnoses
The project aims to train and evaluate four binary classifiers: Naive Bayes, Decision Trees, K-Nearest Neighbors (kNN), and Support Vector Machines (SVM).
The subset of data I aim to analyze consists of the cancer diagnoses and the attributes in the following sections of the Breast Cancer Dataset:
BQ Demographics
BQ Smoking
BQ Family History
BQ Body Type
BQ NSAIDS
BQ Diseases
BQ Female Specific
I aim to evaluate how accurately these classifiers predict cancer diagnoses from the attributes listed above. To compare the classifiers, I will evaluate several hyperparameter settings for each one, such as multiple values of k for kNN and different kernel types for SVM, and I will use k-fold cross-validation to estimate performance. Following this initial evaluation, I will train classifiers on smaller subsets of attributes to identify which attributes or attribute groups are stronger or weaker indicators of the diagnosis.
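As a rough illustration of the evaluation loop described above, the following is a minimal sketch in Python using only the standard library. It is not the project's actual implementation: the data are synthetic stand-ins rather than the Breast Cancer Dataset, only kNN is shown, and the helper names (k_fold_indices, knn_predict, cross_val_accuracy) are hypothetical. It shows k-fold cross-validation over several values of k, plus a rerun restricted to a subset of attribute columns.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle row indices and split them into n_folds near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_X, train_y, row, k):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, row)), label)
        for x, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=5, columns=None):
    """Mean kNN accuracy over n_folds folds, optionally using only some columns."""
    if columns is not None:
        X = [[row[c] for c in columns] for row in X]
    scores = []
    for fold in k_fold_indices(len(X), n_folds):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


# Synthetic stand-in data (an assumption, not the real dataset):
# column 0 is informative about the label, column 1 is pure noise.
rng = random.Random(42)
y = [i % 2 for i in range(100)]
X = [[label * 6.0 + rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]

# Compare several odd k values (odd k avoids tied votes with two classes).
for k in (1, 3, 5, 7):
    print("k =", k, "accuracy =", round(cross_val_accuracy(X, y, k), 3))

# Rerun on single-attribute subsets to see which attribute carries the signal.
for cols in ([0], [1]):
    print("columns", cols, "accuracy =", round(cross_val_accuracy(X, y, 5, columns=cols), 3))
```

In practice, a library such as scikit-learn could replace these hand-written helpers and would cover the other three classifiers as well. The sketch is only meant to make the structure of the comparison concrete: the same folds are reused for every setting so that differences in accuracy reflect the setting rather than the split.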
This work aims to fulfill the undergraduate thesis requirement of the Fordham University College at Rose Hill Honors Program for my degree, a B.S. in Computer Science.
Dr. Gary Weiss, Fordham University, Thesis Mentor