Principal Investigator
Matloob Khushi
The University of Sydney
Position Title
Programme Director of Master of Data Science
About this CDAS Project
NLST
Project ID
Initial CDAS Request Approval
Feb 24, 2020
Development and Investigation of Class Imbalanced Data Methods for Lung Cancer Incidence Prediction Using Clinical and Radiomics Data
Early diagnosis of lung cancer provides more treatment options and reduces mortality. However, the unequal distribution of data between the majority (negative) and minority (positive) classes biases general machine learning classifiers in favour of the majority class, so minority class examples are often misclassified. Misclassifying a minority sample (a false negative) comes at a higher cost than a false positive because it prevents correct diagnosis and timely treatment of the patient. In this study, we will investigate various imbalanced data techniques on lung cancer clinical and imaging data from the National Lung Screening Trial (NLST) for lung cancer incidence prediction. In particular, we will explore: (1) data-level methods that balance the class distribution, (2) algorithm-level methods that modify the learning or decision-making process so that more importance is placed on the minority class, and (3) hybrid methods that combine data-level and algorithm-level techniques to address imbalanced data.
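The three families of methods named above can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the toy labels (95 negatives, 5 positives) and the choice of random oversampling and inverse-frequency class weights are all assumptions made for illustration, and no NLST data is used. The weighting formula follows the common "balanced" heuristic, n_samples / (n_classes * class_count).

```python
import random
from collections import Counter


def random_oversample(samples, labels, minority=1, ratio=1.0, seed=0):
    """Data-level method (illustrative): duplicate randomly chosen minority
    samples until the minority count reaches ratio * majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_n = max(n for c, n in counts.items() if c != minority)
    target = int(ratio * majority_n)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    n_extra = max(0, target - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    idx = list(range(len(labels))) + extra
    return [samples[i] for i in idx], [labels[i] for i in idx]


def balanced_class_weights(labels):
    """Algorithm-level method (illustrative): weight each class by
    n_samples / (n_classes * class_count), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Hypothetical toy data: one feature per sample, 95 negatives and 5 positives.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

# (1) Data-level: oversample the minority class to full balance.
X_bal, y_bal = random_oversample(X, y, minority=1, ratio=1.0)

# (2) Algorithm-level: keep the data, penalise minority errors more heavily.
weights = balanced_class_weights(y)

# (3) Hybrid: oversample only part of the way, then reweight what remains.
X_hyb, y_hyb = random_oversample(X, y, minority=1, ratio=0.5)
hyb_weights = balanced_class_weights(y_hyb)

print(Counter(y_bal), weights, Counter(y_hyb), hyb_weights)
```

The weights returned in (2) and (3) would typically be passed to a classifier that accepts per-class weights in its loss; which classifier and which resampling scheme suit the NLST data is exactly what the project sets out to investigate.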

- Develop and analyse lung cancer incidence prediction models by applying various class imbalanced data methods and machine learning techniques to lung cancer data


Maranatha Consuelo Reyes, The University of Sydney