Development and Investigation of Class Imbalanced Data Methods for Lung Cancer Incidence Prediction Using Clinical and Radiomics Data

Principal Investigator

Name
Matloob Khushi

Degrees
Ph.D.

Institution
The University of Sydney

Position Title
Programme Director of Master of Data Science

Email
matloob.khushi@sydney.edu.au

About this CDAS Project

Study
NLST

Project ID
NLST-641

Initial CDAS Request Approval
Feb 24, 2020

Title
Development and Investigation of Class Imbalanced Data Methods for Lung Cancer Incidence Prediction Using Clinical and Radiomics Data

Summary
Early diagnosis of lung cancer widens the available treatment options and reduces mortality. However, the unequal distribution of data between the majority (negative) and minority (positive) classes biases general machine learning classifiers in favour of the majority class, so minority class examples are often misclassified. Misclassifying a minority sample (a false negative) comes at a higher cost than a false positive because it prevents correct diagnosis and timely treatment of the patient. In this study, we will investigate various imbalanced data techniques for lung cancer incidence prediction, using clinical and imaging data from the National Lung Screening Trial (NLST). In particular, we will explore: (1) data-level methods that balance the class distribution, (2) algorithm-level methods that modify the learning or decision-making process so that more importance is placed on the minority class, and (3) hybrid methods that combine data-level and algorithm-level techniques to address imbalanced data.
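The first two families of methods named in the summary can be illustrated with a minimal sketch. It is not the project's implementation: the toy data, the function names `random_oversample` and `balanced_class_weights`, and the choice of random oversampling and inverse-frequency weighting are all assumptions made here for illustration. The weighting formula, n_samples / (n_classes * class_count), is a common "balanced" heuristic.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Data-level method (illustrative): duplicate randomly chosen rows of
    each smaller class until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Algorithm-level method (illustrative): per-class weights inversely
    proportional to class frequency, for use in a weighted loss."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Hypothetical toy data: 95 negative and 5 positive cases, one feature each.
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

bal_rows, bal_labels = random_oversample(rows, labels)
weights = balanced_class_weights(labels)

print(Counter(bal_labels))  # both classes now have 95 examples
print(weights)              # the minority class receives the larger weight
```

A hybrid method, in this framing, would combine the two, for example by oversampling the minority class only partially and covering the remaining imbalance with class weights. Oversampling should be applied to the training split only, so that duplicated rows do not leak into evaluation data.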

Aims

- Develop and analyse lung cancer incidence prediction models by applying various class imbalanced data methods and machine learning techniques to lung cancer data

Collaborators

Maranatha Consuelo Reyes, The University of Sydney