Principal Investigator
Name
Matloob Khushi
Degrees
Ph.D.
Institution
The University of Sydney
Position Title
Programme Director of Master of Data Science
About this CDAS Project
Study
NLST
Project ID
NLST-641
Initial CDAS Request Approval
Feb 24, 2020
Title
Development and Investigation of Class Imbalanced Data Methods for Lung Cancer Incidence Prediction Using Clinical and Radiomics Data
Summary
Early diagnosis of lung cancer widens the range of treatment options and reduces mortality. However, the unequal distribution of data between the majority (negative) and minority (positive) classes biases general machine learning classifiers in favour of the majority class, so minority class examples are often misclassified. Misclassifying a minority (positive) sample, i.e. a false negative, comes at a higher cost than a false positive because it prevents correct diagnosis and timely treatment of the patient. In this study, we will investigate various imbalanced data techniques on lung cancer clinical and imaging data from the National Lung Screening Trial (NLST) for lung cancer incidence prediction. In particular, we will explore: (1) data-level methods that rebalance the class distribution, (2) algorithm-level methods that modify the learning or decision-making process so that more importance is placed on the minority class, and (3) hybrid methods that combine data-level and algorithm-level techniques.
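To make the first two families of methods concrete, here is a minimal sketch in plain Python. It is an illustration only, not the project's actual pipeline: the function names `random_oversample` and `balanced_class_weights` are hypothetical, the data are a toy example rather than NLST records, and random oversampling and inverse-frequency weighting are just two common choices among many data-level and algorithm-level techniques.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Data-level sketch: duplicate minority rows at random until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Algorithm-level sketch: weight each class inversely to its
    frequency, so errors on the rare class cost more during training."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# Toy data (an assumption for illustration): 8 negatives, 2 positives.
labels = [0] * 8 + [1] * 2
rows = [[float(i)] for i in range(10)]

new_rows, new_labels = random_oversample(rows, labels)
weights = balanced_class_weights(labels)

print(Counter(new_labels))  # both classes now have 8 samples
print(weights)              # the positive class gets the larger weight
```

A hybrid method in the sense of (3) would combine the two, for example by partially oversampling the minority class and then training a classifier with per-class weights on the resampled data.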
Aims

- Develop and analyse lung cancer incidence prediction models by applying various class imbalance methods and machine learning techniques to lung cancer data

Collaborators

Maranatha Consuelo Reyes, The University of Sydney