Principal Investigator
Name
Matloob Khushi
Degrees
Ph.D.
Institution
The University of Sydney
Position Title
Programme Director of Master of Data Science
Email
About this CDAS Project
Study
NLST
Project ID
NLST-641
Initial CDAS Request Approval
Feb 24, 2020
Title
Development and Investigation of Class Imbalanced Data Methods for Lung Cancer Incidence Prediction Using Clinical and Radiomics Data
Summary
Early diagnosis of lung cancer provides more treatment options and reduces mortality. However, the unequal distribution of data between the majority (negative) and minority (positive) classes biases general machine learning classifiers in favour of the majority class, so minority class examples are often misclassified. Misclassifying a minority sample (a false negative) comes at a higher cost than a false positive because it prevents correct diagnosis and timely treatment of the patient. In this study, we will investigate various imbalanced data techniques on lung cancer clinical and imaging data from the National Lung Screening Trial (NLST) for lung cancer incidence prediction. In particular, we will explore: (1) data-level methods that balance the class distribution, (2) algorithm-level methods that modify the learning or decision-making process so that more importance is placed on the minority class, and (3) hybrid methods that combine data-level and algorithm-level techniques to address imbalanced data.
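
The first two families of methods named above can be illustrated with a minimal sketch. This is not the project's implementation: the function names `random_oversample` and `balanced_class_weights`, the toy data, and the choice of random oversampling and inverse-frequency weighting are all assumptions made here for illustration. The summary does not say which specific techniques will be used.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Data-level sketch (hypothetical helper): duplicate randomly chosen
    samples of each smaller class until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Pool of original samples belonging to this class.
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Algorithm-level sketch (hypothetical helper): weight each class
    inversely to its frequency, n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Toy data (invented): 8 negative cases, 2 positive cases.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
weights = balanced_class_weights(y)

print(Counter(y_bal))  # both classes now have 8 samples
print(weights)         # the minority class receives the larger weight
```

A hybrid method in the sense of item (3) might, for example, oversample the minority class only partially and pass class weights for the remaining imbalance to a cost-sensitive learner. That combination is likewise an assumption, not something the summary specifies.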
Aims

- Develop and analyse lung cancer incidence prediction models by applying various class-imbalanced data methods and machine learning techniques to lung cancer data

Collaborators

Maranatha Consuelo Reyes, The University of Sydney