Principal Investigator
Name
Alex Bui
Degrees
PhD
Institution
UCLA
Position Title
Professor
Email
About this CDAS Project
Study
PLCO
Project ID
PLCO-235
Initial CDAS Request Approval
Oct 11, 2016
Title
Machine learning-based approaches for risk calculation
Summary
The objective of this project is to explore the utility of different machine learning and statistical approaches on cancer cohorts in order to provide better classification/stratification of high-risk individuals. Specific techniques will be compared and contrasted, including: regression (linear, logistic); Bayesian models (naïve Bayes classifiers, belief networks); support vector machines (SVMs); and deep learning methods (convolutional neural networks, recursive neural networks, etc.). As part of this process, feature selection algorithms (e.g., based on information gain) and standard dimensionality reduction methods (e.g., principal component analysis, PCA) will be explored to try to optimize classifier performance. Individual classifiers will be developed and assessed; ensemble methods will then be developed and evaluated for comparison. Results of the classifiers built using PLCO data will be contrasted against other publicly available datasets (e.g., NLST for the lung cancer cohort) as well as local datasets at UCLA, in order to assess external validity and understand issues around model transportability. By the end of this effort, the project will have formally evaluated the performance of different learned classifiers to calculate risk on the PLCO cohort, providing a means to understand what (derived) features are useful across different types of classifiers, and the degree to which such features can be reused in different populations.
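As an illustration of the information-gain-based feature selection mentioned in the summary, the following is a minimal sketch in plain Python. It is not the project's implementation: the function names (`entropy`, `information_gain`, `rank_features`), the feature names, and the toy records are all hypothetical, and the features are assumed to be categorical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels):
    """Return (feature name, gain) pairs sorted by gain, highest first."""
    names = list(rows[0].keys())
    gains = {
        name: information_gain([row[name] for row in rows], labels)
        for name in names
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up records for illustration only (not PLCO data).
rows = [
    {"smoker": "yes", "sex": "F"},
    {"smoker": "yes", "sex": "M"},
    {"smoker": "no", "sex": "F"},
    {"smoker": "no", "sex": "M"},
]
labels = [1, 1, 0, 0]  # hypothetical high-risk indicator

ranking = rank_features(rows, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f} bits")
```

In this toy example the hypothetical `smoker` feature separates the two classes perfectly while `sex` carries no information about the label, so a selection step that keeps the top-ranked features would retain `smoker` and drop `sex`.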
Aims

Aim 1: To compare and contrast different statistical and machine learning approaches to build risk-based classifiers, optimizing individual and ensemble classifier performance. Temporal models will be considered. Efforts under Aim 1 will be performed on the PLCO dataset.

Aim 2: To compare the classifiers' performance, as developed in Aim 1, on external datasets, including NLST and local institutional data. The results of Aim 2 will inform development of model evaluation and transportability techniques.
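One way to picture the comparison in Aim 2 is to score the same classifier on an internal and an external cohort and look at the drop in discrimination. The sketch below assumes the area under the ROC curve (AUC) as the metric, which the project description does not specify. The function names `roc_auc` and `transport_gap` and the scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def transport_gap(internal, external):
    """Return (internal AUC, external AUC, difference) for one classifier.

    Each argument is a (scores, labels) pair from one cohort.
    """
    auc_internal = roc_auc(*internal)
    auc_external = roc_auc(*external)
    return auc_internal, auc_external, auc_internal - auc_external


# Made-up risk scores and outcomes for illustration only.
internal = ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])  # development cohort
external = ([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])  # external cohort

auc_int, auc_ext, gap = transport_gap(internal, external)
print(f"internal AUC={auc_int:.2f}, external AUC={auc_ext:.2f}, gap={gap:.2f}")
```

A large positive gap would suggest the classifier does not transport well to the external population. A fuller evaluation would also consider calibration and confidence intervals, which this sketch omits.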

Collaborators

William Hsu, PhD
Denise Aberle, MD