Subgroup Identification based on Lung Cancer Risk Prediction Model

Principal Investigator

Name
Jon Steingrimsson

Degrees
Ph.D.

Institution
Brown University

Position Title
Assistant Professor of Biostatistics

Email
jon_steingrimsson@brown.edu

About this CDAS Project

Study
NLST

Project ID
NLST-701

Initial CDAS Request Approval
Aug 5, 2020

Title
Subgroup Identification based on Lung Cancer Risk Prediction Model

Summary
Machine learning models are often assumed to apply to the whole population they were trained on. However, it is natural to ask whether the overall prediction performance is driven by a subpopulation for which the model predicts extremely well. Equivalently, we wish to know whether there is a subgroup that the model represents poorly. For example, one might ask whether a model built primarily on data from young people is representative of the older population. The goal of this project is to discover and analyze such subgroups of the overall population for risk prediction models built using lung cancer screening data.

Aims

1. Develop a data-driven algorithm that identifies subgroups with differential prediction accuracy.
2. Apply the algorithm to discover potential subpopulations of the NLST data that are not well represented by risk prediction models built using previous lung cancer screening data. Perform a detailed analysis of any subgroups identified.
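
To make the idea of "subgroups with differential prediction accuracy" in Aim 1 concrete, here is a minimal Python sketch. It is not the project's algorithm, which is not described here. It assumes a binary outcome, uses the per-person squared error (Brier loss) as the accuracy measure, and searches a single covariate for the threshold that maximizes the gap in mean loss between the two resulting groups. The function names, the `min_size` parameter, and the toy data are all hypothetical.

```python
from statistics import mean


def brier_losses(y_true, risk_pred):
    """Per-person squared error between the 0/1 outcome and the predicted risk."""
    return [(y - p) ** 2 for y, p in zip(y_true, risk_pred)]


def best_single_split(covariate, losses, min_size=2):
    """Search thresholds on one covariate for the largest gap in mean loss.

    Returns (threshold, gap, mean_loss_low, mean_loss_high), or None if no
    threshold leaves at least `min_size` people on each side.
    """
    best = None
    # The largest value is skipped because nothing would fall above it.
    for t in sorted(set(covariate))[:-1]:
        low = [l for x, l in zip(covariate, losses) if x <= t]
        high = [l for x, l in zip(covariate, losses) if x > t]
        if len(low) < min_size or len(high) < min_size:
            continue
        gap = abs(mean(low) - mean(high))
        if best is None or gap > best[1]:
            best = (t, gap, mean(low), mean(high))
    return best


# Toy, made-up data (not NLST): the hypothetical model is sharp for the
# younger half and uninformative (risk = 0.5) for the older half.
age = [50, 52, 55, 58, 60, 70, 72, 75, 78, 80]
y = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
risk = [0.1, 0.1, 0.9, 0.1, 0.9, 0.5, 0.5, 0.5, 0.5, 0.5]

losses = brier_losses(y, risk)
threshold, gap, loss_younger, loss_older = best_single_split(age, losses)
print(threshold, round(gap, 3), round(loss_younger, 3), round(loss_older, 3))
```

Because this sketch chooses the split and measures the gap on the same data, the gap it reports is optimistically biased. A real analysis would need something like sample splitting or cross-validation, and a search over more than one covariate.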

Collaborators

Constantine Gatsonis, Brown University
Ruotao Zhang, Brown University