Principal Investigator
Jon Steingrimsson
Brown University
Position Title
Assistant Professor of Biostatistics
About this CDAS Project
NLST
Project ID
Initial CDAS Request Approval
Aug 5, 2020
Subgroup Identification based on Lung Cancer Risk Prediction Model
Machine learning models are usually assumed to apply to the population they were trained on. However, overall prediction performance can mask heterogeneity: the model may predict extremely well for some sub-populations, which drives up the overall performance, while representing other subgroups poorly. For example, one might ask whether a model built primarily on data from young people is representative of the older population. The goal of this project is to discover and analyze such subgroups of the overall population for risk prediction models built using lung cancer screening data.

1. Develop a data-driven algorithm that identifies subgroups with differential prediction accuracy.
2. Apply the algorithm to discover potential sub-populations of the NLST data that are not well represented by risk prediction models built using previous lung cancer screening data, and perform a detailed analysis of any subgroups identified.
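
The project summary does not specify the algorithm, so the following is only an illustrative sketch of what "differential prediction accuracy" could mean in practice. It assumes accuracy is measured by the per-person squared error between predicted risk and observed outcome (the Brier score), and it searches a single covariate for the threshold that separates the population into two groups with the largest gap in mean error. The function names, the choice of loss, and the toy data are all hypothetical and are not taken from NLST or from the investigators' method.

```python
from statistics import mean


def brier_losses(y_true, p_pred):
    """Per-person squared error between predicted risk and observed outcome."""
    return [(p - y) ** 2 for y, p in zip(y_true, p_pred)]


def best_split(x, losses, min_size=2):
    """Find the threshold on one covariate with the largest gap in mean loss.

    Returns (threshold, mean loss where x <= threshold, mean loss where
    x > threshold), or None if no split leaves min_size people on each side.
    """
    best = None
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(x))[:-1]:
        left = [l for xi, l in zip(x, losses) if xi <= t]
        right = [l for xi, l in zip(x, losses) if xi > t]
        if len(left) < min_size or len(right) < min_size:
            continue
        gap = abs(mean(left) - mean(right))
        if best is None or gap > best[0]:
            best = (gap, t, mean(left), mean(right))
    return None if best is None else best[1:]


# Made-up toy data: the hypothetical model is sharp for younger people
# and uninformative (always predicts 0.5) for older people.
age = [55, 57, 60, 62, 66, 70, 72, 74]
outcome = [0, 0, 1, 0, 1, 0, 1, 1]
risk = [0.1, 0.1, 0.9, 0.1, 0.5, 0.5, 0.5, 0.5]

losses = brier_losses(outcome, risk)
split = best_split(age, losses)
print(split)
```

On this toy data the search separates people aged 62 and under, where the predictions are sharp, from those older than 62, where they are not. Because the threshold is chosen to maximize the gap, the estimated difference is optimistic; any subgroup found this way would need to be confirmed on data not used in the search.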


Constantine Gatsonis, Brown University
Ruotao Zhang, Brown University