Principal Investigator
Liangyuan Hu
Icahn School of Medicine
Position Title
Assistant Professor
About this CDAS Project
NLST
Project ID
Initial CDAS Request Approval
Jan 27, 2020
Targeting low-dose computed tomography screening to improve lung-cancer survival: a machine learning-based post-hoc analysis of heterogeneous treatment effects in the National Lung Screening Trial
Lung cancer is the leading cause of cancer-related mortality in the United States. Because the disease spreads rapidly, early detection is important for improving patient survival. The National Lung Screening Trial (NLST) demonstrated that, on average, screening with low-dose CT reduces lung-cancer mortality relative to chest radiography. Follow-up research showed significant variation in the lung-cancer survival benefit by gender and lung cancer risk. The cost-effectiveness of low-dose CT screening was also found to depend strongly on how patients are targeted, with greater cost-effectiveness for women and higher-risk patients, but also for older patients and current smokers.

Given the heterogeneous nature of lung cancer, identifying patient subgroups who respond better to screening is essential to providing personalized care. However, conventional subgroup analyses may systematically fail to identify the subpopulations best targeted by an intervention, because they are limited to pre-specified, one-covariate-at-a-time testing and are prone to multiple-testing problems and estimation bias. Recent advances in machine learning offer strategies to predict heterogeneous treatment effects (HTEs) over a rich set of variables and functional forms, generally by building ensembles of non-parametric models from pre-specified covariates and then isolating the combinations of covariates that best explain the variation in individual outcomes across study arms.

In this project, we investigate heterogeneity in lung cancer mortality in the NLST using a Bayesian machine learning method adapted for survival analysis, the accelerated failure time model with Bayesian additive regression trees (AFT-BART). We apply the method and develop data-driven machine learning techniques to identify patient subgroups that may have HTEs. Our methods will be based on agnostic, data-driven techniques and will circumvent the issue of multiple testing.
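
For orientation, one common way to write an accelerated failure time model with a flexible regression function, and the individual treatment effect it implies, is sketched below. The notation ($T_i$, $A_i$, $X_i$, $f$, $\tau$) is an assumption introduced here for illustration and is not taken from the project description; the exact specification used in the project may differ.

```latex
% Assumed notation: T_i = survival time, A_i = screening arm
% (1 = low-dose CT, 0 = chest radiography), X_i = baseline covariates.
\[
  \log T_i = f(A_i, X_i) + \varepsilon_i ,
\]
% f is modeled nonparametrically (in AFT-BART, as a sum of regression trees).
% The individual treatment effect at covariate value x is the difference in
% expected log survival time between the two arms:
\[
  \tau(x) = f(1, x) - f(0, x) ,
\]
% so that, under this model, \exp\{\tau(x)\} acts as a ratio of survival times.
```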

Aim 1: We will leverage a Bayesian machine learning method for survival outcomes, AFT-BART, to estimate the individual 'treatment effect' of low-dose CT compared with chest radiography.

Aim 2: Based on the estimated individual treatment effects, we will develop a machine learning technique to estimate heterogeneous treatment effects and identify subgroups that may benefit more from, or be harmed by, a screening method.

Aim 3: We will develop computational tools and make them publicly available to encourage fellow researchers to build off our work.
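
The subgroup-identification idea in Aim 2 can be illustrated with a deliberately small sketch: given one estimated treatment effect per participant, search for the single covariate split that best separates participants with larger effects from those with smaller effects. This is not the project's method; it is a one-split regression "stump" swapped in for illustration. The function `best_split`, the two covariates, and all numbers are hypothetical, and the data are simulated rather than drawn from the NLST.

```python
import random
from statistics import fmean


def _sse(values):
    """Sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, effects):
    """Find the single covariate split that best separates the effects.

    rows    : list of covariate tuples, one per participant
    effects : list of estimated individual treatment effects
    Returns (feature_index, threshold, mean_right - mean_left).
    """
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between adjacent observed values,
        # so both sides of every candidate split are non-empty.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [e for r, e in zip(rows, effects) if r[j] <= thr]
            right = [e for r, e in zip(rows, effects) if r[j] > thr]
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, fmean(right) - fmean(left))
    return best[1:]


# Simulated stand-in data (NOT NLST data): covariate 0 is a continuous
# age-like value, covariate 1 is a binary indicator. The simulated effect
# depends only on the binary indicator, plus a little noise.
rng = random.Random(0)
rows = [(rng.uniform(55, 74), rng.randint(0, 1)) for _ in range(200)]
effects = [0.1 + 0.3 * flag + rng.gauss(0, 0.05) for _, flag in rows]

feature, threshold, gap = best_split(rows, effects)
print(f"split on covariate {feature} at {threshold}, effect gap {gap:.2f}")
```

In practice a deeper tree or another interpretable model would usually replace the single split, and the uncertainty in the estimated effects would need to be carried through to any subgroup claims.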


Bian Liu, Icahn School of Medicine
Jiayi Ji, Icahn School of Medicine
Madhu Mazumdar, Icahn School of Medicine