Principal Investigator
Ziding Feng
Dept. of Biostatistics, The University of Texas MD Anderson Cancer Center
Position Title
About this CDAS Project
PLCO
Project ID
Initial CDAS Request Approval
Jun 30, 2017
Inference in ROC curves for biomarkers measured in two-phase nested case-control studies
The two-phase case-control sampling design is common in biomarker evaluation studies. In the first phase, a large sample that is representative of the target population is available and mainly provides the participants' clinical characteristics. In the second phase, because of limited resources, biomarkers are measured on only a subsample of the phase-one participants. Because this subsample is usually selected according to matching criteria, it is a biased sample of the target population. In a lung cancer study, for example, matching cases and controls on smoking status may be crucial. This biased sampling must be taken into account whenever the clinical questions of interest concern the target population. Although many researchers use the area under the ROC curve (AUC) to assess a biomarker's discriminatory ability, this summary measure lacks an appealing clinical interpretation. Clinicians are usually more interested in a biomarker's performance at high sensitivity, to avoid under-diagnosis, or at high specificity, to avoid over-diagnosis. Which of the two matters more depends on the seriousness of a false negative or on the invasiveness of the work-up required to identify a false positive. We are currently developing statistical methodology to estimate ROC(t), the sensitivity at a given false-positive rate t (that is, at specificity 1 - t), together with its confidence intervals, while accounting for the biased sampling scheme described above.
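The project description does not state which estimator is being developed. As a minimal sketch of why the sampling bias matters, the code below uses one standard device, inverse sampling-fraction (Horvitz-Thompson style) weighting: a phase-two subject in stratum s receives weight N_s / n_s, the phase-one count divided by the phase-two count for that stratum, computed separately for cases and controls. The threshold for ROC(t) is the smallest observed control value whose weighted false-positive rate is at most t. The function names (`design_weights`, `weighted_fpr`, `roc_at`), the smoking-status stratum labels, and all numbers are hypothetical toy values, not PLCO data or the authors' method.

```python
from collections import Counter


def design_weights(strata, phase1_counts):
    """Hypothetical inverse sampling-fraction weights: a phase-two subject in
    stratum s gets N_s / n_s (phase-one count / phase-two count)."""
    n2 = Counter(strata)
    return [phase1_counts[s] / n2[s] for s in strata]


def weighted_fpr(c, controls, w_controls):
    """Weighted proportion of controls whose marker exceeds threshold c."""
    above = sum(w for y, w in zip(controls, w_controls) if y > c)
    return above / sum(w_controls)


def roc_at(t, cases, w_cases, controls, w_controls):
    """Weighted ROC(t): sensitivity (marker > c counts as positive) at the
    smallest observed control value c whose weighted FPR does not exceed t.
    For 0 <= t, the largest control value always qualifies (FPR = 0)."""
    for c in sorted(set(controls)):
        if weighted_fpr(c, controls, w_controls) <= t:
            break
    above = sum(w for y, w in zip(cases, w_cases) if y > c)
    return above / sum(w_cases)


# Toy example (made-up numbers): controls and cases matched on smoking status.
controls = [1.0, 2.0, 3.0, 4.0]
ctrl_strata = ["never", "never", "ever", "ever"]
cases = [2.5, 3.5, 1.5, 5.0]
case_strata = ["ever", "ever", "never", "never"]

w_ctrl = design_weights(ctrl_strata, {"never": 80, "ever": 20})
w_case = design_weights(case_strata, {"ever": 30, "never": 10})

weighted = roc_at(0.2, cases, w_case, controls, w_ctrl)
naive = roc_at(0.2, cases, [1.0] * 4, controls, [1.0] * 4)
```

In this toy data the weighted estimate of ROC(0.2) is 0.875, while ignoring the design (all weights equal to 1) gives 0.25, because matching over-represents ever-smokers among the phase-two controls relative to the assumed phase-one counts.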

Our aim is to develop statistical methodology to estimate, and construct confidence intervals for, the sensitivity at a given specificity (and vice versa) when biomarker measurements are taken within a biased sample (commonly due to matching) of a larger cohort, i.e., a two-phase nested case-control study. We require the data for the 11 inflammation biomarkers reported as statistically significant for lung cancer in the paper "Circulating Inflammation Markers and Prospective Risk for Lung Cancer" by Meredith S. Shiels et al. (2013, JNCI, Vol. 105, Issue 24, pages 1871-1879). We want to illustrate our statistical approaches with this data set because the authors used matching in their design. Using these data and the clinical characteristics of the patients used by Shiels et al., we can obtain an ROC curve estimator that refers to the performance of these markers in the general population. This requires the clinical information for the full PLCO cohort, which is already available to us.
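For the "vice versa" direction and for the confidence intervals, the sketch below assumes the same hypothetical N_s / n_s weights and computes the weighted specificity at a target sensitivity v. It then forms an approximate percentile bootstrap interval by resampling phase-two subjects with replacement within each stratum (separately for cases and controls) while holding the phase-one counts fixed. This is a simple stand-in, not the project's method: it ignores the sampling variability of the phase-one counts, so the interval may be too narrow. All function names, stratum labels, and numbers are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def design_weights(sample, phase1_counts):
    """sample: list of (marker, stratum). Returns (marker, weight) pairs with
    hypothetical weights N_s / n_s; phase-one counts are treated as fixed."""
    n2 = Counter(s for _, s in sample)
    return [(y, phase1_counts[s] / n2[s]) for y, s in sample]


def spec_at_sens(v, cases, controls):
    """Weighted specificity at the largest case value c whose weighted
    sensitivity P(Y >= c | case) is at least v (positive if Y >= c)."""
    w_case = sum(w for _, w in cases)
    for c in sorted({y for y, _ in cases}, reverse=True):
        if sum(w for y, w in cases if y >= c) / w_case >= v:
            break  # the smallest case value always gives sensitivity 1
    w_ctrl = sum(w for _, w in controls)
    return sum(w for y, w in controls if y < c) / w_ctrl


def resample_within_strata(sample, rng):
    """Draw n_s subjects with replacement inside each stratum, so n_s and
    therefore the design weights are unchanged in every replicate."""
    by_stratum = defaultdict(list)
    for y, s in sample:
        by_stratum[s].append((y, s))
    out = []
    for s in sorted(by_stratum):  # fixed order for reproducibility
        members = by_stratum[s]
        out.extend(rng.choice(members) for _ in members)
    return out


def bootstrap_ci(v, cases, controls, p1_cases, p1_controls,
                 B=500, alpha=0.05, seed=1):
    """Point estimate and approximate percentile bootstrap interval."""
    rng = random.Random(seed)
    est = spec_at_sens(v, design_weights(cases, p1_cases),
                       design_weights(controls, p1_controls))
    reps = []
    for _ in range(B):
        bc = resample_within_strata(cases, rng)
        bk = resample_within_strata(controls, rng)
        reps.append(spec_at_sens(v, design_weights(bc, p1_cases),
                                 design_weights(bk, p1_controls)))
    reps.sort()
    lo = reps[int(alpha / 2 * B)]
    hi = reps[int((1 - alpha / 2) * B) - 1]
    return est, lo, hi


# Toy example (made-up numbers), matched on smoking status.
cases = [(2.5, "ever"), (3.5, "ever"), (1.5, "never"), (5.0, "never")]
controls = [(1.0, "never"), (2.0, "never"), (3.0, "ever"), (4.0, "ever")]
est, lo, hi = bootstrap_ci(0.8, cases, controls,
                           {"ever": 30, "never": 10},
                           {"never": 80, "ever": 20})
```

Resampling within strata keeps each phase-two stratum size n_s fixed, which mirrors a design where the number sampled per matching stratum is set in advance. A design-based variance that also reflects phase-one variability would be needed for a rigorous interval.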


Professor Ziding Feng, Dept. of Biostatistics, The University of Texas MD Anderson Cancer Center.

Related Publications