Principal Investigator
Name
Danping Liu
Degrees
Ph.D.
Institution
National Cancer Institute
Position Title
Investigator
About this CDAS Project
Study
PLCO
Project ID
PLCO-347
Initial CDAS Request Approval
Feb 26, 2018
Title
Methods for ovarian cancer risk prediction with longitudinal CA125 biomarker
Summary
This proposal requests longitudinally measured values of the CA125 antigen biomarker to study risk prediction for ovarian cancer. Longitudinally measured CA125 levels provide information on the temporal behavior of the marker, which is important for constructing risk prediction models. The risk of ovarian cancer algorithm (ROCA), developed by Skates et al. (2001), has been tested primarily in clinical trials. ROCA essentially models the CA125 trajectories separately for cases and controls and derives the risk estimate using Bayes' rule. However, women with elevated CA125 levels often develop cancer sooner, and hence have shorter follow-up and fewer CA125 observations. This creates an informative dropout problem that ROCA does not account for. In addition, ROCA does not handle prediction of the survival outcome, i.e., time to onset of ovarian cancer; it also makes the strong assumption that the lag between onset and diagnosis follows a known parametric distribution.
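The Bayes' rule step attributed to ROCA above can be illustrated with a deliberately simplified sketch. It is not the ROCA implementation: it assumes fixed, known parameters (ROCA itself works with fitted trajectory models), a flat log-CA125 trajectory for controls, and a flat-then-linear change-point trajectory for cases. All function names and parameter values are hypothetical.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def loglik_control(log_ca125, baseline, sd):
    """Controls (assumption): log CA125 fluctuates around a flat baseline."""
    return sum(normal_logpdf(y, baseline, sd) for y in log_ca125)


def loglik_case(times, log_ca125, baseline, sd, change_point, slope):
    """Cases (assumption): flat until change_point, then rising linearly."""
    total = 0.0
    for t, y in zip(times, log_ca125):
        mean = baseline + slope * max(0.0, t - change_point)
        total += normal_logpdf(y, mean, sd)
    return total


def posterior_case_probability(times, log_ca125, prior,
                               baseline, sd, change_point, slope):
    """Bayes' rule: P(case | data) from the two trajectory likelihoods."""
    log_odds = (math.log(prior / (1.0 - prior))
                + loglik_case(times, log_ca125, baseline, sd, change_point, slope)
                - loglik_control(log_ca125, baseline, sd))
    # Numerically stable logistic transform of the posterior log odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

Under these assumptions, a series that stays at baseline yields a posterior below the prior, while a series that rises after the change point yields a posterior above it.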
First, this project will investigate how ovarian cancer risk prediction is affected by informative dropout and propose statistical methods to account for it. We hope that incorporating the missing data mechanism will improve ovarian cancer risk prediction over ROCA. Second, the risk prediction should account for the timing of diagnosis and the cancer stage at diagnosis. We will examine prediction in a survival model framework, where the stage at diagnosis will help us estimate the lag between cancer onset and diagnosis. By properly accounting for informative dropout, as well as the timing and staging information, we hope to make formal inference on how the prediction accuracy of CA125 changes over time. Third, there may exist subgroups of women in whom CA125 is more predictive of ovarian cancer than in others. The subgroups are defined by women’s individual characteristics. We will develop tree-based methods for identifying these subgroups. Prediction of histological subtypes of ovarian cancer will also be examined.
The PLCO study is a valuable resource for developing methodology to address these challenges. We will investigate a novel shared random effect model (SREM) framework for ovarian cancer risk prediction. This method will be compared thoroughly with ROCA in terms of prediction accuracy, risk calibration, and risk stratification. The SREM framework is a natural way to incorporate dependence between the longitudinal biomarker and the dropout mechanism. It allows formal testing of how well early biomarker observations predict the outcome. For subgroup identification, the computational burden is prohibitively heavy, especially with a large number of continuous covariates. We will develop fast algorithms to incorporate the tree-based approach into SREM.
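One generic way to write a shared random effect structure of the kind described here is sketched below. The notation is introduced only for illustration and is not taken from the project: $y_{ij}$ is the (log) CA125 value of woman $i$ at visit $j$, $b_i$ is her vector of random effects, $D_i$ indicates ovarian cancer, and $x_{ij}$, $z_{ij}$, $w_i$ are covariate vectors.

```latex
\begin{align}
y_{ij} &= x_{ij}^\top \beta + z_{ij}^\top b_i + \varepsilon_{ij},
  \qquad b_i \sim N(0, \Sigma), \quad \varepsilon_{ij} \sim N(0, \sigma^2), \\
\operatorname{logit} \Pr(D_i = 1 \mid b_i) &= w_i^\top \alpha + \gamma^\top b_i .
\end{align}
```

In this form the coefficient $\gamma$ links the two submodels: when $\gamma = 0$, the outcome is independent of the biomarker trajectory given the covariates, so a test of $\gamma = 0$ is one way to assess whether the trajectory carries information about the outcome. For a time-to-onset outcome, the second line could be replaced by a proportional hazards model such as $h_i(t) = h_0(t)\exp(w_i^\top \alpha + \gamma^\top b_i)$, again as an assumed illustrative form.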
Aims

1. Developing prediction models of ovarian cancer risk in the presence of informative dropout, and comparing their performance with the existing ROCA.
We will first explore longitudinal regression methods to describe the CA125 trajectories. The longitudinal biomarker will be modelled by a linear mixed model with splines to characterize the individual trajectory. The model fit will be compared with that of a random change-point model. The dropout mechanism will be modelled jointly with the longitudinal marker. The cancer outcome model shares the random effect terms of the longitudinal model, creating a dependence structure. Different structures of the shared random effects will be explored and compared. The risk prediction derived from SREM will be compared with the risk prediction from ROCA. Out-of-sample prediction accuracy will be evaluated using receiver operating characteristic (ROC) curves; risk calibration plots and predictiveness curves will be compared as well.
2. Extending the prediction models to handle survival outcomes and staging information.
In cancer risk prediction, interest lies not only in predicting cancer incidence but also in predicting the timing and stage of the cancer. Therefore, we will extend the SREM to incorporate time information. A proportional hazards model with the shared random effect terms will be used to model the time to onset. Because the onset of cancer is unobserved and left-censored by the time of diagnosis, we will use the staging information at diagnosis to estimate the time lag between onset and diagnosis of ovarian cancer. The prediction accuracy of the extended SREM will be evaluated with time-dependent ROC curves.
3. Developing tree-based approaches to identify subgroups for predicting ovarian cancer incidence.
SREM will produce a risk prediction for ovarian cancer, and the ROC curve will be used to evaluate the overall prediction accuracy. However, the prediction may not be equally accurate for every woman. We hope to detect whether there exist subgroups of individuals, defined by their covariates, in whom the prediction accuracy is better than in others. Under SREM, the tree-based approach recursively partitions the data into two subsets based on covariates, fits SREM in each subset, and finds the partition in which the prediction accuracy differs most between the two subsets. Since fitting SREM is computationally difficult, we will investigate a latent-class SREM to facilitate the implementation of the tree-based method. Specifically, the subgroups will be represented by latent classes, and the estimated class probability, which is assumed to be associated with the covariates, will be analyzed using regression trees. Therefore, at each step of the recursive partitioning, the SREM does not need to be re-estimated.
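The computational shortcut in Aim 3 can be illustrated with a minimal sketch of a single regression-tree split. It assumes the latent-class model has already been fitted once and has produced an estimated class probability per woman; the search then looks for the covariate threshold that most reduces the squared error of those probabilities, without refitting any model. The function names, the covariate names, and the squared-error criterion are illustrative assumptions, not the project's algorithm.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(covariate, class_prob, min_leaf=2):
    """Best threshold on one covariate, as (sse_reduction, threshold) or None."""
    pairs = sorted(zip(covariate, class_prob))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    total = sse(ys)
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical covariate values
        reduction = total - sse(ys[:i]) - sse(ys[i:])
        if best is None or reduction > best[0]:
            best = (reduction, (xs[i - 1] + xs[i]) / 2)
    return best


def best_covariate_split(covariates, class_prob, min_leaf=2):
    """Search all covariates; returns (name, sse_reduction, threshold) or None."""
    best = None
    for name, values in covariates.items():
        split = best_split(values, class_prob, min_leaf)
        if split is not None and (best is None or split[0] > best[1]):
            best = (name, split[0], split[1])
    return best
```

Applying `best_covariate_split` recursively to the two resulting subsets would grow a tree. Because each step only rescans the stored class probabilities, its cost depends on the number of women and covariates rather than on the cost of fitting the SREM. This naive version recomputes the squared error at every candidate threshold; a running-sum update would be faster on large cohorts.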

Collaborators

Paul S. Albert (NCI/DCEG)
Jared Foster (NCI/DCTD)
Yongli Han (NCI/ECEG)
Christine Berg (NCI/DCP)