## Evaluating cancer mortality in the National Lung Screening Trial

The proposed research aims to develop efficient, unbiased estimation and testing procedures for analyzing cancer mortality data collected in the National Lung Screening Trial (NLST). As pointed out in Wang (1991), the efficiency of model estimation can be improved substantially when the truncation time distribution can be parameterized or fully specified. Under Aim 1, we will investigate a semiparametric truncation model in which the distribution of the truncation variable is parameterized up to an unknown parameter. We will develop a maximum likelihood estimation procedure to estimate the truncation time distribution and the failure time distribution. Under Aim 2, we will consider a formal testing procedure to examine whether the incidence of the initial event is stationary over time. Under the stationarity assumption, the underlying truncation time is uniformly distributed and the survival time collected under cross-sectional sampling has a length-biased distribution. Our testing procedure is formulated by embedding the null truncation time distribution in a smooth alternative (Neyman, 1937) and constructing a semiparametric likelihood ratio statistic to test the model parameters.
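To make the null and alternative hypotheses concrete, the block below writes out the length-biased density implied by stationarity and one common form of Neyman's smooth alternative. The notation is an assumption introduced here for illustration and is not taken from the proposal: $\tau$ is the upper limit of the truncation time, $K$ is the order of the alternative, $\phi_k$ are orthonormal Legendre polynomials on $[0,1]$, and $f$ is the failure time density with mean $\mu$.

```latex
% Length-biased density of the observed failure time under uniform truncation
% (assuming tau is at least the upper limit of the support of T):
f_{\mathrm{obs}}(t) = \frac{t\, f(t)}{\mu}, \qquad \mu = \int_0^\infty u\, f(u)\, du .

% One form of the order-K smooth alternative for the truncation density:
g(a;\theta) = \frac{1}{\tau}
  \exp\Big\{ \sum_{k=1}^{K} \theta_k\, \phi_k(a/\tau) - c(\theta) \Big\},
  \qquad 0 \le a \le \tau,
\qquad
c(\theta) = \log \int_0^1 \exp\Big\{ \sum_{k=1}^{K} \theta_k\, \phi_k(u) \Big\}\, du .

% Stationarity corresponds to the null hypothesis
H_0 : \theta = (\theta_1, \dots, \theta_K) = 0
  \quad\Longrightarrow\quad g(a; 0) = 1/\tau .
```

Under this form, testing stationarity reduces to testing $\theta = 0$ in a finite-dimensional parameter, which is what allows a likelihood ratio statistic to be used.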

Finally, under Aim 3, we will analyze cancer mortality in NLST by fitting an additive hazards competing risks model. The additive hazards model (Aalen, 1980, 1989; Lin & Ying, 1994), which focuses on modeling the difference in risk, is an appealing alternative to the Cox model. This model is especially useful when the event rate is low or when the proportional hazards assumption fails to hold. The additive property also makes it a natural choice for modeling the cause-specific hazards of competing risks (Shen & Cheng, 1999), because the covariate effect on the overall hazard, which is the sum of the cause-specific hazards, is also additive. Existing estimation procedures for the additive hazards model, including the widely used methods proposed by Aalen (1989) and Lin and Ying (1994), are formulated based on unbiased estimating equations. An important caveat in the use of these estimating-equation-based approaches is that they may yield negative hazard estimates, making the additive hazards model less desirable for risk prediction. To tackle this difficulty, we will present a novel composite binomial likelihood approach to estimate the additive hazards model. The proposed method yields proper survival probability estimates and hence is more appropriate for risk prediction.
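The additive structure described above can be written out as follows. This is a sketch in the Lin and Ying (1994) form with time-independent covariates; the symbols $\lambda_{0k}$, $\beta_k$, and $K$ (the number of competing causes) are notation assumed here for illustration.

```latex
% Additive hazards model for a covariate vector Z:
\lambda(t \mid Z) = \lambda_0(t) + \beta^\top Z .

% Cause-specific version for causes k = 1, ..., K:
\lambda_k(t \mid Z) = \lambda_{0k}(t) + \beta_k^\top Z .

% The overall hazard is again additive in Z:
\sum_{k=1}^{K} \lambda_k(t \mid Z)
  = \sum_{k=1}^{K} \lambda_{0k}(t) + \Big( \sum_{k=1}^{K} \beta_k \Big)^{\!\top} Z .

% Implied survival function, with \Lambda_0(t) = \int_0^t \lambda_0(u)\, du:
S(t \mid Z) = \exp\big\{ -\Lambda_0(t) - \beta^\top Z\, t \big\} .
```

$S(t \mid Z)$ is a proper survival function only when $\lambda_0(t) + \beta^\top Z \ge 0$ for all $t$. An estimator that does not enforce this constraint can produce a fitted hazard that is negative for some covariate values, which is the difficulty noted above.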

Aim 1. Develop a semiparametric estimation procedure to estimate the failure time distribution and the truncation time distribution in NLST. We will first investigate a semiparametric truncation model in which the distribution of the truncation variable is parameterized up to an unknown parameter. We will show that, with proper reparameterization, the semiparametric maximum likelihood estimator for the failure time distribution and the truncation time distribution can be easily obtained by employing the EM algorithm developed under a generalized multiplicative censoring model.
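As a rough illustration of the likelihood underlying Aim 1, the block below gives the density of a left-truncated pair when the truncation distribution is parametric. It is a sketch under simplifying assumptions that are not stated in the proposal: the truncation time $A$ and failure time $T$ are independent, right censoring is ignored, and $g(\cdot;\theta)$, $G(\cdot;\theta)$, and $f$ are notation introduced here.

```latex
% A has density g(a; theta) and distribution function G(a; theta);
% T has density f. A pair is observed only when A <= T.

% Density of an observed pair (a, t), a <= t:
p(a, t; \theta, f) = \frac{ g(a;\theta)\, f(t) }{ \int_0^\infty G(u;\theta)\, f(u)\, du } .

% Factorization into a conditional and a marginal part:
p(a, t; \theta, f)
  = \underbrace{ \frac{ g(a;\theta) }{ G(t;\theta) } }_{\text{density of } A \text{ given } T = t}
    \times
    \underbrace{ \frac{ G(t;\theta)\, f(t) }{ \int_0^\infty G(u;\theta)\, f(u)\, du } }_{\text{marginal density of observed } T} .
```

The marginal factor shows that the observed failure times are a biased sample from $f$ with weight $G(t;\theta)$, so information about $\theta$ also carries information about $f$. This is consistent with the efficiency gain attributed to Wang (1991) in the opening paragraph.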

Aim 2. Develop a hypothesis testing procedure to check the truncation time distribution in NLST. A formal test for checking the truncation time distribution will be constructed based on the semiparametric likelihood ratio test statistic. In particular, we will test the stationarity assumption, under which the underlying truncation time is uniformly distributed, by embedding the null uniform truncation time distribution in a smooth alternative (Neyman, 1937).
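The link between stationarity and length bias can be checked with a small simulation. The sketch below is illustrative only and does not use NLST data: the function name `sample_cross_sectional`, the exponential failure time, and the values of `tau` and `n_draws` are all assumptions chosen for the example.

```python
import random


def sample_cross_sectional(n_draws, tau, rate=1.0, seed=0):
    """Simulate left-truncated sampling; return the observed (A, T) pairs.

    A is the truncation time, uniform on [0, tau] (the stationarity
    assumption). T is the failure time, exponential with the given rate
    (a hypothetical choice). A pair is observed only if A <= T.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        a = rng.uniform(0.0, tau)
        t = rng.expovariate(rate)
        if a <= t:
            kept.append((a, t))
    return kept


pairs = sample_cross_sectional(n_draws=400_000, tau=20.0)

# Mean of the observed failure times. For a rate-1 exponential the
# population mean is 1, while the length-biased mean is E[T^2]/E[T] = 2.
observed_mean = sum(t for _, t in pairs) / len(pairs)

# Under uniform truncation, A/T given T is uniform on (0, 1),
# so the average ratio should be close to 0.5.
mean_ratio = sum(a / t for a, t in pairs) / len(pairs)

print(f"observed pairs:   {len(pairs)}")
print(f"mean observed T:  {observed_mean:.3f}")
print(f"mean of A/T:      {mean_ratio:.3f}")
```

The observed mean is roughly twice the population mean, which is the length-bias effect. A departure of the ratio A/T from uniformity is the kind of evidence against stationarity that a formal test would aim to detect.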

Aim 3. Develop a new estimation procedure for the additive hazards model to study lung cancer mortality in NLST. The proposed estimator is obtained by maximizing a composite binomial likelihood function. This likelihood-based approach can potentially be more efficient than the estimating-equation-based approaches proposed by Aalen (1989) and Lin and Ying (1994). More importantly, unlike the existing methods, the proposed method yields proper survival probability estimates and hence is more appropriate for risk prediction.

Professor Mei-Cheng Wang, Department of Biostatistics, Johns Hopkins University.

Dr. Jing Qin, Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health.