Principal Investigator
Name
Donna Spiegelman
Degrees
Sc.D.
Institution
Yale University
Position Title
Susan Dwight Bliss Professor of Biostatistics
About this CDAS Project
Study
IDATA
Project ID
IDATA-79
Initial CDAS Request Approval
Aug 22, 2024
Title
New Epidemiologic Methods for Reducing Measurement Error and Misclassification Bias in Cancer Epidemiology
Summary
Uncertainty in exposure and outcome measurements poses substantial challenges to the identification and quantification of the causes of cancer. For example, physical activity patterns form the basis of many etiologic hypotheses concerning cancer risk, yet they are difficult to measure well. Cancer cases identified in electronic health records (EHR) and other administrative ‘big data’ sources, such as Medicare claims data, are also subject to misclassification. This exposure and outcome uncertainty leads to considerable bias in estimated health effects and limits our ability to detect true associations, which are likely underestimated if detected at all. Measurement error and misclassification correction methods are designed to validly and efficiently estimate the relationship between exposures and cancer outcomes. To accomplish this, a validation study is required for estimating key features of the error process. Although much has been accomplished in this domain over the years, the current aims address unsolved problems of high scientific significance that would remain unanswered without this additional work. We will examine the multi-faceted themes that arise in cancer research, pursuing several new directions of critical importance for translating the results of population-based research into practice and policy.
These methods will include (1) estimation of the effects of within-individual change in lifestyle behaviors on cancer risk, corrected for measurement error in the change variables; (2) use of complex, currently under-accessed validation studies of diet and physical activity, comprising repeated paper and online questionnaire self-reports and repeated concentration and recovery biomarkers, to obtain relative risk estimates unbiased by general measurement error structures, which may include correlated and biased errors; and (3) estimation of the effects of exposures, including medications, other clinical treatments, and health behaviors, on cancer incidence in EHR data. The new methods will be applied to studies of the impact of within-participant change in alcohol intake on breast cancer incidence in the American Cancer Society’s CPS-II cohort and in Harvard’s Nurses’ Health Study, and to a study disentangling the impacts of diabetes and diabetes medications on colorectal cancer risk in Yale New Haven’s Epic EHRs. Dissemination is a central feature of this research. User-friendly, publicly available software will accompany all new methods to be developed. The new methods will be disseminated through short courses and lectures at national and international epidemiologic and statistical conferences, and through the development of a massive open online course (MOOC). We have assembled an outstanding team of experts in measurement error methods and statistical theory, along with an exceptional team of cancer epidemiologists with much prior collaborative experience with the methods team, to guide the developments and their applications to the scientific problems at hand. With the talented junior faculty and trainees to be recruited for this project, we will solve the challenging problems that have been identified.
Aims

1. Correction for bias in relative risk (RR) estimates due to error in the measurement of change in diet and physical activity, in relation to cancer incidence and mortality. Using repeated measures of alcohol intake, we will correct RR estimates for measurement error in analyses of the effect of women’s changes in alcohol intake over time on breast cancer incidence in CPS-II and NHS, using their dietary validation studies.
2. Correction for bias in RR estimates by leveraging validation studies (VS) with multiple repeated methods for measuring physical activity and diet through self-report and biomarkers, in the under-utilized NCI IDATA and Harvard LVS and MLVS VS. De-attenuation factors will be produced using all the data from each VS in an integrated manner, and these will be made available to the scientific community. Complex correlation structures, heterogeneity of errors (which may, for example, depend upon covariates or have non-linear forms), and the impact of energy intake and energy balance will be carefully considered.
3. Correction for bias in RRs due to misclassification of cancer incidence in EHR and other administrative ‘big data’ sources of information, such as Medicare claims data. We will assess the association of Type 2 diabetes prevalence and treatments with colorectal cancer (CRC) incidence in Epic 1) using the ‘true’ outcome, here SEER-based CRC incidence; 2) using the misclassified Epic-based CRC incidence; and 3) by correcting for bias due to outcome misclassification using misclassified Epic-based CRC incidence, the SEER validation data, and our new methods.
4. Methods dissemination
(a) Software development: development of user-friendly, publicly available software for the methods produced in Aims 1–3, and for optimal design of main study/VS and main study/reliability studies. Software will be posted on GitHub and on the websites of MPIs Spiegelman and Wang, along with detailed manuals.
(b) Short courses development: short courses on methods developed in Aims 1–3 will be developed and presented at epidemiological and statistical conferences. Webinars and a MOOC will be delivered as well.
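
For orientation only, the kinds of correction that Aims 2 and 3 build on can be sketched with two standard textbook estimators. This is not the project's methodology, which is yet to be developed. The sketch assumes a simple linear measurement error model with a log-linear exposure effect for the de-attenuation step, and non-differential outcome misclassification with known sensitivity and specificity for the case-count step. All function names and inputs are hypothetical.

```python
import math


def deattenuation_factor(reference_x, measured_x):
    """Slope (lambda) from regressing the reference exposure on the
    error-prone exposure in a validation study (assumed linear model)."""
    n = len(reference_x)
    mean_ref = sum(reference_x) / n
    mean_meas = sum(measured_x) / n
    cov = sum((m - mean_meas) * (r - mean_ref)
              for m, r in zip(measured_x, reference_x))
    var = sum((m - mean_meas) ** 2 for m in measured_x)
    return cov / var


def corrected_rr(naive_rr, lam):
    """Regression-calibration style correction: divide the naive log RR
    (per unit of exposure) by the de-attenuation factor lambda."""
    return math.exp(math.log(naive_rr) / lam)


def corrected_case_count(observed_cases, n, sensitivity, specificity):
    """Back out the expected number of true cases among n people from the
    number classified as cases, given sensitivity and specificity.
    Requires sensitivity + specificity > 1."""
    false_positive_rate = 1.0 - specificity
    return (observed_cases - false_positive_rate * n) / (
        sensitivity + specificity - 1.0
    )
```

In this sketch, a validation study supplies the paired reference and error-prone measurements (or the sensitivity and specificity), and the main study supplies the naive RR or observed case count. Correlated errors, covariate-dependent errors, and uncertainty in the validation estimates, which the aims above emphasize, are deliberately ignored here.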

Collaborators

Molin Wang, Harvard University
Xin Zhou, Yale University
Lin Ge, Harvard University
Zhuoran Wei, Harvard University
Anna Porter, Yale University
Jingyu Cui, Yale University
Zexiang Li, Yale University