Accounting for measurement error in studies with repeated measures using LASSO
Principal Investigator
Name
Sharon Xie
Degrees
M.S., Ph.D.
Institution
University of Pennsylvania
Position Title
Professor
Email
About this CDAS Project
Study
IDATA
Project ID
IDATA-78
Initial CDAS Request Approval
Aug 22, 2024
Title
Accounting for measurement error in studies with repeated measures using LASSO
Summary
Recent studies have begun to explore the application of LASSO regression [1] for variable selection with dietary exposures and health outcomes [2-4]. Such studies could provide additional insight into the components of diet which are important to various aspects of health. However, when implementing the LASSO, they often fail to account for the measurement errors present in questionnaire instruments. This is likely due to the dearth of statistical research on LASSO regression in the presence of error-prone variables. Unfortunately, ignoring measurement errors when using LASSO can result in unreliable variable selection and biased coefficient estimates.
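The selection problem can be illustrated with a toy simulation. The sketch below uses simulated data (not IDATA) and a single standardized predictor, for which the LASSO estimate has a closed form: the soft-thresholded sample covariance between the predictor and the centered outcome. The classical additive error model W = X + U, the function names, the penalty value, and all variances are illustrative assumptions, not part of the project description.

```python
import random
import statistics


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return exactly 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_one_predictor(x, y, lam):
    """Minimize (1/(2n)) * sum((y_i - ybar - b * xs_i)^2) + lam * |b|,
    where xs is the standardized predictor (mean square equal to 1)."""
    xs = standardize(x)
    y_mean = statistics.fmean(y)
    cov = sum(a * (b - y_mean) for a, b in zip(xs, y)) / len(y)
    return soft_threshold(cov, lam)


# Simulated data: all values below are assumptions chosen for illustration.
rng = random.Random(0)
n = 5000
x_true = [rng.gauss(0, 1) for _ in range(n)]         # error-free exposure
y = [0.3 * x + rng.gauss(0, 1) for x in x_true]      # outcome, true slope 0.3
w_noisy = [x + rng.gauss(0, 3) for x in x_true]      # error-prone measurement

lam = 0.2
beta_true_x = lasso_one_predictor(x_true, y, lam)    # uses the true exposure
beta_noisy_w = lasso_one_predictor(w_noisy, y, lam)  # ignores measurement error

print("LASSO coefficient with true exposure:  ", beta_true_x)
print("LASSO coefficient with noisy exposure: ", beta_noisy_w)
```

With these settings the error-free predictor should be retained, while the error-prone version of the same predictor should be shrunk to exactly zero: measurement error weakens the observed association enough that it falls below the penalty threshold.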
One scientific question of interest, for example, is the effect of dietary exposures and physical activity on health markers such as BMI, resting heart rate, and blood pressure, measured repeatedly over time [5,6]. Variable selection may be used to identify important nutrients associated with these health markers. To date, however, no extension of the LASSO to the repeated-measures setting accounts for measurement error.
In this project we aim to develop such an extension: a measurement error correction for the LASSO in mixed-effects models for repeated measures over time. The IDATA data set contains repeated measurements of diet and physical activity from multiple instruments (we will incorporate data from DHQ-II, ASA24, and 4DFR for diet, and from PAQ-AARP, ACT24, and ActiGraph/ActivPAL for physical activity), as well as measurements of health markers such as BMI over the course of one year. We will develop and implement a LASSO-based statistical method to identify the aspects of diet and physical activity that are important for health, accounting for the measurement error present in this type of data (the nature of which varies with the measurement instrument used).
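One possible way to write down the setting, as a sketch under assumed notation (none of these symbols or distributional assumptions come from the project description): let Y_ij be a health marker for participant i at time j, X_ij the vector of true exposures, W_ijk the k-th replicate measurement of X_ij, and b_i a participant-level random intercept.

```latex
\begin{align}
Y_{ij} &= \beta_0 + X_{ij}^{\top}\beta + b_i + \varepsilon_{ij},
  \qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2), \\
W_{ijk} &= X_{ij} + U_{ijk},
  \qquad U_{ijk} \sim N(0, \Sigma_u), \\
\hat{\beta}_{\text{naive}} &= \arg\min_{\beta}
  \left\{ -\ell(\beta;\, Y, W) + \lambda \lVert \beta \rVert_1 \right\}.
\end{align}
```

Here \ell denotes the mixed-model log-likelihood evaluated with W substituted for the unobserved X, which is the naive approach that ignores measurement error. A corrected method would replace this naive loss with one that accounts for \Sigma_u; the specific form of that correction is not given in the summary. The classical additive error model in the second line is only one assumption, and self-report instruments may call for a different error structure.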
We hope that this line of research will allow scientists who utilize dietary and physical activity data to employ a broader set of statistical methods that will still reliably account for measurement error.
References
1. Tibshirani R. doi:10.1111/j.2517-6161.1996.tb02080.x.
2. Qu Y., et al. doi:10.3389/fped.2022.870529.
3. Zhang F., et al. doi:10.1186/s12874-018-0585-8.
4. McEligot A.J., et al. doi:10.3390/nu12092652.
5. Meyerkort C.E., et al. doi:10.1017/S2040174411000717.
6. Konieczna J., et al. doi:10.1186/s12966-019-0893-3.
Aims
• Identify important nutrients and physical activity parameters associated with health outcomes (e.g. BMI, resting heart rate, blood pressure).
• Use replicates for diet questionnaires and physical activity measurements to mitigate the effects of measurement error on statistical modelling.
• Develop an appropriate statistical method to address this question, namely mixed-effects LASSO with measurement error correction.
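The role of replicates in the second aim can be illustrated on simulated data. The sketch below is a minimal example assuming classical additive error and ordinary least squares with a single predictor, not the mixed-effects LASSO the project proposes. Under that assumption, averaging k replicates divides the error variance by k, so the attenuation factor moves from var(X) / (var(X) + var(U)) toward var(X) / (var(X) + var(U) / k). All names and parameter values are illustrative.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    mx = statistics.fmean(x)
    my = statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Simulated data: values are assumptions chosen for illustration only.
rng = random.Random(1)
n, k = 5000, 4                  # participants, replicates per participant
true_slope = 2.0

x_true = [rng.gauss(0, 1) for _ in range(n)]                # var(X) = 1
y = [true_slope * x + rng.gauss(0, 1) for x in x_true]
# k error-prone replicates of each true exposure, var(U) = 1
replicates = [[x + rng.gauss(0, 1) for _ in range(k)] for x in x_true]

w_single = [reps[0] for reps in replicates]                 # one measurement
w_mean = [statistics.fmean(reps) for reps in replicates]    # replicate average

slope_single = ols_slope(w_single, y)  # theory: about 2 * 1 / (1 + 1)
slope_mean = ols_slope(w_mean, y)      # theory: about 2 * 1 / (1 + 1/4)

print("slope using one replicate:     ", slope_single)
print("slope using mean of replicates:", slope_mean)
```

Averaging reduces the attenuation but does not remove it, which is one reason a formal correction that uses the replicates to estimate the error variance is needed rather than averaging alone.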
Collaborators
Prof. Sharon Xie (Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, PA, USA)