Accounting for measurement error in high-dimensional metabolomics studies using the LASSO
However, as with many omics technologies, metabolomics is subject to measurement error [8]. Applying the LASSO without accounting for this error can bias the estimated coefficients (toward zero, i.e. attenuation, but also away from zero) and make variable selection unreliable. In this project we aim to develop and apply a corrected LASSO that accounts for measurement error in metabolomic data. The IDATA data set contains repeated dietary questionnaires and physical activity measurements, as well as untargeted metabolite data with technical replicates and repeated measurements taken 6 months apart. We aim to use the LASSO to identify urinary and serum metabolites associated with various dietary factors and physical activity levels. For dietary factors specifically, we aim to identify metabolomic biomarkers of food items such as coffee and of food components such as folate, iron, and macronutrients. For physical activity, we will consider daily active and sedentary time. We will perform multiple analyses for each factor, each time using a different source of dietary (DHQ-II, ASA24, 4DFR) or physical activity (PAQ-AARP, ACT24, ActiGraph/activPAL) data, restricted to the subset of participants for whom those data are available. The results of the statistical modelling will then be compared across data sources.
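To make the idea of a corrected LASSO concrete, here is a minimal sketch of one way replicates can be used, not necessarily the estimator this project will adopt. It assumes a classical additive error model (each replicate = true level + independent error with covariance Sigma_u, identical for all subjects) and treats the dietary or physical activity measure as the outcome y with the metabolites as error-prone predictors. The pooled within-subject covariance of the replicates estimates Sigma_u; subtracting Sigma_u / k (for k averaged replicates) from the Gram matrix removes the inflation that causes attenuation, and the LASSO objective 0.5*b'Gb - g'b + lam*||b||_1 is then minimized by coordinate descent. Because the corrected matrix can become indefinite when p is large, the sketch clips its negative eigenvalues to keep the problem convex; that step is our simplification. All function names and simulation settings are hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def estimate_error_cov(W):
    """Pooled within-subject covariance of replicates, shape (p, p).

    W has shape (n, k, p): n subjects, k >= 2 replicates, p metabolites.
    Under the assumed model W[i, j] = X[i] + U[i, j] this is unbiased for Sigma_u.
    """
    n, k, p = W.shape
    dev = W - W.mean(axis=1, keepdims=True)  # deviations from subject means
    D = dev.reshape(n * k, p)
    return D.T @ D / (n * (k - 1))


def corrected_lasso(W, y, lam, correct=True, n_iter=1000, tol=1e-8, eps=1e-6):
    """Coordinate descent for 0.5*b'Gb - g'b + lam*||b||_1.

    With correct=True, G is the Gram matrix of the replicate averages minus
    Sigma_u / k; with correct=False this reduces to the ordinary (naive) LASSO.
    """
    n, k, p = W.shape
    Wbar = W.mean(axis=1)              # average the k replicates per subject
    Wc = Wbar - Wbar.mean(axis=0)      # centre predictors (absorbs intercept)
    yc = y - y.mean()
    G = Wc.T @ Wc / n
    if correct:
        # Averaging k replicates leaves an error covariance of Sigma_u / k.
        G = G - estimate_error_cov(W) / k
    g = Wc.T @ yc / n
    # G may be indefinite; clip eigenvalues at eps so the problem stays convex.
    vals, vecs = np.linalg.eigh(G)
    G = (vecs * np.maximum(vals, eps)) @ vecs.T
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta_old = beta.copy()
        for j in range(p):
            # Partial-residual term for coordinate j (its own contribution removed).
            r_j = g[j] - G[j] @ beta + G[j, j] * beta[j]
            beta[j] = soft_threshold(r_j, lam) / G[j, j]
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta


# Simulated example (hypothetical settings): 2 noisy replicates per subject.
rng = np.random.default_rng(0)
n, k, p = 1000, 2, 20
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -0.8, 0.6]
X = rng.normal(size=(n, p))                        # true, unobserved levels
y = X @ beta_true + rng.normal(scale=0.5, size=n)
W = X[:, None, :] + rng.normal(scale=np.sqrt(0.5), size=(n, k, p))

b_naive = corrected_lasso(W, y, lam=0.05, correct=False)
b_corr = corrected_lasso(W, y, lam=0.05)
print("naive:    ", np.round(b_naive[:3], 2))
print("corrected:", np.round(b_corr[:3], 2))
```

With these settings the averaged replicates carry error variance 0.5 / 2 = 0.25, so in expectation the naive estimates are attenuated by a factor of about 1 / (1 + 0.25) = 0.8 (before the additional LASSO shrinkage), whereas the corrected estimates should sit much closer to the true values.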
We hope that this line of research will allow scientists who use dietary, physical activity, and untargeted metabolomics data to apply popular high-dimensional statistical methods while still reliably accounting for measurement error.
References
1. Tibshirani R. doi:10.1111/j.2517-6161.1996.tb02080.x.
2. Playdon M.C., et al. doi:10.1016/j.ajcnut.2023.10.016.
3. Goutman S.A., et al. doi:10.1136/jnnp-2020-323611.
4. Chen Y., et al. doi:10.1038/s41467-024-46043-y.
5. Kelly R.S., et al. doi:10.1016/j.bbadis.2020.165936.
6. Pathmasiri W., et al. doi:10.1038/s41598-024-64561-z.
7. Brennan L., et al. doi:10.1016/j.redox.2023.102808.
8. Karakach T.K., et al. doi:10.1016/j.aca.2009.01.048.
• Identify important metabolites associated with various dietary factors and physical activity levels using the LASSO.
• Use repeated dietary questionnaires and physical activity measurements, as well as technical and longitudinal replicates of metabolite measurements, to mitigate the effects of measurement error when applying the LASSO.
• Compare the results of statistical modelling with the corrected LASSO across the different sources of dietary and physical activity data.
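The last aim calls for comparing results across data sources. One simple way to summarise agreement between the sets of metabolites selected under different sources is the pairwise Jaccard index (size of the intersection divided by size of the union). The sketch below is a hypothetical illustration under that assumption: the choice of metric is ours, and the metabolite names m1 to m4 are placeholders, not results from IDATA.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets of selected metabolites."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections agree trivially
    return len(a & b) / len(a | b)


def pairwise_agreement(selections):
    """Jaccard index for every pair of data sources (names sorted for stability)."""
    return {
        (s, t): jaccard(selections[s], selections[t])
        for s, t in combinations(sorted(selections), 2)
    }


# Hypothetical placeholder selections, one set per dietary data source.
selections = {
    "DHQ-II": {"m1", "m2", "m3"},
    "ASA24": {"m2", "m3", "m4"},
    "4DFR": {"m2", "m3"},
}
agreement = pairwise_agreement(selections)
for pair, score in agreement.items():
    print(pair, round(score, 3))
```

Because the index ignores coefficient magnitudes, it could be complemented by comparing the estimated effects of metabolites selected under every source.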
Prof. Sharon Xie (Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, PA, USA)
Dr. Erikka Loftfield (Division of Cancer Epidemiology and Genetics, NCI, MD, USA)