Does excluding participants on the basis of implausible energy intake introduce selection bias in the association between a diet-related exposure and a health outcome?
Principal Investigator
Name
Kyle Busse
Degrees
Ph.D., M.P.H.
Institution
University of Pennsylvania Perelman School of Medicine
Position Title
Postdoctoral Researcher
Email
About this CDAS Project
Study
IDATA
Project ID
IDATA-89
Initial CDAS Request Approval
Feb 14, 2025
Title
Does excluding participants on the basis of implausible energy intake introduce selection bias in the association between a diet-related exposure and a health outcome?
Summary
Studies of dietary intake and its relationship to health outcomes are plagued by random and systematic error in self-reported diet data (Subar et al., 2015). Differential misreporting of total energy intake with respect to anthropometric and sociodemographic characteristics is of particular concern, given the well-known relationship between energy intake and the risk of obesity and cardiometabolic diseases (Gomez-Bruton et al., 2019; Mendoza et al., 2007; Rouhani et al., 2016; Zheng et al., 2014). Higher body mass index is associated with more frequent underreporting of energy intake, while sociodemographic characteristics, including race, ethnicity, age, and education, have also been found to be associated with misreporting (Freedman et al., 2014; Livingstone & Black, 2003; Neuhouser et al., 2008; Nowicki et al., 2011). This systematic error in the measurement of energy intake raises concerns of bias in the estimation of relationships between diet-related exposures and health outcomes (Subar et al., 2015).
In recognition of this, nutritional epidemiologists frequently take steps to identify and exclude inaccurate reporters. Comparing self-reported energy intake to objectively measured energy expenditure from validated biomarkers, such as doubly labeled water, is considered the gold standard for identifying implausible values (Park et al., 2014). However, collecting biomarkers to measure energy expenditure is costly in large samples (Banna et al., 2017; Livingstone & Black, 2003; Samuel-Hodge et al., 2004). Alternative, less expensive approaches include imposing predetermined cutpoints for daily kilocalories, above and below which self-reported energy intake is considered implausible, and comparing self-reported energy intake to estimated energy requirements based on age, body size, and activity level (Huang et al., 2005; Livingstone & Black, 2003; Mendez et al., 2011; Nielsen & Adair, 2007; Nowicki et al., 2011; Rhee et al., 2015; Willett, 2012).
Previous evaluations of how well these approaches reduce bias have yielded no consensus on which method is best, or even on whether they are reliable (Ejima et al., 2019; Mendez, 2015; Yamamoto et al., 2023). Nonetheless, their application in nutritional epidemiological analyses persists, at times resulting in the exclusion of many participants from the analytic sample (Nielsen & Adair, 2007). By restricting the subsequent analysis to participants with supposedly plausible levels of energy intake, nutritional epidemiologists may unintentionally introduce bias. To our knowledge, this unintended consequence has received little or no attention. In this study, we will explain how this practice may introduce selection bias in estimates of the association between a diet-related exposure and a health outcome. Then, using data from the Interactive Diet and Activity Tracking in American Association of Retired Persons (IDATA) study, we will apply three common approaches to identify participants with implausible levels of energy intake (cutpoints, energy requirements, and energy expenditure). We will assess how excluding those participants affects the degree of bias in the association between a health outcome and the percentage of daily calories from fruits and vegetables, using inverse probability weighting methods to account for selecting on participants with “plausible” values for total energy.
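The cutpoint approach and the inverse probability weighting idea described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the project: the cutpoint values, the field names (`kcal`, `bmi_group`, `fv_pct`), the toy records, and the choice to estimate the probability of selection within strata of a single covariate. An actual analysis would more likely estimate selection probabilities from a regression model with several covariates.

```python
from collections import defaultdict

# Hypothetical cutpoints in kcal/day, chosen only for illustration.
LOWER_KCAL = 500
UPPER_KCAL = 3500


def is_plausible(kcal, lower=LOWER_KCAL, upper=UPPER_KCAL):
    """Flag a self-reported daily energy intake as plausible under fixed cutpoints."""
    return lower <= kcal <= upper


def selection_weights(records, stratum_key="bmi_group"):
    """Return one weight per record.

    Retained (plausible) records get 1 / P(plausible | stratum), where the
    probability is the observed proportion retained in that stratum.
    Excluded records get weight 0.
    """
    total = defaultdict(int)
    kept = defaultdict(int)
    for rec in records:
        total[rec[stratum_key]] += 1
        if is_plausible(rec["kcal"]):
            kept[rec[stratum_key]] += 1

    weights = []
    for rec in records:
        if is_plausible(rec["kcal"]):
            stratum = rec[stratum_key]
            p_selected = kept[stratum] / total[stratum]
            weights.append(1.0 / p_selected)
        else:
            weights.append(0.0)
    return weights


def weighted_mean(values, weights):
    """Weighted mean of values; records with weight 0 do not contribute."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy records (fabricated for the sketch, not IDATA data).
# fv_pct: percentage of daily calories from fruits and vegetables.
records = [
    {"bmi_group": "high", "kcal": 400, "fv_pct": 5},
    {"bmi_group": "high", "kcal": 2000, "fv_pct": 10},
    {"bmi_group": "high", "kcal": 2200, "fv_pct": 12},
    {"bmi_group": "high", "kcal": 450, "fv_pct": 6},
    {"bmi_group": "normal", "kcal": 1800, "fv_pct": 15},
    {"bmi_group": "normal", "kcal": 2100, "fv_pct": 18},
    {"bmi_group": "normal", "kcal": 2500, "fv_pct": 20},
    {"bmi_group": "normal", "kcal": 4000, "fv_pct": 9},
]

weights = selection_weights(records)
fv_values = [rec["fv_pct"] for rec in records]
weighted_fv = weighted_mean(fv_values, weights)
```

In this sketch the weights within each stratum sum to that stratum's original size, so retained participants stand in for excluded participants who share the same covariate value. Whether that re-weighting removes selection bias depends on whether the covariates used actually explain who is excluded, which is the question the project sets out to examine.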
Aims
Aim: To examine whether exclusion of participants with implausible levels of self-reported energy intake introduces selection bias in the association between a diet-related exposure and a health outcome.
Collaborators
Sunni Mumford, University of Pennsylvania Perelman School of Medicine
Stefanie Hinkle, University of Pennsylvania Perelman School of Medicine