Basal Metabolic Rate (BMR) estimation using Probabilistic Graphical Models
By Zara Jackson
Department of Statistics
Uppsala University
Supervisor: Harry Khamis
2019
Abstract
Obesity is a growing problem globally. Currently 2.3 billion adults are overweight, and this number is rising. The
most common method for weight loss is calorie counting, in which to lose weight a person should be in a calorie
deficit. Basal Metabolic Rate accounts for the majority of calories a person burns in a day and it is therefore a major
contributor to accurate calorie counting. This paper uses a Dynamic Bayesian Network to estimate Basal Metabolic
Rate (BMR) for a sample of 219 individuals from all Body Mass Index (BMI) categories. The data was collected
through the Lifesum app. A comparison of the estimated BMR values was made with the commonly used Harris
Benedict equation, finding that food journaling is a sufficient method to estimate BMR. Next day weight prediction
was also computed based on the estimated BMR. The results showed that the Harris Benedict equation produced
more accurate predictions than the metabolic model proposed, therefore more work is necessary to find a model that
accurately estimates BMR.
Keywords— Basal Metabolic Rate, Resting Metabolic Rate, Dynamic Bayesian Networks, Temporal Models, Food Tracking,
Calories, Obesity, Pymc3, Probabilistic Programming
Contents
1 Introduction
2 Background
2.1 Probabilistic Graphical Models
2.2 Bayesian Networks
2.3 Dynamic Bayesian Networks
3 Method
3.1 Data
3.2 Self Reporting Bias of Calories Consumed
3.3 Metabolic Model
3.3.1 Model Representation
3.3.2 Inference
3.3.3 Prediction Analysis
4 Results
4.1 Correlation of Harris Benedict and Energy Balance
4.2 Self Reporting Bias Results
4.3 Model Inference
4.4 Prediction Results
4.4.1 Prediction Error Analysis
5 Discussion
1 Introduction
Currently there are around 2.3 billion overweight adults globally. The World Health Organisation (WHO) (2016)
states that within the last four decades obesity rates have almost tripled. In 2016 30% of adults worldwide were overweight or obese.
Obesity is a particular problem in the USA, where in 2014 70% of the population were either overweight or obese (NIDDK
2014). Obesity is linked to many health problems, for example heart disease, stroke, diabetes and some cancers (Kelly 2018).
Therefore it is essential that obesity rates are reduced. In order to lose weight many people turn to calorie counting, in an attempt to
consume fewer calories than they burn, and thus achieve weight loss. Calorie counting is therefore only effective if we can accurately
estimate the number of calories a person burns each day. Energy expenditure is the total number of calories burned while at rest plus
calories burned during exercise. Trexler et al. (2014) state that up to 70% of our daily energy expenditure is from Basal Metabolic
Rate (BMR). BMR refers to the number of calories our bodies burn while at rest. It is the number of calories required to keep
our bodies functioning through, for example, breathing, blood circulation, and temperature regulation.
Lifesum is a mobile health app used for meal and exercise tracking aiming to help users achieve their weight goals. Lifesum
focuses on helping people make better food choices, by helping the user choose a diet that fits their lifestyle and goals. It provides
tips and feedback based on the chosen diet plan and the user’s goal. Lifesum provides a platform to easily track calories, macronutrients,
and water consumed each day. The app syncs to popular fitness devices to simplify exercise tracking. A user can track
weight and body measurements within the app allowing them to visualise their weight journey. Lifesum recommends how many
calories each user should eat each day based on the user’s goal type and BMR, recommending a safe calorie deficit for weight loss
or calorie surplus for weight gain. The amount of exercise tracked is also considered when recommending how many calories a
user should consume, increasing the calorie target as users track exercise throughout the day.
Currently Lifesum uses the Harris Benedict Equation to estimate a user’s BMR. The Harris Benedict BMR estimate is
based on an individual’s gender, age, height and weight, and is calculated as follows (Harris and Benedict 1918):
Men: BMR = 66.5 + (13.75× weight(kg)) + (5.003× height(cm))− (6.755× age(years))
Women: BMR = 655.1 + (9.563× weight(kg)) + (1.850× height(cm))− (4.676× age(years))
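As an illustration, the two equations above can be written as a single Python function. This is a minimal sketch: the function name, argument names and the string encoding of gender are our own choices for the example and are not taken from the Lifesum code base.

```python
def harris_benedict_bmr(gender: str, weight_kg: float, height_cm: float, age_years: float) -> float:
    """Estimate BMR in kcal/day with the Harris Benedict coefficients quoted above."""
    g = gender.strip().lower()
    if g == "male":
        return 66.5 + 13.75 * weight_kg + 5.003 * height_cm - 6.755 * age_years
    if g == "female":
        return 655.1 + 9.563 * weight_kg + 1.850 * height_cm - 4.676 * age_years
    raise ValueError("gender must be 'male' or 'female'")


# Hypothetical person, used only to show the call.
example_bmr = harris_benedict_bmr("male", weight_kg=80.0, height_cm=180.0, age_years=30)
```

Because weight is the only input that changes from day to day, recomputing this estimate daily makes it follow the measured weight.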
Many previous studies have focused on the accuracy of the Harris Benedict Equation. Owen et al. (1986) found that the Harris
Benedict Equation overestimated BMR by up to 24% in women. Daly et al. (1985) also found that the Harris Benedict Equation
overestimated BMR, although by 10-15%. Other research in this area suggests that the Harris Benedict Equation is particularly
inaccurate for those who have been obese in the past. Astrup et al. (1999) found that formerly obese subjects have BMRs 3-5%
lower than those estimated by the Harris Benedict equation. Douglas et al. (2007) also found that the Harris Benedict Equation
tends to be less accurate in women who have a history of obesity, finding their actual BMR to be less than the estimated value.
Variation in BMR from one individual to another is caused by many different factors. An individual’s body weight can be divided
into a fat component (Fat Mass) and a remainder (Fat Free Mass). A study of adults in Scotland by Johnstone et al. (2005)
found that 63% of the variation between individuals’ BMR was caused by Fat Free Mass, while 6% came from Fat Mass and 2% was
caused by age. They found 26% of the variation to be unexplained. Other studies have also found Fat Mass and Fat Free Mass to
account for the majority of the variation in BMR (Sabounchi et al. 2013). In addition to this they also found age to be a contributing
factor in BMR variation. They concluded that age has a bigger impact on children than on adults and also on males compared to
females.
While Fat Free Mass is the largest contributor to BMR, often obese individuals have a lower BMR than the Harris Benedict
Equation estimates (Sabounchi et al. 2013). This is due to a larger proportion of fat mass, which burns fewer calories at rest than fat
free mass, causing discrepancies in the estimation (Douglas et al. 2007). As a result of these factors the BMR currently estimated
within the Lifesum app may not be an accurate estimation of each individual user’s BMR. As BMR is the largest factor influencing
a user’s calorie goal, and users adapt their eating to the calorie goal, a poor estimate could lead users to eat in ways that make their
goal difficult or even impossible to achieve.
Change in weight is caused by the difference between total energy expenditure and the number of calories consumed through
food. Therefore in addition to BMR, the number of calories consumed needs to be accurately tracked in order for a user to lose
weight. This is often not the case due to various difficulties in self reporting. When individuals self-report calories they tend to
underreport by an average of 20% (Livingstone et al. 1990). Other studies concluded that calories consumed were underreported
by both obese and non-obese individuals, although the proportion undertracked is much larger amongst obese individuals (Bandini
et al. 1990, King et al. 2016). Obese subjects underreported their calorie intake by an average of 47% while non-obese subjects
underreported by an average of 19% (Lichtman et al. 1992). In addition to this research, Livingstone et al. (1990) also concluded
that 12 of 20 obese subjects underreported foods that are high in calories compared to 3 of 22 lean subjects, suggesting that obese
individuals are also more likely to be selective in the types of food they track.
Another reason for under tracking could be linked to inaccuracies in food labelling. In a study carried out on American restaurant
and supermarket products, Urban et al. (2010) found that on average the calories in food servings were 18% more than the calorie
information provided by restaurants. In addition to this they found that the labels of packaged foods understated the number of
calories by an average of 8%. The U.S. Food and Drug Administration considers a food label to be compliant with their guidelines
if the calories stated are within 20% (in either direction) of the actual amount of calories contained in the product. As a result,
even when a user tracks exactly what is stated on food packages, the tracked value is often inaccurate.
Probabilistic Graphical Models (PGMs) have become increasingly popular in the field of health; Lucas et al. (2004) provide a summary
of studies where PGMs have been of particular interest. This may be because the graphical representation is straightforward to
interpret. In addition to this they can be used to overcome limitations that result from uncertainties in health data (Sato et al. 2015).
Another advantage is that knowledge about the human metabolism can be leveraged to refine the model structure.
The aim of this thesis is to use probabilistic graphical modelling to estimate BMR based on an individual’s personal weight journey
and tracking data in order to gain a better understanding of an individual’s BMR, and thereby to improve their calorie target,
increasing the likelihood of their success. To build this model we use tracking data from the Lifesum App. This paper aims to answer
four main research questions:
1. How well does the Harris Benedict equation capture an individual’s energy balance?
2. Can we estimate the reporting bias of calories by Lifesum users?
3. Is food journaling sufficient to estimate BMR?
4. How much variation in BMR can we measure from tracking data?
The remainder of this paper is structured as follows. The next section provides some background to Probabilistic Graphical Models,
particularly Dynamic Bayesian Modelling, which will be used throughout this paper. Section 3 focuses on the method used and
contains a description of the data and the model. Section 4 reports the results. Finally, Section 5 provides some concluding
remarks.
2 Background
2.1 Probabilistic Graphical Models
Probabilistic graphical models provide a framework for capturing complex dependencies between random variables that involve
uncertainty. By combining graph theory and probability theory, graphical models are used to build large-scale multivariate statistical
models (Wainwright and Jordan 2007). A Probabilistic Graphical Model is made up of a combination of nodes and edges. Each
node is associated with a random variable and the edges represent the conditional relationships between the random variables.
Probabilistic Graphical Models perform Inference and Learning by incorporating prior knowledge within the model.
2.2 Bayesian Networks
A Bayesian Network (BN) is one of the most common forms of Probabilistic Graphical Models. A Bayesian Network is made up
of two components (Larrañaga et al. 2012). First, a Directed Acyclic Graph (DAG) which represents the structure of the system
and the conditional dependencies between variables. A DAG consists of nodes and edges which have the same meaning as above.
The second component of a Bayesian Network is a set of parameters, which based on the structure of the DAG, state the conditional
probability distribution for each variable given its parents. Figure 1 is an example of a simple Bayesian Network. This graphical
representation is a tractable way of modelling the joint distribution P(A,B,C). For example, the edge (A,C) connects the random
variable A to the random variable C, and means that P(C|A) is a factor in the joint probability distribution.
Figure 1: A Directed Acyclic Graph (DAG) of a Simple Bayesian Network
2.3 Dynamic Bayesian Networks
A Dynamic Bayesian Network (DBN) is an extension of a Bayesian Network, where the system’s state is changing over time. A
basic assumption of a DBN is that time can be discretized into a set of equally spaced time slices. This assumption simplifies the
problem from representing distributions over a continuum of random variables to representing distributions over countably many
random variables, sampled at discrete intervals. A Dynamic Bayesian Network allows connections within time slices, known as
intra-slice connections, as well as connections between consecutive time slices, known as inter-time slice connections. Inter-time
slice connections therefore imply conditional probabilities between time slices. In addition to this, variables within a DBN must
satisfy the Markov Assumption, that is, the state of the system at time t+1 depends only on its immediate past, i.e. its state at time t
(Mihajlovic and Petkovic 2001). It is therefore independent of all timepoints prior to t (denoted X(0:(t−1))). Thus, for all t ≥ 0:
\[ X^{(t+1)} \perp X^{(0:(t-1))} \mid X^{(t)} \]
where X = {x1, ..., xn} is the set of random variables.
The states of a dynamic model are not always directly observable. A variable that is not observed is referred to as a latent vari-
able. Latent variables may influence other variables that can be directly measured. Each state within a dynamic model at one time
slice may depend on one or more states at the previous time instance and/or on some states in the same time slice. A graphical
representation of a Dynamic Bayesian Network can be seen below in Figure 2.
Figure 2: An example of a Dynamic Bayesian Network
Now, let X = {x1, ..., xn} be the set of latent variables, while Y = {y1, ..., yn} is the set of observable variables. T is the time
boundary. Combining this notation with the Markov assumption we can define the distribution of the variables sampled over time
t = 0, ..., T as:
\[ P(X, Y) = \prod_{t=1}^{T} \Pr(X^{(t+1)} \mid X^{(t)}) \prod_{t=1}^{T} \Pr(Y^{(t+1)} \mid X^{(t+1)}) \Pr(X^{(0)}) \]
In order to completely specify a DBN we need to define three sets of parameters (Mihajlovic and Petkovic 2001):
– State transition conditional probability distributions (CPDs) Pr(X(t+1)|X(t)), which represent the time dependencies between
each of the states
– Observation conditional probability distributions Pr(Y(t+1)|X(t+1)), which represent the dependencies between each of the
nodes at time slice t+1
– Initial state distributions Pr(X(0)), which specify the initial probability distributions at the start of the process.
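The role of these three sets of parameters can be illustrated by drawing one trajectory from a toy DBN with a single latent and a single observed variable per time slice. This is a sketch under stated assumptions: the Gaussian distributions, their parameters and the function name are hypothetical choices for illustration and are not the metabolic model used in this thesis.

```python
import random


def sample_dbn(num_steps, seed=0, transition_sd=1.0, observation_sd=0.5):
    """Ancestral sampling from a toy one-variable DBN.

    Hypothetical distributions, chosen only for illustration:
      X(0)            ~ Normal(0, 1)                        initial state distribution
      X(t+1) | X(t)   ~ Normal(X(t), transition_sd)         state transition CPD
      Y(t+1) | X(t+1) ~ Normal(X(t+1), observation_sd)      observation CPD
    """
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0)]  # draw X(0) from the initial state distribution
    observed = []
    for _ in range(num_steps):
        x_next = rng.gauss(latent[-1], transition_sd)  # Markov: depends only on X(t)
        y_next = rng.gauss(x_next, observation_sd)     # depends only on X(t+1)
        latent.append(x_next)
        observed.append(y_next)
    return latent, observed


latent_states, observations = sample_dbn(num_steps=14, seed=1)
```

Sampling in this order (initial state, then transition, then observation) mirrors the factorisation of the joint distribution given above.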
3 Method
3.1 Data
This thesis uses a proprietary data set collected using the Lifesum App, from January 2017 to February 2019. Upon signup a user
enters information, including gender, height and their weight goal. Users can track meals and snacks on a daily basis. They can
also track exercise by entering the exercise type and duration manually, or by syncing the Lifesum app with other devices used to
record exercise (e.g. Apple Watch or Fitbit). Users can also track their weight and body measurements over time either manually
or by syncing with smart scales.
As tracking is voluntary there are very large quantities of missing data. The data are not missing completely at random. For example,
a user who does not log their food consumption on a given day may skip tracking as they consumed more than the recommended
amount of calories that day. In order to limit confounding effects from this missingness this thesis focuses on a sample of users.
The sample contains users who fulfil the following criteria for a period of at least 14 consecutive days:
1. Users who have recorded at least one food item for breakfast, lunch and dinner.
2. Users who have recorded their weight daily through the use of Smart Scales that are synced with the Lifesum app.
3. Users who track exercise through a Smart Watch which is also synced with the Lifesum app. Tracking through synced
devices reduces self reporting inaccuracies.
4. Users over the age of 20 are selected to ensure that growth has ended. Teenagers are likely to still be growing and it is
therefore possible that their height may not be up to date.
Table 1 contains user demographics. The sample consists of 219 individual users from all over the world, of whom 130 are male
and 89 are female. The users range from 21 to 80 years old with an average age of 43. Each user has tracked a minimum of 14
consecutive days and in total the data set contains 7887 tracked days. Body Mass Index (BMI) was calculated for each user on the
first day of their tracking streak. Users from all BMI categories are included. BMI ranges from 16.7, which is considered underweight,
to 44.6, which is considered obese. The average BMI is 26.0, which is considered overweight, although the largest single group of users (84) is of normal
(healthy) weight. Table 2 contains the number of users in each BMI category. 60 users (27.4%) are obese.
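For reference, the BMI calculation and categorisation can be sketched as follows. The category cut-offs (18.5, 25 and 30 kg/m²) are the standard WHO thresholds; we assume, without confirmation from the data set description, that these are the thresholds behind the categories in Table 2. Function names are our own.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body Mass Index: weight in kilograms divided by squared height in metres."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def bmi_category(value: float) -> str:
    """Map a BMI value to a category using the standard WHO cut-offs (assumed here)."""
    if value < 18.5:
        return "Underweight"
    if value < 25.0:
        return "Normal (Healthy) Weight"
    if value < 30.0:
        return "Overweight"
    return "Obese"


# Height and initial weight of the obese example user reported later in Table 3.
example_bmi = bmi(weight_kg=116.1, height_cm=177.8)
example_category = bmi_category(example_bmi)
```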
Table 1: User Demographics
               Total (n=219)           Male (n=130)            Female (n=89)
               Min     Max     Mean    Min     Max     Mean    Min     Max     Mean
Age            21      80      43      21      80      44      24      70      40
Height (cm)    152.4   203.2   175.6   163.0   203.2   179.3   152.4   185.0   166.9
Weight (kg)    45.7    165.6   80.3    53.1    165.4   82.3    45.7    138.8   75.46
BMI            16.7    44.6    26.0    16.7    44.6    25.6    17.4    43.9    27.0
Table 2: Number of users in each BMI category
BMI Category       Underweight   Normal (Healthy) Weight   Overweight   Obese
Number of Users    3             84                        72           60
In order to link a user’s daily activities to changes in weight, the time of weigh-in is an important factor to consider. If users weigh themselves
in the morning, the energy balance from the previous day will impact the current day’s weight. If users weigh themselves in
the evening the weight observed on the current day is impacted by the energy balance of that day. Figure 3 shows a histogram of
weigh-in times. The majority (69.4%) of the sample of users recorded their weight before 12pm. Therefore we make the simplifying
assumption that a day’s energy balance is reflected in the next day’s weigh-in.
Figure 3: Histogram showing the time of day Lifesum users track daily weight (n=219)
3.2 Self Reporting Bias of Calories Consumed
As discussed previously, change in weight occurs as a result of the balance between calories consumed through food (calories in)
and calories burned (calories out). It is therefore important that the number of calories consumed is accurately tracked in order for
a user to successfully achieve their weight goals. Previous research has suggested that individuals tend to undertrack food by up to
47% (Lichtman et al. 1992).
Energy balance can be represented by the following equation:
\[ \kappa \times \text{Cals In} - \text{Cals Out} - \omega \times \Delta\text{weight} = 0 \tag{1} \]
Where
- Cals Out denotes the total number of calories burned by BMR and exercise
- κ is the proportion of calories under/over tracked
- ω is the number of kilocalories in 1 kg of body weight (kcal/kg)
- ∆ weight denotes the change in weight from time t to t+1
Therefore, in order to find the proportion of calories undertracked by the sample of users, we aim to find κ and ω such that
\[ \mathrm{loss}(\kappa, \omega) = \sum_i \left| \kappa \times \text{Cals In}_i - \text{Cals Out}_i - \omega \times \Delta\text{weight}_i \right| \tag{2} \]
is minimised.
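When ω is held fixed, the loss in equation (2) can be minimised over κ exactly. The sketch below is one possible way to do this and is not necessarily the optimisation routine used for the results: each term equals Cals In_i × |κ − r_i| with r_i = (Cals Out_i + ω × ∆ weight_i) / Cals In_i, so a minimiser is the weighted median of the r_i with weights Cals In_i. Function and argument names are hypothetical.

```python
def tracking_loss(kappa, cals_in, cals_out, delta_weight, omega=7700.0):
    """Equation (2): sum of absolute energy balance residuals."""
    return sum(
        abs(kappa * cin - out - omega * dw)
        for cin, out, dw in zip(cals_in, cals_out, delta_weight)
    )


def fit_tracking_bias(cals_in, cals_out, delta_weight, omega=7700.0):
    """Return a kappa minimising equation (2) for a fixed omega (weighted median)."""
    # Days with no tracked calories carry zero weight and are skipped.
    pairs = sorted(
        ((out + omega * dw) / cin, cin)
        for cin, out, dw in zip(cals_in, cals_out, delta_weight)
        if cin > 0
    )
    if not pairs:
        raise ValueError("at least one day with positive tracked calories is required")
    half_weight = sum(weight for _, weight in pairs) / 2.0
    running = 0.0
    for ratio, weight in pairs:
        running += weight
        if running >= half_weight:
            return ratio
    return pairs[-1][0]  # guard against floating point round-off
```

A fitted κ above 1 means that tracked intake has to be scaled up to balance the equation, which corresponds to undertracking.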
3.3 Metabolic Model
3.3.1 Model Representation
Figure 4: Dynamic Bayesian Model for BMR over two time slices
A graphical representation of the metabolic model is shown in Figure 4. This model is a Dynamic Bayesian Model in which each time
slice represents one day, and the number of time slices is individual to each user based on the number of consecutive days tracked.
Each user tracked a minimum of one 14 day streak, and some users may have tracked multiple streaks. Observed variables are
represented by a node with a solid outline, while latent variables are represented by a node with a dashed outline. The model is
described mathematically by defining the conditional and initial distributions below.
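The Conditional Probability Distribution for the model, which includes the state transitions and observations, can be written as:
\[ \Pr(\mathrm{BMR}_{t+1}, \mathrm{EB}_{t+1}, \Delta W_{t+1}, W_{t+1}, \mathrm{OW}_{t+1}) = \Pr(\mathrm{BMR}_{t+1} \mid \mathrm{BMR}_t) \Pr(\mathrm{EB}_{t+1} \mid F_{t+1}, \mathrm{Ex}_{t+1}, \mathrm{BMR}_{t+1}) \times \Pr(\Delta W_{t+1} \mid \mathrm{EB}_{t+1}) \Pr(W_{t+1} \mid \Delta W_t, W_t) \Pr(\mathrm{OW}_{t+1} \mid W_{t+1}) \tag{3} \]
The initial state distribution of the model can be written as:
\[ \Pr(\mathrm{BMR}_0, \mathrm{EB}_0, \Delta W_0, W_0, \mathrm{OW}_0) = \Pr(\mathrm{BMR}_0) \Pr(\mathrm{EB}_0 \mid F_0, \mathrm{Ex}_0, \mathrm{BMR}_0) \Pr(\Delta W_0 \mid \mathrm{EB}_0) \Pr(W_0) \Pr(\mathrm{OW}_0 \mid W_0) \tag{4} \]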
The model consists of only continuous variables and therefore the conditional probability distributions can be described as Gaussian
processes (Shachter and Kenley 1989). Thus, each individual variable has a normal prior distribution. More specifically, the prior
distributions of each variable are defined as:
Latent BMR
\[ \mathrm{BMR}_t \sim \mathcal{N}(\mathrm{BMR}_{t-1}, \alpha) \tag{5} \]
\[ \alpha \sim \mathrm{Exp}(1/10) \tag{6} \]
BMR at each time point is normally distributed with a mean of the estimated value of BMR at the previous time slice, and a standard
deviation of α. α acts as the step size. In other words, α controls the amount BMR can change from one day to the next. As the value of
α is uncertain a distribution is placed around it. Because α is a standard deviation it is strictly positive, leading to the
choice of an exponential prior.
The initial value of BMR is also normally distributed:
\[ \mathrm{BMR}_0 \sim \mathcal{N}(\beta, 0.25\beta) \tag{7} \]
where β is the BMR value on day 0 estimated by the Harris Benedict Equation. The standard deviation is 0.25β, as previous
research found individuals to vary from the Harris Benedict equation by up to 24% (Owen et al. 1986); the chosen value allows for
slightly more variation.
Energy Balance
Energy Balance is calculated at each time point within the model, using the observed values for calories consumed and calories
burned through exercise, as well as the individualised value for undertracking bias.
\[ \text{Calculated Energy Balance}_t = \text{Tracking Bias} \times \text{Calories In Food}_t - \text{Calories Out BMR}_t - \text{Calories burned by Exercise}_t \tag{8} \]
As there is uncertainty around the value of energy balance we place a normal distribution around it.
\[ \text{Energy Balance}_t \sim \mathcal{N}(\text{Calculated Energy Balance}_t, \gamma) \tag{9} \]
\[ \gamma \sim \mathrm{Exp}(1/\varepsilon) \tag{10} \]
where ε is an individualised value of the standard deviation of calculated energy balance for each user. In this case, when calculating
energy balance, the Harris Benedict estimation of BMR was used. The value is the same over all the days of tracking. Again, since
there is uncertainty in this standard deviation, we place a distribution around it.
∆ weight (Change in weight)
\[ \Delta\text{Weight}_t = \mathrm{EB}_t / \omega \tag{11} \]
ω represents the number of kilocalories in 1kg of fat as in equation (1). In weight loss the primary objective is to lose body fat,
which contains 7700kcal/kg (Hall 2008). As a simplifying assumption, we assume all weight lost or gained is body fat, hence we
set ω to 7700 kcal/kg.
Latent weight
Latent weight at each time point is normally distributed with a mean of the estimated value of Latent weight at the previous time
point plus Energy Balance at the current time point, and a standard deviation of ∆.
\[ \text{Latent Weight}_t \sim \mathcal{N}(\text{Latent Weight}_{t-1} + \text{Energy Balance}_t, \Delta) \tag{12} \]
∆ corresponds to how much we believe latent weight can change in a day; we expect latent weight to change by around 0.15 kg.
However, there is uncertainty in the estimated value of ∆ so it also follows an exponential distribution.
\[ \Delta \sim \mathrm{Exp}(1/0.15) \tag{13} \]
Observed weight:
\[ \text{Observed Weight}_t \sim \mathcal{N}(\text{Latent Weight}_t, \zeta) \tag{14} \]
Where ζ is an individualised value of the variance in observed weight change. As this value is observed there is no distribution
placed around it.
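To make the generative reading of equations (5) to (14) concrete, the sketch below draws one synthetic streak from the priors. It is a forward simulation in plain Python, not the inference code used in this thesis, and it rests on several assumptions that we state explicitly: every second parameter of a normal distribution is treated as a standard deviation, the energy balance enters the weight update as a change in weight through equation (11), the day 0 latent weight is set to a supplied starting weight, and the default values of ε and ζ are hypothetical.

```python
import random

OMEGA = 7700.0  # kcal per kg of body fat, equation (11)


def simulate_streak(cals_in, cals_exercise, hb_bmr_day0, weight_day0,
                    tracking_bias=1.0, epsilon=300.0, zeta_sd=0.5, seed=0):
    """Draw one synthetic streak from the priors of the metabolic model."""
    rng = random.Random(seed)
    alpha = rng.expovariate(1 / 10)        # eq (6): daily BMR step size
    gamma = rng.expovariate(1 / epsilon)   # eq (10): energy balance noise
    delta = rng.expovariate(1 / 0.15)      # eq (13): latent weight step size
    bmr = rng.gauss(hb_bmr_day0, 0.25 * hb_bmr_day0)  # eq (7)
    latent_weight = weight_day0            # assumed starting value for W_0
    days = []
    for t, (food, exercise) in enumerate(zip(cals_in, cals_exercise)):
        if t > 0:
            bmr = rng.gauss(bmr, alpha)                    # eq (5)
        calculated_eb = tracking_bias * food - bmr - exercise  # eq (8)
        eb = rng.gauss(calculated_eb, gamma)               # eq (9)
        delta_weight = eb / OMEGA                          # eq (11)
        if t > 0:
            # eq (12), with energy balance converted to kg (assumption)
            latent_weight = rng.gauss(latent_weight + delta_weight, delta)
        observed_weight = rng.gauss(latent_weight, zeta_sd)    # eq (14)
        days.append({"bmr": bmr, "energy_balance": eb,
                     "latent_weight": latent_weight,
                     "observed_weight": observed_weight})
    return days


# Hypothetical 14 day streak with constant intake and exercise.
streak = simulate_streak([2000.0] * 14, [300.0] * 14,
                         hb_bmr_day0=1800.0, weight_day0=80.0, seed=1)
```

Inference reverses this process: given the observed food, exercise and weight values, the sampler recovers plausible values of the latent BMR and latent weight.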
3.3.2 Inference
Inference in a DBN is defined as finding the values of the latent states given the observed states at each time slice (Mihajlovic and
Petkovic 2001). Inference can be represented mathematically as
\[ \Pr(X_0^{T-1} \mid Y_0^{T-1}) \tag{15} \]
where $Y_0^{T-1} = \{y_0, y_1, ..., y_{T-1}\}$ denotes the set of T consecutive observed variables and
$X_0^{T-1} = \{x_0, x_1, ..., x_{T-1}\}$ represents the set of latent variables.
Inference is performed by No-U-Turn (NUTS) sampling. NUTS is a Markov Chain Monte Carlo (MCMC) sampling method
that extends Hamiltonian Monte Carlo (HMC). A disadvantage of HMC is that it requires a step size and a number of steps to be chosen, which is
often difficult. NUTS selects the number of steps automatically, and the step size can be tuned during warm-up, so neither has to be
specified by hand. It is especially useful in performing inference on models that contain many continuous variables. Using the
gradient of the log posterior density, NUTS is guided towards
regions of higher probability; it therefore typically converges faster than random-walk MCMC sampling methods. To summarise, NUTS makes it
possible to efficiently perform Bayesian posterior inference on a large class of complex, high-dimensional models with minimal
human intervention. A description of the NUTS algorithm can be found in Hoffman and Gelman (2011) (Algorithm 6).
3.3.3 Prediction Analysis
We aim to predict the weight of each individual at the next time slice, based on the information obtained about the latent variables in
the current time slice. Prediction is therefore regarded as an inference problem, and can be described by the calculation (Mihajlovic
and Petkovic 2001):
\[ \Pr(x_{t+1} \mid Y_0^t) = \frac{\sum_{x_t} \Pr(x_{t+1} \mid x_t)\, \alpha_t(x_t)}{\sum_{x_t} \alpha_t(x_t)} \tag{16} \]
In the same way
\[ \Pr(y_{t+1} \mid Y_0^t) = \frac{\sum_{x_{t+1}} \alpha_{t+1}(x_{t+1})}{\sum_{x_t} \alpha_t(x_t)} \tag{17} \]
where $\alpha_t(x_t)$ is the forward probability, describing the joint probability of the observations up to time t and the state at time t. Therefore
\[ \alpha_t(x_t) = \Pr(Y_0^t, x_t) \tag{18} \]
from which it follows
\[ \alpha_{t+1}(x_{t+1}) = \Pr(y_{t+1} \mid x_{t+1}) \sum_{x_t} \Pr(x_{t+1} \mid x_t)\, \alpha_t(x_t) \tag{19} \]
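Equations (16), (17) and (19) can be illustrated on a toy model with a finite number of states, where the sums can be evaluated directly. This is a sketch for illustration only: the metabolic model is continuous, so in practice the sums become integrals that are approximated by sampling. The function names and the list-of-lists encoding (transition[i][j] = Pr(x_{t+1}=j | x_t=i), emission[j][y] = Pr(y | x=j)) are our own.

```python
def forward_step(alpha_t, transition, emission, y_next):
    """Equation (19): update the forward probabilities with observation y_next."""
    n = len(alpha_t)
    return [
        emission[j][y_next] * sum(transition[i][j] * alpha_t[i] for i in range(n))
        for j in range(n)
    ]


def predict_state(alpha_t, transition):
    """Equation (16): distribution of the next latent state given past observations."""
    n = len(alpha_t)
    total = sum(alpha_t)
    return [
        sum(transition[i][j] * alpha_t[i] for i in range(n)) / total
        for j in range(n)
    ]


def predict_observation(alpha_t, transition, emission, y_next):
    """Equation (17): probability of observing y_next at the next time slice."""
    return sum(forward_step(alpha_t, transition, emission, y_next)) / sum(alpha_t)


# Hypothetical two-state, two-symbol example.
transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.7, 0.3], [0.4, 0.6]]
alpha = [0.3, 0.2]
next_state_probs = predict_state(alpha, transition)
prob_y0 = predict_observation(alpha, transition, emission, y_next=0)
```

Summing the output of predict_observation over all possible values of y_next gives 1, which is a useful sanity check on an implementation.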
4 Results
4.1 Correlation of Harris Benedict and Energy Balance
Measured weight can vary a lot from one day to the next; Denning et al. (1990) found daily weight variation to be up to 4kg in
women. Figure 5 below shows that for Lifesum users measured weight can change by up to 3kg in either direction. The majority
of weight fluctuation each day is due to fluid balance, but some can be a result of changes in muscle or fat mass. Latent weight
is a person’s theoretical underlying weight, at some neutral level of hydration and digestive activity. It therefore removes noise in
the measured value that results from fluid variation. Changes in latent weight are directly caused by energy balance. In order to
investigate how well the Harris Benedict Equation captures Energy Balance we look at the correlation between energy balance and
observed weight change.
Where Energy balance is defined as:
Energy Balance = Total Calories consumed - Calories Out Exercise - Calories Out BMR
Figure 5 below shows the correlation between Energy balance at time t and change in weight (from time t to t+1). In this case
Energy Balance is calculated using the estimate of BMR from the Harris Benedict Equation. A very weak correlation (r=0.056)
exists, which is surprisingly low given the reported effectiveness of calorie counting (Hartmann-Boyce et al. 2014), and that calorie
targets rely on this equation. Some of this lack of correlation can be explained by changes in water balance, which we cannot
measure: water contains zero calories but accounts for most of daily weight variability. Even so, we would still expect to see some correlation.
Other possible factors are users undertracking the amount of calories they consume and an inaccurate estimation
of BMR by the Harris Benedict equation.
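The correlation reported here is a Pearson correlation between daily energy balance and the following change in weight. A minimal sketch of that calculation is given below; the function names are our own and the pairing of day t energy balance with the change in weight from t to t+1 follows the description above.

```python
import math


def energy_balance(cals_in, cals_exercise, bmr):
    """Energy balance as defined above: intake minus exercise minus BMR."""
    return cals_in - cals_exercise - bmr


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def weight_changes(weights):
    """Change in weight from each day to the next."""
    return [later - earlier for earlier, later in zip(weights[:-1], weights[1:])]
```

For a streak of n days this pairs the first n−1 energy balance values with the n−1 day-to-day weight changes.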
Figure 5: Relationship between Energy Balance at time t-1 and Weight Change between t-1 and t, for each day
and user based on Lifesum tracking data and the Harris Benedict BMR calculation.
4.2 Self Reporting Bias Results
Previous research suggests that a self reporting bias exists when food journaling. The method to estimate self reporting bias is
shown in equation (2). Hall (2008) states that ω is around 7700 kcal/kg. Therefore fixing ω to 7700 in equation (2) above we
find that on average users undertrack the amount of calories they consume on a daily basis by about 7.9%. This value is lower
than that found in prior research, which typically focused on tracking in obese subjects; in the sample of users analysed, only
27.4% are obese. Therefore it is plausible that undertracking rates are lower than in previous studies. To try to capture potential
undertracking effects, we estimated undertracking for each user at an individual level, as described for the Dynamic Bayesian Model
introduced previously.
4.3 Model Inference
The metabolic model estimates a value for each variable within the model each day by sampling from the posterior distributions.
Figure 6 compares the results from the Metabolic Model with the observed data for 3 users chosen at random from different BMI
categories. Information on the user’s characteristics can be found in Table 3 below.
Figure 6.A shows a comparison between BMR estimated by the model and by the Harris Benedict equation for an obese user.
This user has a BMR calculated by the Harris Benedict equation which changes each day within the range of 2075-2095 calories
over his 22 day streak. There is large variation in BMR between consecutive days when BMR is estimated using the Harris Benedict
equation. The Metabolic Model estimates this user’s BMR to be, in general, higher than the Harris Benedict estimation. It also
evolves at a much steadier rate. Figure 6.B shows that the metabolic model estimates latent weight to also evolve at a steadier
rate than observed values of weight. The estimation of latent weight is also within the range of the changes in measured weight, as
expected.
The overweight user in Figure 6.C has a BMR estimated by the Harris Benedict equation in the range of 1235-1255 calories
throughout her tracking streak. Again the metabolic model estimates BMR to be in general higher than that of the Harris Benedict
equation, as seen in Figure 6.C. The model estimates BMR to evolve at a steadier rate than that of the Harris Benedict equation.
Figure 6.D shows that the metabolic model estimates latent weight to also evolve at a steadier rate than observed values of weight.
This user however has less fluctuation in observed weight between days than the obese user.
A user who is considered to have a normal (healthy) weight is shown in Figure 6.E. This user has a BMR estimated from the
Harris Benedict equation in the range of 1110-1137 calories. The metabolic model estimates BMR to be lower than the Harris
Benedict estimation for the first 15 days of the streak, and then higher for the remaining 13 days. The model estimates BMR to
change over time at a steadier rate compared to the BMR estimated by the Harris Benedict equation which fluctuates more from
one day to the next.
For all 3 users the Harris Benedict estimation of BMR evolves over time following the same pattern as observed weight change.
This is to be expected as measured weight is the only variable changing in the Harris Benedict Equation as time evolves (over short
time periods). This is however not the case in the results produced from the metabolic model, in which case the estimation of BMR
responds to changing levels of calories consumed through food and calories burned through exercise each day.
As the model estimates BMR to be in a similar range to the Harris Benedict estimate, it can be concluded that food journaling
is sufficient to estimate BMR. That being said, the accuracy of the BMR estimate produced by the model remains unclear
without expensive measurement of BMR for these subjects.
Table 3: User Characteristics
BMI Category              Age   Gender   Height (cm)   Initial Weight (kg)   Initial BMI
Obese                     39    Male     177.8         116.1                 36.7
Overweight                55    Female   163.0         70.4                  26.5
Normal (Healthy) Weight   51    Female   168.0         55.7                  19.7
Figure 6: Variable estimation from metabolic model compared to observed values. Rows show the obese user (A, B), the
overweight user (C, D) and the normal (healthy) weight user (E, F). Plots on the left (A, C, E) show
the Harris Benedict estimate of BMR over time compared to the estimated value from NUTS sampling. The plots on
the right (B, D, F) compare measured weight over time with the value of latent weight estimated from NUTS sampling.
4.4 Prediction Results
As a final method of model validation, we use the Dynamic Bayesian Model to predict next-day observed weight and compare the
results to two baselines. Table 4 shows the Root Mean Squared Error (RMSE) of the next-day weight prediction for all users over
all tracking days, as well as for users grouped by BMI category. Predictions from the Dynamic Bayesian Model were computed using
the NUTS sampling method. As our first baseline, we use a naive approach, LOCF (Last Observation Carried Forward), which assumes
that a user's weight at time t + 1 will be the same as at time t. As a second baseline, we use a prediction based on the Harris
Benedict estimate and naive energy balance calculations.
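The two baselines and the error metric can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the energy balance rule (intake minus exercise minus BMR, converted to weight) is one plausible reading of "naive energy balance calculations", and the conversion factor of 7700 kcal per kg is an assumed value that does not come from this section.

```python
import math

# Assumed energy content of 1 kg of body weight (kcal); not taken from the paper.
KCAL_PER_KG = 7700.0


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def locf_predictions(weights):
    """LOCF baseline: predicted weight at t + 1 equals observed weight at t."""
    return list(weights[:-1])


def energy_balance_predictions(weights, intake, exercise, bmr):
    """Hypothetical naive energy balance: add the day's calorie surplus in kg."""
    predictions = []
    for t in range(len(weights) - 1):
        surplus = intake[t] - exercise[t] - bmr[t]  # kcal gained on day t
        predictions.append(weights[t] + surplus / KCAL_PER_KG)
    return predictions


# Usage with made-up numbers: compare each baseline with the weights observed next day.
weights = [70.0, 70.2, 69.9, 70.1]
actual_next_day = weights[1:]
locf_error = rmse(locf_predictions(weights), actual_next_day)
```

Here the BMR series passed to `energy_balance_predictions` could be the Harris Benedict estimate or the model's estimate, so the same function can serve both comparisons.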
Table 4: RMSEs of next-day predicted weight for the different prediction methods.
User type                 NUTS    Harris Benedict Estimate   LOCF
Obese                     1.287   1.009                      1.035
Overweight                0.572   0.551                      0.565
Normal (Healthy) Weight   0.566   0.552                      0.557
Underweight               0.571   0.564                      0.566
All Users                 0.832   0.708                      0.723
Overall, the Harris Benedict equation predicts next-day weight most accurately across all users (RMSE: 0.708). It is also the
most accurate method within each individual BMI category. The proposed metabolic
model predicts next-day weight less accurately than both the Harris Benedict equation and the LOCF method for all
users, and for each separate BMI category. RMSE is greatest for obese individuals across all methods. Therefore
the Harris Benedict equation and the metabolic model proposed in this paper are least accurate for obese subjects compared with
the other BMI categories.
4.4.1 Prediction Error Analysis
The largest errors in prediction occurred in users that had multiple tracking streaks. Figure 7 shows a comparison of the actual
next day weight with the prediction produced by the Metabolic Model as well as the prediction produced by the Harris Benedict
Equation, for the second tracking streak of the two users that had the largest prediction errors. From this plot it can be concluded
that the Metabolic Model’s prediction takes longer to adjust after there is a gap between streaks than the Harris Benedict Prediction.
The Harris Benedict Equation adjusts to the new weight value faster as the estimation is based on observed weight directly.
Table 5 shows the RMSEs for each prediction method when the data contains only the first streak for each user. The Metabolic
Model now predicts next-day weight more accurately than LOCF for all users, although the difference in accuracy is very small
(RMSE: 0.539 vs 0.541). It is also more accurate than LOCF for each separate BMI category except Normal weight users,
where LOCF is slightly more accurate. The Metabolic Model also predicts next-day weight more accurately than the Harris Benedict
equation for users who are underweight (RMSE: 0.454 vs 0.510); however, due to the low sample size in this group, this result
should be interpreted with some caution. While restricting the data to each user's first streak improves prediction accuracy, there is still room for
improvement in the model. The Harris Benedict equation continues to predict next-day weight more accurately than the proposed
model for all users, and for each BMI category other than underweight users.
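Restricting the data to each user's first streak can be sketched as follows. This is a minimal illustration, assuming that a tracking streak is a run of consecutive calendar days; the definition used in the study may differ, and the function name is hypothetical.

```python
from datetime import date, timedelta


def first_streak(days):
    """Return the leading run of consecutive calendar days from a list of dates."""
    ordered = sorted(days)
    if not ordered:
        return []
    streak = [ordered[0]]
    for current in ordered[1:]:
        if current - streak[-1] != timedelta(days=1):
            break  # a gap (or a duplicate date) ends the first streak
        streak.append(current)
    return streak


# Usage with made-up dates: three tracked days, a week-long gap, then two more days.
tracked = [date(2019, 1, 1), date(2019, 1, 2), date(2019, 1, 3),
           date(2019, 1, 10), date(2019, 1, 11)]
kept = first_streak(tracked)  # only the first three days are kept
```

Dropping later streaks removes the jump in weight across a gap, which is the situation in which the Metabolic Model was slowest to adjust.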
Figure 7: Actual and Predicted Values of Next Day Weight. A shows the comparison for the user with the largest
prediction errors, for the second tracking streak only. B shows the comparison for the user with the second largest prediction
errors, for the second tracking streak only.
Table 5: RMSEs of next-day predicted weight for the different prediction methods (one tracking streak only).
User type                 NUTS    Harris Benedict Estimate   LOCF
Obese                     0.572   0.549                      0.574
Overweight                0.515   0.510                      0.522
Normal (Healthy) Weight   0.538   0.527                      0.534
Underweight               0.454   0.510                      0.523
All Users                 0.539   0.528                      0.541
5 Discussion
This paper aims to estimate an individual's BMR and how it changes over time by using food and exercise tracking data. We use
data obtained from the Lifesum app for 219 users with a range of characteristics. By investigating the correlation between daily total
energy expenditure and change in weight from day t to t+1, it was concluded that BMR estimated by the Harris Benedict equation
does not adequately capture an individual's energy balance. The low correlations could be a result of inaccuracies in tracking data
or inaccuracies in the estimation of BMR. Therefore a tracking bias was calculated for the users in the study, finding that on average
users undertrack by 7.9% of calories consumed each day.
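The correlation check described above can be sketched as below. This is an illustration only: it assumes that daily total energy expenditure is BMR plus exercise calories and that the correlation is a Pearson correlation, neither of which is stated in this section, and all function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def weight_changes(weights):
    """Change in weight from day t to day t + 1."""
    return [later - earlier for earlier, later in zip(weights, weights[1:])]


def total_expenditure(bmr, exercise):
    """Assumed daily total energy expenditure: BMR plus exercise calories."""
    return [b + e for b, e in zip(bmr, exercise)]


# Usage with made-up numbers: expenditure on day t against the change to day t + 1.
weights = [70.0, 70.3, 70.1, 70.4]
expenditure = total_expenditure([1500.0, 1502.0, 1501.0, 1503.0], [200.0, 0.0, 350.0, 100.0])
r = pearson(expenditure[:-1], weight_changes(weights))
```

The last day's expenditure is dropped because there is no following weight measurement to pair it with.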
A Dynamic Bayesian Model was used to estimate BMR. From this model it was concluded that food journaling is sufficient to
estimate BMR; however, more research in this field is required. Improvements could be made to the model in order to increase the
accuracy of the estimated BMR, for example changing the prior distributions to better fit the data. As a method of model validation,
next-day weight was predicted from the estimate of BMR produced by the proposed model. Prediction analysis concluded that the
Harris Benedict equation remains a more accurate estimate of BMR than the model's estimate, both overall and for each individual
BMI category. While self-reporting bias was calculated for each user within this model, it was held constant over
all days of a user's tracking streak. Calculating this bias day by day may lead to more accurate results. In addition, the results
could be improved by accounting for gaps between tracking streaks within the model. Unfortunately, time restrictions did not allow
for these alterations.
Previous studies have indicated that the Harris Benedict equation is particularly inaccurate for obese individuals. This study
confirms this, finding that the Harris Benedict equation performs worse when predicting the next-day weight of obese individuals than
for those in other BMI categories. The Metabolic Model proposed in this paper also performed worse on obese subjects compared
with others. Therefore further research is required to find a more accurate method to estimate BMR for the obese population.
Acknowledgements
I would like to thank Lifesum for providing me with the data used in this study. A special thanks is extended to the data team
at Lifesum, particularly Lars Yencken and Pauline Vercruysse, for their support throughout this project. In addition to this they
contributed to many insightful discussions that have made this paper possible. Finally, I would like to thank Harry Khamis, Uppsala
University, for his guidance during this study.
References
Astrup, Arne, Peter C. Gøtzsche, Karen van de Werken, Claudia Ranneries, Søren Toubro, Anne Raben, and Benjamin
Buemann (1999). “Meta-analysis of resting metabolic rate in formerly obese subjects”. en. In: The American
Journal of Clinical Nutrition 69.6, pp. 1117–1122. ISSN: 0002-9165. DOI: 10.1093/ajcn/69.6.1117. URL:
http://academic.oup.com/ajcn/article/69/6/1117/4714917 (visited on 04/26/2019).
Bandini, L. G., D. A. Schoeller, H. N. Cyr, and W. H. Dietz (1990). “Validity of reported energy intake in obese and
nonobese adolescents”. eng. In: The American Journal of Clinical Nutrition 52.3, pp. 421–425. ISSN: 0002-9165.
DOI: 10.1093/ajcn/52.3.421.
Daly, J. M., S. B. Heymsfield, C. A. Head, L. P. Harvey, D. W. Nixon, H. Katzeff, and G. D. Grossman (1985).
“Human energy requirements: overestimation by widely used prediction equation”. eng. In: The American Journal
of Clinical Nutrition 42.6, pp. 1170–1174. ISSN: 0002-9165. DOI: 10.1093/ajcn/42.6.1170.
Denning, D. W., M. G. Dunnigan, J. Tillman, J. A. Davis, and C. A. Forrest (1990). “The relationship between ’normal’
fluid retention in women and idiopathic oedema.” en. In: Postgraduate Medical Journal 66.775, pp. 363–366. ISSN:
0032-5473, 1469-0756. DOI: 10.1136/pgmj.66.775.363. URL: https://pmj.bmj.com/content/
66/775/363 (visited on 05/23/2019).
Douglas, Crystal C., Jeannine C. Lawrence, Nikki C. Bush, Robert A. Oster, Barbara A. Gower, and Betty E. Darnell
(2007). “Ability of the Harris Benedict formula to predict energy requirements differs with weight history and
ethnicity”. In: Nutrition research (New York, N.Y.) 27.4, pp. 194–199. ISSN: 0271-5317. DOI: 10.1016/j.
nutres.2007.01.016. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2598419/
(visited on 04/25/2019).
Hall, K. D. (2008). “What is the required energy deficit per unit weight loss?” eng. In: International Journal of Obesity
(2005) 32.3, pp. 573–576. ISSN: 1476-5497. DOI: 10.1038/sj.ijo.0803720.
Harris, J. Arthur and Francis G. Benedict (1918). “A Biometric Study of Human Basal Metabolism”. In: Proceedings
of the National Academy of Sciences of the United States of America 4.12, pp. 370–373. ISSN: 0027-8424. URL:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1091498/ (visited on 05/13/2019).
Hartmann-Boyce, J., D. J. Johns, S. A. Jebb, P. Aveyard, and Behavioural Weight Management Review Group (2014).
“Effect of behavioural techniques and delivery mode on effectiveness of weight management: systematic review,
meta-analysis and meta-regression”. eng. In: Obesity Reviews: An Official Journal of the International Association
for the Study of Obesity 15.7, pp. 598–609. ISSN: 1467-789X. DOI: 10.1111/obr.12165.
Hoffman, Matthew D. and Andrew Gelman (2011). “The No-U-Turn Sampler: Adaptively Setting Path Lengths in
Hamiltonian Monte Carlo”. In: arXiv:1111.4246 [cs, stat]. URL: http://arxiv.org/abs/1111.4246
(visited on 03/04/2019).
Johnstone, Alexandra M., Sandra D. Murison, Jackie S. Duncan, Kellie A. Rance, and John R. Speakman (2005).
“Factors influencing variation in basal metabolic rate include fat-free mass, fat mass, age, and circulating thyrox-
ine but not sex, circulating leptin, or triiodothyronine”. en. In: The American Journal of Clinical Nutrition 82.5,
pp. 941–948. ISSN: 0002-9165. DOI: 10.1093/ajcn/82.5.941. URL: https://academic.oup.com/
ajcn/article/82/5/941/4607670 (visited on 02/11/2019).
Kelly, Evelyn B. (2018). Obesity, 2nd Edition. Santa Barbara, UNITED STATES: ABC-CLIO, LLC. ISBN: 978-1-
4408-5882-6. URL: http://ebookcentral.proquest.com/lib/uu/detail.action?docID=
5341454 (visited on 05/13/2019).
King, Bruce M., Amanda N. Ivester, Priscilla D. Burgess, Kimberly M. Shappell, Katherine L. Coleman, Victoria M.
Cespedes, Harriet S. Pruitt, Grace K. Burden, and Eric S. Bour (2016). Adults with Obesity Underreport High-calorie
Foods in the Home. en. DOI: 10.14485/HBPR.3.5.4. URL: https://
www.ingentaconnect.com/contentone/psp/hbpr/2016/00000003/00000005/art00004#
(visited on 02/26/2019).
Larrañaga, Pedro, Hossein Karshenas, Concha Bielza, and Roberto Santana (2012). “A review on probabilistic graph-
ical models in evolutionary computation”. In: Journal of Heuristics 18.5, pp. 795–819. ISSN: 1572-9397. DOI:
10.1007/s10732-012-9208-4. URL: https://doi.org/10.1007/s10732-012-9208-4.
Lichtman, S. W., K. Pisarska, E. R. Berman, M. Pestone, H. Dowling, E. Offenbacher, H. Weisel, S. Heshka, D. E.
Matthews, and S. B. Heymsfield (1992). “Discrepancy between self-reported and actual caloric intake and exercise
in obese subjects”. eng. In: The New England Journal of Medicine 327.27, pp. 1893–1898. ISSN: 0028-4793. DOI:
10.1056/NEJM199212313272701.
Livingstone, M. B., A. M. Prentice, J. J. Strain, W. A. Coward, A. E. Black, M. E. Barker, P. G. McKenna, and R. G.
Whitehead (1990). “Accuracy of weighed dietary records in studies of diet and health.” en. In: BMJ 300.6726,
pp. 708–712. ISSN: 0959-8138, 1468-5833. DOI: 10.1136/bmj.300.6726.708. URL: https://www.
bmj.com/content/300/6726/708 (visited on 02/27/2019).
Lucas, Peter J.F. et al. (2004). “Bayesian networks in biomedicine and health-care”. en. In: Artificial Intelligence in
Medicine 30.3, pp. 201–214. ISSN: 09333657. DOI: 10.1016/j.artmed.2003.11.001. URL: https:
//linkinghub.elsevier.com/retrieve/pii/S0933365703001313 (visited on 05/13/2019).
Mihajlovic, V. and M. Petkovic (2001). “Dynamic Bayesian Networks: A State of the Art”. URL:
https://research.utwente.nl/en/publications/dynamic-bayesian-networks-a-
state-of-the-art (visited on 04/24/2019).
NIDDK (2014). Overweight & Obesity Statistics | NIDDK. en-US. URL: https://www.niddk.nih.gov/
health-information/health-statistics/overweight-obesity (visited on 05/07/2019).
World Health Organisation (WHO) (2016). Obesity and overweight. en. URL: https://www.who.int/news-
room/fact-sheets/detail/obesity-and-overweight (visited on 05/07/2019).
Owen, O. E., E. Kavle, R. S. Owen, M. Polansky, S. Caprio, M. A. Mozzoli, Z. V. Kendrick, M. C. Bushman, and
G. Boden (1986). “A reappraisal of caloric requirements in healthy women”. eng. In: The American Journal of
Clinical Nutrition 44.1, pp. 1–19. ISSN: 0002-9165. DOI: 10.1093/ajcn/44.1.1.
Sabounchi, Nasim S. et al. (2013). “Best Fitting Prediction Equations for Basal Metabolic Rate: Informing Obesity
Interventions in Diverse Populations”. In: International journal of obesity (2005) 37.10, pp. 1364–1370. ISSN:
0307-0565. DOI: 10.1038/ijo.2012.218. URL: https://www.ncbi.nlm.nih.gov/pmc/
articles/PMC4278349/ (visited on 02/01/2019).
Sato, Renato Cesar et al. (2015). “Probabilistic graphic models applied to identification of diseases”. In: DOI: 10.
1590/S1679-45082015RB3121.
Shachter, Ross D. and C. Robert Kenley (1989). “Gaussian Influence Diagrams”. In: Management Science 35.5,
pp. 527–550. ISSN: 0025-1909. DOI: 10.1287/mnsc.35.5.527. URL: https://pubsonline.
informs.org/doi/abs/10.1287/mnsc.35.5.527 (visited on 05/20/2019).
Trexler, Eric T. et al. (2014). “Metabolic adaptation to weight loss: implications for the athlete”. In: Journal of the
International Society of Sports Nutrition 11.1, p. 7. ISSN: 1550-2783. DOI: 10.1186/1550-2783-11-7.
URL: https://doi.org/10.1186/1550-2783-11-7.
Urban, Lorien E., Gerard E. Dallal, Lisa M. Robinson, Lynne M. Ausman, Edward Saltzman, and Susan B. Roberts
(2010). “The Accuracy of Stated Energy Contents of Reduced-Energy, Commercially Prepared Foods”. In: Journal
of the American Dietetic Association 110.1, pp. 116–123. ISSN: 0002-8223. DOI: 10.1016/j.jada.2009.
10.003. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2838242/ (visited on
04/25/2019).
Wainwright, Martin J. and Michael I. Jordan (2007). “Graphical Models, Exponential Families, and Variational Inference”.
en. In: Foundations and Trends® in Machine Learning 1.1–2, pp. 1–305. ISSN: 1935-8237, 1935-8245.
DOI: 10.1561/2200000001. URL: http://www.nowpublishers.com/article/Details/MAL-
001 (visited on 04/12/2019).