
Basal Metabolic Rate (BMR) estimation using Probabilistic Graphical Models

By Zara Jackson

Department of Statistics

Uppsala University

Supervisor: Harry Khamis

2019

Abstract

Obesity is a growing problem globally. Currently 2.3 billion adults are overweight, and this number is rising. The

most common method for weight loss is calorie counting, in which to lose weight a person should be in a calorie

deficit. Basal Metabolic Rate accounts for the majority of calories a person burns in a day and it is therefore a major

contributor to accurate calorie counting. This paper uses a Dynamic Bayesian Network to estimate Basal Metabolic

Rate (BMR) for a sample of 219 individuals from all Body Mass Index (BMI) categories. The data was collected

through the Lifesum app. A comparison of the estimated BMR values was made with the commonly used Harris

Benedict equation, finding that food journaling is a sufficient method to estimate BMR. Next day weight prediction

was also computed based on the estimated BMR. The results showed that the Harris Benedict equation produced

more accurate predictions than the metabolic model proposed, therefore more work is necessary to find a model that

accurately estimates BMR.

Keywords— Basal Metabolic Rate, Resting Metabolic Rate, Dynamic Bayesian Networks, Temporal Models, Food Tracking,

Calories, Obesity, PyMC3, Probabilistic Programming

Contents

1 Introduction
2 Background
2.1 Probabilistic Graphical Models
2.2 Bayesian Networks
2.3 Dynamic Bayesian Networks
3 Method
3.1 Data
3.2 Self Reporting Bias of Calories Consumed
3.3 Metabolic Model
3.3.1 Model Representation
3.3.2 Inference
3.3.3 Prediction Analysis
4 Results
4.1 Correlation of Harris Benedict and Energy Balance
4.2 Self Reporting Bias Results
4.3 Model Inference
4.4 Prediction Results
4.4.1 Prediction Error Analysis
5 Discussion

1 Introduction

Currently there are around 2.3 billion overweight adults globally. The World Health Organisation (WHO) (2016)

states that within the last four decades obesity rates have almost tripled. In 2016 30% of adults worldwide were overweight or obese.

Obesity is in particular a problem in the USA where in 2014 70% of the population were either overweight or obese, (NIDDK

2014). Obesity is linked to many health problems, for example: heart disease, stroke, diabetes and some cancers, (Kelly 2018).

Therefore it is essential that obesity rates are reduced. In order to lose weight many people turn to calorie counting, in an attempt to

consume fewer calories than they burn, and thus achieve weight loss. Calorie counting is therefore only effective if we can accurately

estimate the number of calories a person burns each day. Energy expenditure is the total number of calories burned while at rest plus

calories burned during exercise. Trexler et al. (2014) state that up to 70% of our daily energy expenditure is from Basal Metabolic

Rate (BMR). BMR refers to the number of calories our bodies burn while at rest. It is the number of calories required to keep

our bodies functioning, through for example, breathing, blood circulation, and temperature regulation.

Lifesum is a mobile health app used for meal and exercise tracking aiming to help users achieve their weight goals. Lifesum

focuses on helping people make better food choices, by helping the user choose a diet that fits their lifestyle and goals. It provides

tips and feedback based on the chosen diet plan and the user’s goal. Lifesum provides a platform to easily track calories, macronu-

trients, and water consumed each day. The app syncs to popular fitness devices to simplify exercise tracking. A user can track

weight and body measurements within the app allowing them to visualise their weight journey. Lifesum recommends how many

calories each user should eat each day based on the user’s goal type and BMR, recommending a safe calorie deficit for weight loss

or calorie surplus for weight gain. The amount of exercise tracked is also considered when recommending how many calories a

user should consume, increasing the calorie target as users track exercise throughout the day.

Currently Lifesum uses the Harris Benedict Equation to estimate a user's BMR. The Harris Benedict BMR estimate is

based on an individual’s gender, age, height and weight, and is calculated as follows (Harris and Benedict 1918):

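Men: $$\mathrm{BMR} = 66.5 + (13.75 \times \text{weight (kg)}) + (5.003 \times \text{height (cm)}) - (6.755 \times \text{age (years)})$$

Women: $$\mathrm{BMR} = 655.1 + (9.563 \times \text{weight (kg)}) + (1.850 \times \text{height (cm)}) - (4.676 \times \text{age (years)})$$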
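The two formulas translate directly into code. The following is a minimal sketch using the coefficients quoted above; the function name, the argument names and the example user are illustrative assumptions and are not taken from the Lifesum implementation.

```python
def harris_benedict_bmr(sex, weight_kg, height_cm, age_years):
    """Return the Harris Benedict BMR estimate in kcal/day.

    `sex` is "male" or "female"; the coefficients are those quoted in the text.
    """
    if sex == "male":
        return 66.5 + 13.75 * weight_kg + 5.003 * height_cm - 6.755 * age_years
    if sex == "female":
        return 655.1 + 9.563 * weight_kg + 1.850 * height_cm - 4.676 * age_years
    raise ValueError("sex must be 'male' or 'female'")


# Hypothetical user, not taken from the data set.
example_bmr = harris_benedict_bmr("male", weight_kg=80.0, height_cm=180.0, age_years=30)
```

Because weight is the only input that changes from day to day, an estimate computed this way moves in step with the measured weight.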

Many previous studies have focused on the accuracy of the Harris Benedict Equation. Owen et al. (1986) found that the Harris

Benedict Equation overestimated BMR by up to 24% in women. Daly et al. (1985) also found that the Harris Benedict Equation

overestimated BMR, however by 10-15%. Other research in this area suggests that the Harris Benedict Equation is particularly

inaccurate for those who have been obese in the past. Astrup et al. (1999) found that formerly obese subjects have BMRs 3-5%

lower than those estimated by the Harris Benedict equation. Douglas et al. (2007) also found that the Harris Benedict Equation

tends to be less accurate in women who have a history of obesity, finding their actual BMR to be less than the estimated value.

Variation in BMR from one individual to another is caused by many different factors. An individual’s body weight can be di-

vided into a fat component (Fat Mass) and a remainder (Fat Free Mass). A study of adults in Scotland by Johnstone et al. (2005)

found that 63% of the variation between individuals' BMR was caused by Fat Free Mass, while 6% came from Fat Mass and 2% was

caused by age. They found 26% of the variation to be unexplained. Other studies have also found Fat Mass and Fat Free Mass to

account for the majority of the variation in BMR (Sabounchi et al. 2013). In addition to this they also found age to be a contributing

factor in BMR variation. They concluded that age has a bigger impact on children than on adults and also on males compared to

females.

While Fat Free Mass is the largest contributor to BMR, often obese individuals have a lower BMR than the Harris Benedict

Equation estimates (Sabounchi et al. 2013). This is due to a larger proportion of fat mass, which burns fewer calories at rest than fat

free mass, causing discrepancies in the estimation (Douglas et al. 2007). As a result of these factors the BMR currently estimated

within the Lifesum app may not be an accurate estimation of each individual user’s BMR. As BMR is the largest factor influencing

a user’s calorie goal, and users adapt their eating to the calorie goal, a poor estimate could lead users to eat in ways that make their

goal difficult or even impossible to achieve.

Change in weight is caused by the difference between total energy expenditure and the number of calories consumed through

food. Therefore in addition to BMR, the number of calories consumed needs to be accurately tracked in order for a user to lose

weight. This is often not the case due to various difficulties in self-reporting. When individuals self-report calories they tend to

underreport by an average of 20% (Livingstone et al. 1990). Other studies concluded that calories consumed were underreported

by both obese and non-obese individuals; however, the proportion underreported is much larger amongst obese individuals (Bandini

et al. 1990, King et al. 2016). Obese subjects underreported their calorie intake by an average of 47% while non-obese subjects

underreported by an average of 19% (Lichtman et al. 1992). In addition to this research, Livingstone et al. (1990) also concluded

that 12 of 20 obese subjects under reported foods that are high in calories compared to 3 of 22 lean subjects, suggesting that obese

individuals are also more likely to be selective in the types of food they track.

Another reason for under tracking could be linked to inaccuracies in food labelling. In a study carried out on American restaurant

and supermarket products, Urban et al. (2010) found that on average the calories in food servings were 18% more than the calorie

information provided by restaurants. In addition to this they found that the labels of packaged foods understated the number of

calories by an average of 8%. The U.S. Food and Drug Administration considers a food label to be compliant with their guidelines

if the calories stated are within 20% (in either direction) of the actual amount of calories contained in the product. As a result of

this even when a user tracks what is stated on food packages it is often inaccurate.

Probabilistic Graphical Models (PGMs) have become increasingly popular in the field of health; Lucas et al. (2004) provide a summary

of studies where PGMs have been of particular interest. This may be because the graphical representation is straightforward to

interpret. In addition, they can be used to overcome limitations that result from uncertainties in health data (Sato et al. 2015).

Another advantage is that knowledge about the human metabolism can be leveraged to refine the model structure.

The aim of this thesis is to use probabilistic graphical modelling to estimate BMR based on an individual's personal weight journey

and tracking data in order to gain a better understanding of an individual's BMR, and thereby to improve their calorie target, increasing

the likelihood of their success. To build this model we use tracking data from the Lifesum App. This paper aims to answer

four main research questions:

1. How well does the Harris Benedict equation capture an individual's energy balance?

2. Can we estimate the reporting bias of calories by Lifesum users?

3. Is food journaling sufficient to estimate BMR?

4. How much variation in BMR can we measure from tracking data?

The remainder of this paper is structured as follows. The next section provides some background on Probabilistic Graphical Models,

particularly Dynamic Bayesian Networks, which are used throughout this paper. Section 3 describes the method used, including

a description of the data and the model. Section 4 reports the results. Finally, Section 5 provides some concluding

remarks.

2 Background

2.1 Probabilistic Graphical Models

Probabilistic graphical models provide a framework for capturing complex dependencies between random variables that involve

uncertainty. By combining graph theory and probability theory graphical models are used to build large-scale multivariate statistical

models (Wainwright and Jordan 2007). A Probabilistic Graphical Model is made up of a combination of nodes and edges. Each

node is associated with a random variable and the edges represent the conditional relationships between the random variables.

Probabilistic Graphical Models perform Inference and Learning by incorporating prior knowledge within the model.

2.2 Bayesian Networks

A Bayesian Network (BN) is one of the most common forms of Probabilistic Graphical Models. A Bayesian Network is made up

of two components (Larrañaga et al. 2012). First, a Directed Acyclic Graph (DAG) which represents the structure of the system

and the conditional dependencies between variables. A DAG consists of nodes and edges which have the same meaning as above.

The second component of a Bayesian Network is a set of parameters, which based on the structure of the DAG, state the conditional

probability distribution for each variable given its parents. Figure 1 is an example of a simple Bayesian Network. This graphical

representation is a tractable way of modelling the joint distribution P(A,B,C). For example, the edge (A,C) connects the random

variable A to the random variable C, and means that P(C|A) is a factor in the joint probability distribution.

Figure 1: A Directed Acyclic Graph (DAG) of a Simple Bayesian Network
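As a worked illustration of the factorisation that a DAG encodes, the general rule and one concrete instance are written out below. The structure used for the instance (edges A → C and B → C) is an assumption made purely for illustration; only the edge (A,C) is stated in the text.

```latex
% General factorisation of a Bayesian Network over X_1, ..., X_n,
% where Pa(X_i) denotes the parents of X_i in the DAG.
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)

% Assumed example structure: A -> C and B -> C (hypothetical).
P(A, B, C) = P(A)\, P(B)\, P(C \mid A, B)
```

Under this assumed structure the factor P(C | A, B) is the one that the edges into C contribute to the joint distribution.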

2.3 Dynamic Bayesian Networks

A Dynamic Bayesian Network (DBN) is an extension of a Bayesian Network, where the system’s state is changing over time. A

basic assumption of a DBN is that time can be discretized into a set of equally spaced time slices. This assumption simplifies the

problem from representing distributions over a continuum of random variables to representing distributions over countably many

random variables, sampled at discrete intervals. A dynamic Bayesian Network allows connections within time slices known as

intra-slice connections as well as connections between consecutive time slices known as inter-time slice connections. Inter-time

slice connections therefore imply conditional probabilities between time slices. In addition to this, variables within a DBN must

satisfy the Markov Assumption, that is, the state of the system at time t + 1 depends only on its immediate past, i.e. its state at time t

(Mihajlovic and Petkovic 2001). It is therefore conditionally independent of all timepoints prior to t (denoted X(0:(t−1))). Thus, for all t ≥ 0:

$$\left(X^{(t+1)} \perp X^{(0:(t-1))} \mid X^{(t)}\right)$$

where X = {x1, ..., xn} is the set of random variables.

The states of a dynamic model are not always directly observable. A variable that is not observed is referred to as a latent vari-

able. Latent variables may influence other variables that can be directly measured. Each state within a dynamic model at one time

slice may depend on one or more states at the previous time instance and/or on some states in the same time slice. A graphical

representation of a Dynamic Bayesian Network can be seen below in Figure 2.

Figure 2: An example of a Dynamic Bayesian Network

Now, let X = {x1, ..., xn} be the set of latent variables, while Y = {y1, ..., yn} is the set of observable variables. T is the time

boundary. Combining this notation with the Markov assumption we can define the distribution of the variables sampled over time

t = 0, ..., T as:

$$P(X, Y) = \prod_{t=1}^{T} \Pr\left(X^{(t+1)} \mid X^{(t)}\right) \prod_{t=1}^{T} \Pr\left(Y^{(t+1)} \mid X^{(t+1)}\right) \Pr\left(X^{(0)}\right)$$

In order to completely specify a DBN we need to define three sets of parameters (Mihajlovic and Petkovic 2001):

– State transition conditional probability distributions (CPD) Pr(X(t+1)|X(t)), which represent the time dependencies be-

tween each of the states

– Observation conditional probability distributions Pr(Y(t+1)|X(t+1)), which represent the dependencies between each of the

nodes at time slice t+1

– Initial state distributions Pr(X(0)), which specify the initial probability distributions at the start of the process.
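The three sets of parameters listed above are enough to evaluate the joint probability of any sequence. The sketch below does this for a toy DBN with one discrete latent variable and one discrete observation per time slice. The state names, observation names and all probabilities are hypothetical, and the indexing convention (latent states x_0..x_T, observations y_1..y_T) is an assumption chosen for the example.

```python
# Hypothetical parameters for a two-state toy DBN.
INITIAL = {"low": 0.6, "high": 0.4}                      # Pr(X(0))
TRANSITION = {                                           # Pr(X(t+1) | X(t))
    "low": {"low": 0.9, "high": 0.1},
    "high": {"low": 0.2, "high": 0.8},
}
EMISSION = {                                             # Pr(Y(t+1) | X(t+1))
    "low": {"light": 0.7, "heavy": 0.3},
    "high": {"light": 0.1, "heavy": 0.9},
}


def dbn_joint_probability(states, observations, initial, transition, emission):
    """Joint probability of latent states x_0..x_T and observations y_1..y_T."""
    if len(states) != len(observations) + 1:
        raise ValueError("expected one more state than observations")
    prob = initial[states[0]]
    for t in range(1, len(states)):
        prob *= transition[states[t - 1]][states[t]]      # state transition CPD
        prob *= emission[states[t]][observations[t - 1]]  # observation CPD
    return prob


example = dbn_joint_probability(
    ["low", "low", "high"], ["light", "heavy"], INITIAL, TRANSITION, EMISSION
)
```

Summing this quantity over every possible state and observation sequence of a fixed length gives 1, which is a quick sanity check on the factorisation.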

3 Method

3.1 Data

This thesis uses a proprietary data set collected using the Lifesum App, from January 2017 to February 2019. Upon signup a user

enters information, including gender, height and their weight goal. Users can track meals and snacks on a daily basis. They can

also track exercise by entering the exercise type and duration manually, or by syncing the Lifesum app with other devices used to

record exercise (e.g. Apple Watch or Fitbit). Users can also track their weight and body measurements over time either manually

or by syncing with smart scales.

As tracking is voluntary there are very large quantities of missing data. The data are not missing completely at random. For example,

a user who does not log their food consumption on a given day may skip tracking as they consumed more than the recommended

amount of calories that day. In order to limit confounding effects from this missingness this thesis focuses on a sample of users.

The sample contains users who fulfil the following criteria for a period of at least 14 consecutive days:

1. Users who have recorded at least one food item for breakfast, lunch and dinner.

2. Users who have recorded their weight daily through the use of Smart Scales that are synced with the Lifesum app.

3. Users who track exercise through a Smart Watch which is also synced with the Lifesum app. Tracking through synced

devices reduces self reporting inaccuracies.

4. Users over the age of 20 are selected to ensure that growth has ended. Teenagers are likely to still be growing and it is

therefore possible that their height may not be up to date.
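The requirement of at least 14 consecutive qualifying days can be checked with a simple scan over each user's dates. This is a minimal sketch, assuming the input is the list of calendar days on which a user already meets criteria 1 to 3; the function name and the input format are hypothetical and do not describe the actual Lifesum pipeline.

```python
from datetime import date, timedelta


def qualifying_streaks(tracked_days, min_length=14):
    """Return (start, end) pairs for runs of consecutive days of at least min_length."""
    days = sorted(set(tracked_days))
    streaks = []
    if not days:
        return streaks
    start = prev = days[0]
    for day in days[1:]:
        if day - prev != timedelta(days=1):
            # The run ended at `prev`; keep it only if it is long enough.
            if (prev - start).days + 1 >= min_length:
                streaks.append((start, prev))
            start = day
        prev = day
    if (prev - start).days + 1 >= min_length:
        streaks.append((start, prev))
    return streaks
```

A user with several separate runs of 14 or more days would contribute several streaks under this reading.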

Table 1 contains user demographics. The sample consists of 219 individual users from all over the world, of whom 130 are male

and 89 are female. The users range from 21 to 80 years old with an average age of 43. Each user has tracked a minimum of 14

consecutive days and in total the data set contains 7887 tracked days. Body Mass Index (BMI) was calculated for each user on the

first day of their tracking streak. Users from all BMI categories are included. BMI ranges from 16 which is considered underweight

to 45 which is considered obese. The average BMI is 26 which is considered to be overweight; however, the largest single group of users is of normal

(healthy) weight. Table 2 contains the number of users in each BMI category. 60 users (27.4%) are obese.
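For reference, BMI is weight in kilograms divided by the square of height in metres. The sketch below computes it and assigns a category; the cut-offs (18.5, 25 and 30) are the standard WHO adult thresholds and are an assumption here, since the text does not state which boundaries were used.

```python
def bmi(weight_kg, height_cm):
    """Body Mass Index in kg/m^2."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Map a BMI value to a category using the assumed WHO adult cut-offs."""
    if value < 18.5:
        return "Underweight"
    if value < 25.0:
        return "Normal (Healthy) Weight"
    if value < 30.0:
        return "Overweight"
    return "Obese"
```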

Table 1: User Demographics

                 Total (n=219)           Male (n=130)            Female (n=89)
                 Min     Max     Mean    Min     Max     Mean    Min     Max     Mean
Age (years)      21      80      43      21      80      44      24      70      40
Height (cm)      152.4   203.2   175.6   163.0   203.2   179.3   152.4   185.0   166.9
Weight (kg)      45.7    165.6   80.3    53.1    165.4   82.3    45.7    138.8   75.46
BMI              16.7    44.6    26.0    16.7    44.6    25.6    17.4    43.9    27.0

Table 2: Number of users in each BMI category

BMI Category       Underweight   Normal (Healthy) Weight   Overweight   Obese
Number of Users    3             84                        72           60

In order to link a user's daily activities to changes in weight, the time of weigh-in is an important factor to consider. If users weigh themselves

in the morning, the energy balance from the previous day will impact the current day's weight. If users weigh themselves in

the evening the weight observed on the current day is impacted by the energy balance of that day. Figure 3 shows a histogram of

weigh-in times. The majority (69.4%) of the sample of users recorded their weight before noon. Therefore we make the simplifying

assumption that a day's energy balance is reflected in the next day's weigh-in.

Figure 3: Histogram showing the time of day Lifesum users track daily weight (n=219)

3.2 Self Reporting Bias of Calories Consumed

As discussed previously, change in weight occurs as a result of the balance between calories consumed through food (calories in)

and calories burned (calories out). It is therefore important that the number of calories consumed is accurately tracked in order for

a user to successfully achieve their weight goals. Previous research has suggested that individuals tend to undertrack food by up to

47% (Lichtman et al. 1992).

Energy balance can be represented by the following equation:

$$\kappa \times \text{Cals In} - \text{Cals Out} - \omega \times \Delta\text{weight} = 0 \tag{1}$$

Where

- Cals out denotes the total number of calories burned by BMR and exercise

- κ is the proportion of calories under/over tracked

- ω kcal/kg is the number of kilocalories in 1kg of body weight.

- ∆ weight denotes change in weight from time t to t+1

Therefore in order to find the proportion of calories undertracked by the sample of users we aim to find κ and ω such that:

$$\text{loss}(\kappa, \omega) = \sum_i \left| \kappa \times \text{Cals In}_i - \text{Cals Out}_i - \omega \times \Delta\text{weight}_i \right| \tag{2}$$

is minimised.
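One simple way to minimise the loss in equation (2) is a grid search over κ with ω held fixed. The sketch below is an illustration under stated assumptions and is not the optimiser used in this thesis: ω is fixed at 7700 kcal/kg, the search range for κ is arbitrary, and each day is represented as a hypothetical tuple (cals_in, cals_out, delta_weight).

```python
def energy_balance_loss(kappa, omega, days):
    """Equation (2): sum of absolute energy-balance residuals over all days."""
    return sum(
        abs(kappa * cals_in - cals_out - omega * delta_weight)
        for cals_in, cals_out, delta_weight in days
    )


def fit_kappa(days, omega=7700.0, lo=0.5, hi=2.0, steps=1501):
    """Grid search for the kappa that minimises the loss, with omega fixed."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda kappa: energy_balance_loss(kappa, omega, days))


# Synthetic days constructed so that kappa = 1.1 balances equation (1) exactly.
synthetic_days = [
    (2000.0, 2277.0, -0.01),
    (1800.0, 1903.0, 0.01),
    (2500.0, 2750.0, 0.0),
]
fitted_kappa = fit_kappa(synthetic_days)
```

Under this reading, a fitted κ above 1 means that tracked intake has to be scaled up to balance the equation, which would suggest undertracking.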

3.3 Metabolic Model

3.3.1 Model Representation

Figure 4: Dynamic Bayesian Model for BMR over two time slices

A graphical representation of the metabolic model is shown in Figure 4. This model is a Dynamic Bayesian Model in which each time

slice represents one day, and the number of time slices is individual to each user based on the number of consecutive days tracked.

Each user tracked a minimum of one 14 day streak, and some users may have tracked multiple streaks. Observed variables are

represented by a node with a solid outline, while latent variables are represented by a node with a dashed outline. The model is

described mathematically by defining the conditional and initial distributions below.

The conditional probability distribution for the model, which includes the state transitions and observations, can be written as:

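$$\Pr(\mathrm{BMR}_{t+1}, \mathrm{EB}_{t+1}, \Delta W_{t+1}, W_{t+1}, \mathrm{OW}_{t+1}) = \Pr(\mathrm{BMR}_{t+1} \mid \mathrm{BMR}_t)\, \Pr(\mathrm{EB}_{t+1} \mid F_{t+1}, \mathrm{Ex}_{t+1}, \mathrm{BMR}_{t+1}) \times \Pr(\Delta W_{t+1} \mid \mathrm{EB}_{t+1})\, \Pr(W_{t+1} \mid \Delta W_t, W_t)\, \Pr(\mathrm{OW}_{t+1} \mid W_{t+1}) \tag{3}$$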

The initial state distribution of the model can be written as:

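$$\Pr(\mathrm{BMR}_0, \mathrm{EB}_0, \Delta W_0, W_0, \mathrm{OW}_0) = \Pr(\mathrm{BMR}_0)\, \Pr(\mathrm{EB}_0 \mid F_0, \mathrm{Ex}_0, \mathrm{BMR}_0)\, \Pr(\Delta W_0 \mid \mathrm{EB}_0)\, \Pr(W_0)\, \Pr(\mathrm{OW}_0 \mid W_0) \tag{4}$$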

The model consists of only continuous variables and therefore the conditional probability distributions can be described as Gaussian

processes (Shachter and Kenley 1989). Thus, each individual variable has a normal prior distribution. More specifically the prior

distributions of each variable are defined as:

Latent BMR

$$\mathrm{BMR}_t \sim \mathcal{N}(\mathrm{BMR}_{t-1}, \alpha) \tag{5}$$

$$\alpha \sim \text{Exp}(1/10) \tag{6}$$

BMR at each time point is normally distributed with a mean of the estimated value of BMR at the previous time slice, and a standard

deviation of α. α acts as the step size. In other words α governs how much BMR can change from one day to the next. As the value of

α is uncertain a distribution is placed around it. Because α is a standard deviation it is strictly positive, leading to the

choice of an exponential prior.

The initial value of BMR is also normally distributed as:

$$\mathrm{BMR}_0 \sim \mathcal{N}(\beta, 0.25 \times \beta) \tag{7}$$

Where β is the BMR value on day 0 estimated by the Harris Benedict Equation. The standard deviation is 0.25 ∗ β because previous

research found individuals to vary from the Harris Benedict equation by up to 24% (Owen et al. 1986); the chosen value allows for

slightly more variation.

Energy Balance

Energy Balance is calculated at each time point within the model, using the observed values for calories consumed and calories

burned through exercise, as well as the individualised value for undertracking bias.

$$\text{Calculated Energy Balance}_t = \text{Tracking Bias} \times \text{Calories In Food}_t - \text{Calories Out BMR}_t - \text{Calories burned by Exercise}_t \tag{8}$$

As there is uncertainty around the value of energy balance we place a normal distribution around it.

$$\text{Energy Balance}_t \sim \mathcal{N}(\text{Calculated Energy Balance}_t, \gamma) \tag{9}$$

$$\gamma \sim \text{Exp}(1/\varepsilon) \tag{10}$$

Where ε is an individualised value of the standard deviation of calculated energy balance for each user. In this case when calculating

energy balance the Harris Benedict estimation of BMR was used. The value is the same over all the days of tracking. Again since

there is uncertainty in ε we place a distribution around it.

∆ weight (Change in weight)

$$\Delta \text{Weight}_t = \mathrm{EB}_t / \omega \tag{11}$$

ω represents the number of kilocalories in 1kg of fat as in equation (1). In weight loss the primary objective is to lose body fat,

which contains 7700kcal/kg (Hall 2008). As a simplifying assumption, we assume all weight lost or gained is body fat, hence we

set ω to 7700 kcal/kg.
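As a worked example of equation (11), consider a hypothetical day with an energy balance of −500 kcal (a deficit). The figure of 500 kcal is chosen only for illustration.

```latex
% Hypothetical daily energy balance of -500 kcal, with omega = 7700 kcal/kg.
\Delta \text{Weight}_t = \frac{\mathrm{EB}_t}{\omega}
  = \frac{-500\ \text{kcal}}{7700\ \text{kcal/kg}} \approx -0.065\ \text{kg}
```

A change of this size is small compared with the day-to-day fluctuations in measured weight, which is one reason a latent weight variable is used.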

Latent weight

Latent weight at each time point is normally distributed with a mean of the estimated value of Latent weight at the previous time

point plus Energy Balance at the current time point, and a standard deviation of ∆.

$$\text{Latent Weight}_t \sim \mathcal{N}(\text{Latent Weight}_{t-1} + \text{Energy Balance}_t, \Delta) \tag{12}$$

∆ corresponds to how much we believe latent weight can change in a day; we expect latent weight to change by around 0.15kg.

However, there is uncertainty in the estimated value of ∆ so it also follows an exponential distribution.

$$\Delta \sim \text{Exp}(1/0.15) \tag{13}$$

Observed weight:

$$\text{Observed Weight}_t \sim \mathcal{N}(\text{Latent Weight}_t, \zeta) \tag{14}$$

Where ζ is an individualised value of the variance in observed weight change. As this value is observed there is no distribution

placed around it.
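To make the generative structure of equations (5) to (14) concrete, the sketch below draws one forward simulation from the priors using only the Python standard library. It is not the PyMC3 model used for inference. Several choices are assumptions: Exp(1/c) is read as an exponential distribution with rate 1/c (mean c); the energy balance is converted to kilograms through equation (11) before being added to latent weight; and the starting weight and the values of ε and ζ are hypothetical inputs.

```python
import random

OMEGA = 7700.0  # kcal per kg of body fat


def simulate_user(bmr_hb, calories_in, calories_exercise, tracking_bias=1.0,
                  start_weight=80.0, epsilon=300.0, zeta=0.5, seed=0):
    """Draw one trajectory of the latent and observed variables from the priors."""
    rng = random.Random(seed)
    alpha = rng.expovariate(1 / 10)        # eq. (6), assumed rate 1/10
    gamma = rng.expovariate(1 / epsilon)   # eq. (10), assumed rate 1/epsilon
    delta = rng.expovariate(1 / 0.15)      # eq. (13), assumed rate 1/0.15
    bmr = rng.gauss(bmr_hb, 0.25 * bmr_hb)  # eq. (7)
    latent_weight = start_weight            # assumed known starting point
    rows = []
    for food, exercise in zip(calories_in, calories_exercise):
        bmr = rng.gauss(bmr, alpha)                                 # eq. (5)
        calculated_eb = tracking_bias * food - bmr - exercise       # eq. (8)
        energy_balance = rng.gauss(calculated_eb, gamma)            # eq. (9)
        delta_weight = energy_balance / OMEGA                       # eq. (11)
        latent_weight = rng.gauss(latent_weight + delta_weight, delta)  # eq. (12), in kg
        observed_weight = rng.gauss(latent_weight, zeta)            # eq. (14)
        rows.append({
            "bmr": bmr,
            "energy_balance": energy_balance,
            "latent_weight": latent_weight,
            "observed_weight": observed_weight,
        })
    return rows
```

Running the simulation with different seeds gives a feel for how much day-to-day variation the priors allow before any data are observed.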

3.3.2 Inference

Inference in a DBN is defined as finding the values of the latent states given the observed states at each time slice (Mihajlovic and

Petkovic 2001). Inference can be represented mathematically as

$$\Pr\left(X_0^{T-1} \mid Y_0^{T-1}\right) \tag{15}$$

Where $Y_0^{T-1} = \{y_0, y_1, \ldots, y_{T-1}\}$ denotes the set of T consecutive observed variables and $X_0^{T-1} = \{x_0, x_1, \ldots, x_{T-1}\}$ represents the set of latent variables.

Inference is performed by No-U-Turn Sampling (NUTS). NUTS is a Markov Chain Monte Carlo (MCMC) sampling method

that extends Hamiltonian Monte Carlo (HMC). A disadvantage of HMC is that it requires both a step size and a number of leapfrog steps to be chosen, which are

often difficult to decide upon. NUTS chooses the number of steps automatically, and the variant with dual averaging also tunes the step size during warm-up, so neither needs to be set by hand. It is especially useful in performing

inference on models that contain many continuous variables. NUTS uses the gradient of the log-posterior density to move towards

regions of higher probability, and therefore typically converges faster than random-walk MCMC sampling methods. To summarise, NUTS makes it

possible to efficiently perform Bayesian posterior inference on a large class of complex, high-dimensional models with minimal

human intervention. A description of the NUTS algorithm can be found in Hoffman and Gelman (2011) (Algorithm 6).
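To show what has to be tuned by hand in plain HMC, the sketch below implements a basic HMC sampler for a one-dimensional target. It is deliberately plain HMC with a fixed `step_size` and `n_steps`, not NUTS, and it is not the sampler used in this thesis. The function names and the standard normal target are assumptions chosen for illustration.

```python
import math
import random


def hmc_sample(log_prob_grad, x0, n_samples, step_size, n_steps, seed=0):
    """Plain HMC in one dimension; log_prob_grad(x) returns (log p(x), d/dx log p(x))."""
    rng = random.Random(seed)
    x = x0
    logp, grad = log_prob_grad(x)
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)                 # resample the momentum
        x_new, p_new = x, p
        logp_new, grad_new = logp, grad
        p_new += 0.5 * step_size * grad_new     # leapfrog: initial half step
        for i in range(n_steps):
            x_new += step_size * p_new
            logp_new, grad_new = log_prob_grad(x_new)
            if i < n_steps - 1:
                p_new += step_size * grad_new
        p_new += 0.5 * step_size * grad_new     # leapfrog: final half step
        # Metropolis correction for the discretisation error.
        log_accept = (logp_new - 0.5 * p_new ** 2) - (logp - 0.5 * p ** 2)
        if rng.random() < math.exp(min(0.0, log_accept)):
            x, logp, grad = x_new, logp_new, grad_new
        samples.append(x)
    return samples


def standard_normal(x):
    """Log-density (up to a constant) and gradient of a standard normal target."""
    return -0.5 * x * x, -x


draws = hmc_sample(standard_normal, x0=0.0, n_samples=2000, step_size=0.2, n_steps=8)
```

Poor choices of either tuning parameter, for example a trajectory that is far too short, make the chain mix slowly, which is the practical burden that NUTS is designed to remove.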

3.3.3 Prediction Analysis

We aim to predict the weight of each individual at the next time slice, based on the information obtained about the latent variables in

the current time slice. Prediction is therefore regarded as an inference problem, and can be described by the calculation (Mihajlovic

and Petkovic 2001):

$$\Pr(x_{t+1} \mid Y_0^t) = \frac{\sum_{x_t} \Pr(x_{t+1} \mid x_t)\, \alpha_t(x_t)}{\sum_{x_t} \alpha_t(x_t)} \tag{16}$$

In the same way

$$\Pr(y_{t+1} \mid Y_0^t) = \frac{\sum_{x_{t+1}} \alpha_{t+1}(x_{t+1})}{\sum_{x_t} \alpha_t(x_t)} \tag{17}$$

Where αt(xt) is the forward probability, that is, the joint probability of the observations up to time t and the state xt. Therefore

$$\alpha_t(x_t) = \Pr(Y_0^t, x_t) \tag{18}$$

From which it follows

$$\alpha_{t+1}(x_{t+1}) = \Pr(y_{t+1} \mid x_{t+1}) \sum_{x_t} \Pr(x_{t+1} \mid x_t)\, \alpha_t(x_t) \tag{19}$$
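For a model with a single discrete latent variable, the recursion in equation (19) and the predictions in equations (16) and (17) can be sketched as follows. The metabolic model in this thesis has continuous variables, so the sums become integrals that are approximated by sampling; the discrete states, observations and probabilities below are hypothetical and serve only to illustrate the recursion.

```python
# Hypothetical parameters for a two-state toy model.
INITIAL = {"low": 0.6, "high": 0.4}
TRANSITION = {
    "low": {"low": 0.9, "high": 0.1},
    "high": {"low": 0.2, "high": 0.8},
}
EMISSION = {
    "low": {"light": 0.7, "heavy": 0.3},
    "high": {"light": 0.1, "heavy": 0.9},
}


def forward_step(alpha, observation, transition, emission):
    """Equation (19): update the forward probabilities with a new observation."""
    return {
        nxt: emission[nxt][observation]
        * sum(transition[cur][nxt] * a for cur, a in alpha.items())
        for nxt in emission
    }


def predict_state(alpha, transition):
    """Equation (16): distribution of the next latent state given past observations."""
    total = sum(alpha.values())
    return {
        nxt: sum(transition[cur][nxt] * a for cur, a in alpha.items()) / total
        for nxt in alpha
    }


def predict_observation(alpha, transition, emission, observation):
    """Equation (17): probability of the next observation given past observations."""
    next_alpha = forward_step(alpha, observation, transition, emission)
    return sum(next_alpha.values()) / sum(alpha.values())


# Forward probabilities after a first observation of "light".
alpha0 = {state: INITIAL[state] * EMISSION[state]["light"] for state in INITIAL}
```

Both predictive distributions sum to 1 over their respective outcomes, which is a useful check when implementing the recursion.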

4 Results

4.1 Correlation of Harris Benedict and Energy Balance

Measured weight can vary a lot from one day to the next: Denning et al. (1990) found daily weight variation to be up to 4kg in

women. Figure 5 below shows that for Lifesum users measured weight can change by up to 3kg in either direction. The majority

of weight fluctuation each day is due to fluid balance, but some can be a result of changes in muscle or fat mass. Latent weight

is a person’s theoretical underlying weight, at some neutral level of hydration and digestive activity. It therefore removes noise in

the measured value that results from fluid variation. Changes in latent weight are directly caused by energy balance. In order to

investigate how well the Harris Benedict Equation captures Energy Balance we look at the correlation between energy balance and

observed weight change.

Where Energy balance is defined as:

Energy Balance = Total Calories consumed - Calories Out Exercise - Calories Out BMR

Figure 5 below shows the correlation between Energy balance at time t and change in weight (from time t to t+1). In this case

Energy Balance is calculated using the estimate of BMR from the Harris Benedict Equation. A very weak correlation (r=0.056)

exists, which is surprisingly low given the reported effectiveness of calorie counting (Hartmann-Boyce et al. 2014), and that calorie

targets rely on this equation. Some of this lack of correlation can be explained by changes in water balance, which we cannot

measure: water contains zero calories but accounts for most of daily weight variability. Even so, we would still expect to see some correlation.

Other possible factors are users undertracking the calories they consume and an inaccurate

estimation of BMR by the Harris Benedict equation.
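The correlation analysis can be sketched in a few lines. The code below pairs each day's energy balance with the change in weight to the next day and computes a Pearson correlation coefficient; the tuple layout and function names are assumptions for illustration and do not reproduce the analysis code used for Figure 5.

```python
import math


def energy_balance(calories_in, calories_exercise, bmr):
    """Energy balance as defined above, in kcal."""
    return calories_in - calories_exercise - bmr


def balance_vs_weight_change(days):
    """Pair energy balance at day t with the weight change from day t to t+1.

    `days` is a chronological list of (calories_in, calories_exercise, bmr, weight_kg).
    """
    balances = [energy_balance(c, e, b) for c, e, b, _ in days[:-1]]
    changes = [nxt[3] - cur[3] for cur, nxt in zip(days, days[1:])]
    return balances, changes


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

The last day of a streak has no following weigh-in, so it contributes a weight but no energy balance to the paired data.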

Figure 5: Relationship between Energy Balance at time t-1 and Weight Change between t-1 and t, for each day

and user based on Lifesum tracking data and the Harris Benedict BMR calculation.

4.2 Self Reporting Bias Results

Previous research suggests that a self reporting bias exists when food journaling. The method to estimate self reporting bias is

shown in equation (2). Hall (2008) states that ω is around 7700 kcal/kg. Therefore fixing ω to 7700 in equation (2) above we

find that on average users under track the amount of calories they consume on a daily basis by about 7.9%. This value is lower

than that found in prior research which typically focused on tracking in obese subjects, but in the sample of users analysed, only

27.4% are obese. Therefore it is plausible that undertracking rates are lower than in previous studies. To try to capture potential

undertracking effects, we estimated undertracking for each user at an individual level, as described for the Dynamic Bayesian Model

introduced previously.

4.3 Model Inference

The metabolic model estimates a value for each variable within the model each day by sampling from the posterior distributions.

Figure 6 compares the results from the Metabolic Model with the observed data for 3 users chosen at random from different BMI

categories. Information on the user’s characteristics can be found in Table 3 below.

Figure 6.A shows a comparison between BMR estimated by the model and by the Harris Benedict equation for an obese user.

This user has a BMR calculated by the Harris Benedict equation which changes each day within the range of 2075-2095 calories

over his 22 day streak. There is large variation in BMR between consecutive days when BMR is estimated using the Harris Benedict

equation. The Metabolic Model estimates this user's BMR to be, in general, higher than the Harris Benedict estimation. It also

evolves at a much steadier rate. Figure 6.B shows that the metabolic model estimates latent weight to also evolve at a steadier

rate than observed values of weight. The estimation of latent weight is also within the range of the changes in measured weight as

expected.

The overweight user in Figure 6.C has a BMR estimated by the Harris Benedict equation in the range of 1235-1255 calories

throughout her tracking streak. Again the metabolic model estimates BMR to be in general higher than that of the Harris Benedict

equation, as seen in Figure 6.C. The model estimates BMR to evolve at a steadier rate than that of the Harris Benedict equation.

Figure 6.D shows that the metabolic model estimates latent weight to also evolve at a steadier rate than observed values of weight.

This user however has less fluctuation in observed weight between days than the obese user.

A user who is considered to have a normal (healthy) weight is shown in Figure 6.E. This user has a BMR estimated from the

Harris Benedict equation in the range of 1110-1137 calories. The metabolic model estimates BMR to be lower than the Harris

Benedict estimation for the first 15 days of the streak, and then higher for the remaining 13 days. The model estimates BMR to

change over time at a steadier rate compared to the BMR estimated by the Harris Benedict equation which fluctuates more from

one day to the next.

For all 3 users, the Harris Benedict estimate of BMR evolves over time following the same pattern as the observed weight change. This is to be expected, as measured weight is the only variable in the Harris Benedict equation that changes over short time periods. This is not the case for the metabolic model, whose estimate of BMR responds to the changing levels of calories consumed through food and calories burned through exercise each day.
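The dependence on weight alone can be made explicit. As a sketch, write the Harris Benedict equation in its generic linear form, where the coefficient symbols are introduced here only for illustration (their numerical values depend on sex and on the variant of the equation used), and where height H and age A are treated as constant within a tracking streak:

```latex
\mathrm{BMR}^{\mathrm{HB}}_{t} = \beta_0 + \beta_W W_t + \beta_H H - \beta_A A
\quad\Longrightarrow\quad
\mathrm{BMR}^{\mathrm{HB}}_{t+1} - \mathrm{BMR}^{\mathrm{HB}}_{t} = \beta_W \left( W_{t+1} - W_t \right)
```

Under these assumptions, every day-to-day change in the Harris Benedict estimate is a fixed multiple of the change in measured weight, which is why the two curves share the same shape.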

As the model estimates BMR to be in a similar range to the Harris Benedict estimate, it can be concluded that food journaling is sufficient to estimate BMR. That being said, the accuracy of the BMR estimate produced by the model remains unclear without expensive measurement of BMR for these subjects.

Table 3: User Characteristics

| BMI Category | Age | Gender | Height (cm) | Initial Weight (kg) | Initial BMI |
|---|---|---|---|---|---|
| Obese | 39 | Male | 177.8 | 116.1 | 36.7 |
| Overweight | 55 | Female | 163.0 | 70.4 | 26.5 |
| Normal (Healthy) Weight | 51 | Female | 168.0 | 55.7 | 19.7 |

Figure 6: Variable estimation from metabolic model compared to observed values. Panels A and B show the obese user, C and D the overweight user, and E and F the normal (healthy) weight user. Plots on the left (A, C, E) show the Harris Benedict estimate of BMR over time compared to the estimated value from NUTS sampling. The plots on the right (B, D, F) compare measured weight over time with the value of latent weight estimated from NUTS sampling.

4.4 Prediction Results

As a final method of model validation, we use the Dynamic Bayesian Model to predict next-day observed weight, and compare the results to two baselines. Table 4 shows the Root Mean Squared Error (RMSE) of the next-day weight prediction for all users over all tracking days, as well as for users grouped by BMI category. Prediction using the Dynamic Bayesian Model was computed using the NUTS sampling method. As our first baseline, we use a naive approach, LOCF (Last Observation Carried Forward), which estimates that a user's weight at time t + 1 will be the same as at time t. As a second baseline, we use a prediction based on the Harris Benedict estimate and naive energy balance calculations.
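The two baselines and the error metric can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, the energy balance step assumes a fixed conversion of 7700 kcal per kg of body weight (an assumption, since the conversion used in the study is not stated here), and the BMR values are taken as given, for example from the Harris Benedict equation.

```python
import math

# Assumed conversion between energy balance and weight change (kcal per kg).
# This constant is an illustrative assumption, not a value from the study.
KCAL_PER_KG = 7700.0


def locf_predict(weight):
    """LOCF baseline: the prediction for day t+1 is the observed weight on day t."""
    return list(weight[:-1])


def energy_balance_predict(weight, intake, exercise, bmr):
    """Naive energy balance: add the weight equivalent of each day's calorie surplus.

    All inputs are daily series of equal length. The bmr series is supplied by
    the caller, e.g. from the Harris Benedict equation.
    """
    predictions = []
    # zip stops at the shorter weight[:-1], so the last day's calories are unused.
    for w, eaten, burned, basal in zip(weight[:-1], intake, exercise, bmr):
        surplus = eaten - burned - basal
        predictions.append(w + surplus / KCAL_PER_KG)
    return predictions


def rmse(predicted, actual):
    """Root mean squared error between two equally long series."""
    if len(predicted) != len(actual):
        raise ValueError("series must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up example data for one short tracking streak.
    weight = [80.0, 79.8, 79.9, 79.5]
    intake = [1800.0, 2100.0, 1700.0, 1900.0]
    exercise = [300.0, 0.0, 400.0, 200.0]
    bmr = [1700.0, 1698.0, 1699.0, 1695.0]

    actual_next_day = weight[1:]
    print("LOCF RMSE:", rmse(locf_predict(weight), actual_next_day))
    print("Energy balance RMSE:",
          rmse(energy_balance_predict(weight, intake, exercise, bmr), actual_next_day))
```

Both predictors start from the last observed weight; the energy balance version differs from LOCF only by the weight equivalent of that day's calorie surplus or deficit.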

Table 4: RMSEs of next day predicted weight comparing different prediction methods.

| User type | NUTS | Harris Benedict Estimate | LOCF |
|---|---|---|---|
| Obese | 1.287 | 1.009 | 1.035 |
| Overweight | 0.572 | 0.551 | 0.565 |
| Normal (Healthy) Weight | 0.566 | 0.552 | 0.557 |
| Underweight | 0.571 | 0.564 | 0.566 |
| All Users | 0.832 | 0.708 | 0.723 |

Overall, the Harris Benedict equation predicts next-day weight most accurately for all users compared to the other methods of prediction (RMSE: 0.708). It is also the most accurate method within each individual BMI category. The proposed metabolic model predicts next-day weight less accurately than both the Harris Benedict equation and the LOCF method, for all users and for each separate BMI category. RMSE is greatest for obese individuals across all methods. Therefore, the Harris Benedict equation and the metabolic model proposed in this paper are both least accurate for obese subjects compared with the other BMI categories.

4.4.1 Prediction Error Analysis

The largest errors in prediction occurred for users that had multiple tracking streaks. Figure 7 compares the actual next-day weight with the predictions produced by the Metabolic Model and by the Harris Benedict equation, for the second tracking streak of the two users that had the largest prediction errors. The plot indicates that, after a gap between streaks, the Metabolic Model's prediction takes longer to adjust than the Harris Benedict prediction. The Harris Benedict equation adjusts to the new weight value faster, as its estimate is based directly on observed weight.
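One way to see why a latent-state model reacts more slowly than a predictor that reads observed weight directly is a one-dimensional local-level filter. The sketch below is a simplified stand-in, not the metabolic model of this paper: it tracks a latent weight as a Gaussian random walk observed with noise, and all function names, variances and weights are made-up values chosen only for illustration.

```python
def local_level_predictions(observations, process_var, obs_var, init_mean, init_var):
    """One-step-ahead predictions from a random-walk latent state (Kalman filter)."""
    mean, var = init_mean, init_var
    predictions = []
    for y in observations:
        # Predict: a random walk keeps the mean and inflates the variance.
        var_pred = var + process_var
        predictions.append(mean)
        # Update: move the latent estimate only part of the way towards y.
        gain = var_pred / (var_pred + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var_pred
    return predictions


def locf_predictions(observations, first_prediction):
    """Direct predictor: tomorrow's weight equals the latest observed weight."""
    return [first_prediction] + list(observations[:-1])


# Hypothetical weights (kg): a first streak near 80, then, after a gap in
# tracking, a second streak near 77.
observed = [80.0] * 5 + [77.0] * 5

filtered = local_level_predictions(observed, process_var=0.01, obs_var=0.25,
                                   init_mean=80.0, init_var=1.0)
direct = locf_predictions(observed, first_prediction=80.0)

for day, (y, f, d) in enumerate(zip(observed, filtered, direct)):
    print(f"day {day}: observed={y:.1f} latent-state={f:.2f} direct={d:.1f}")
```

In this toy setting the direct predictor matches the new level one day after the jump, while the latent-state prediction approaches it gradually. How quickly it catches up depends on the assumed ratio of process variance to observation variance.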

Table 5 shows the RMSEs for each prediction method when the data contains only the first streak for each user. The Metabolic Model now predicts next-day weight more accurately than LOCF for all users, although the difference in accuracy is very small (RMSE: 0.539 vs 0.541). It is also more accurate than LOCF for each separate BMI category except Normal weight users, where LOCF is slightly more accurate. The Metabolic Model also predicts next-day weight more accurately than the Harris Benedict equation for users that are underweight; however, due to the low sample size in this group, this result should be interpreted with some caution. While restricting the data to one streak per user improves prediction accuracy, there is still room for improvement in the model. The Harris Benedict equation continues to predict next-day weight more accurately than the proposed model for all users, and for each BMI category other than underweight users.

Figure 7: Actual and Predicted Values of Next Day Weight. A shows the comparison for the user with the largest prediction errors, for the second tracking streak only. B shows the comparison for the user with the second largest prediction errors, for the second tracking streak only.

Table 5: RMSEs of next day predicted weight comparing different prediction methods (One tracking streak only).

| User type | NUTS | Harris Benedict Estimate | LOCF |
|---|---|---|---|
| Obese | 0.572 | 0.549 | 0.574 |
| Overweight | 0.515 | 0.510 | 0.522 |
| Normal (Healthy) Weight | 0.538 | 0.527 | 0.534 |
| Underweight | 0.454 | 0.510 | 0.523 |
| All Users | 0.539 | 0.528 | 0.541 |
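The comparisons drawn from Tables 4 and 5 can be checked with a few lines of code. The sketch below copies the RMSE values from the two tables (the variable and function names are illustrative), finds the most accurate method per user group, and computes how much each method's overall RMSE drops when only the first tracking streak is kept.

```python
METHODS = ("NUTS", "Harris Benedict", "LOCF")

# RMSE values copied from Table 4 (all tracking streaks).
TABLE_4 = {
    "Obese": (1.287, 1.009, 1.035),
    "Overweight": (0.572, 0.551, 0.565),
    "Normal (Healthy) Weight": (0.566, 0.552, 0.557),
    "Underweight": (0.571, 0.564, 0.566),
    "All Users": (0.832, 0.708, 0.723),
}

# RMSE values copied from Table 5 (first tracking streak only).
TABLE_5 = {
    "Obese": (0.572, 0.549, 0.574),
    "Overweight": (0.515, 0.510, 0.522),
    "Normal (Healthy) Weight": (0.538, 0.527, 0.534),
    "Underweight": (0.454, 0.510, 0.523),
    "All Users": (0.539, 0.528, 0.541),
}


def best_method(table):
    """Return the method with the lowest RMSE for each user group."""
    return {group: METHODS[values.index(min(values))]
            for group, values in table.items()}


def overall_reduction(before, after):
    """RMSE drop for 'All Users' when moving from one table to the other."""
    return {method: round(b - a, 3)
            for method, b, a in zip(METHODS, before["All Users"], after["All Users"])}


best_all_streaks = best_method(TABLE_4)
best_first_streak = best_method(TABLE_5)
reduction = overall_reduction(TABLE_4, TABLE_5)

print(best_all_streaks)
print(best_first_streak)
print(reduction)
```

With the values above, the Harris Benedict estimate has the lowest RMSE in every group of Table 4 and in every group of Table 5 except Underweight, and the overall RMSE of the NUTS-based prediction falls by 0.293 when later streaks are excluded.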

5 Discussion

This paper aims to estimate an individual's BMR and how it changes over time by using food and exercise tracking data. We use data obtained from the Lifesum app for 219 users with a range of characteristics. By investigating the correlation between daily total energy expenditure and change in weight from day t to t+1, it was concluded that BMR estimated by the Harris Benedict equation does not adequately capture an individual's energy balance. The low correlations could be a result of inaccuracies in the tracking data or inaccuracies in the estimation of BMR. Therefore, a tracking bias was calculated for the users in the study, finding that on average users undertrack by 7.9% of calories consumed each day.

A Dynamic Bayesian Model was used to estimate BMR. From this model it was concluded that food journaling is sufficient to estimate BMR; however, more research in this field is required. Improvements could be made to the model in order to increase the accuracy of the estimated BMR, for example changing the prior distributions to better fit the data. As a method of model validation, next-day weight was predicted from the estimate of BMR produced by the proposed model. The prediction analysis concluded that the Harris Benedict equation remains a more accurate estimate of BMR than the model's estimate, both overall and for each individual BMI category. While a self-reporting bias was calculated for each user within this model, it was held constant over all days of a user's tracking streak. Calculating this bias day by day may lead to more accurate results. In addition, the results could be improved by accounting for gaps between tracking streaks within the model. Unfortunately, time restrictions did not allow for these alterations.

Previous studies have indicated that the Harris Benedict equation is particularly inaccurate for obese individuals. This study confirms this, finding that the Harris Benedict equation performs worse when predicting the next-day weight of obese individuals than of those in other BMI categories. The Metabolic Model proposed in this paper also performed worse on obese subjects compared with others. Therefore, further research is required to find a more accurate method to estimate BMR for the obese population.

Acknowledgements

I would like to thank Lifesum for providing me with the data used in this study. A special thanks is extended to the data team

at Lifesum, particularly Lars Yencken and Pauline Vercruysse, for their support throughout this project. In addition to this they

contributed to many insightful discussions that have made this paper possible. Finally, I would like to thank Harry Khamis, Uppsala

University, for his guidance during this study.

