economic forecasting - semester project
Diet and Development
Linked to Cardiovascular
Disease and Diabetes
Nicholas Rothrauff and Sam Lawson
December 2015
Economic Forecasting
Abstract
As countries become more developed, there is less need to worry about where our next meal will come
from. This translates into less attention being paid to what it is that we consume throughout our daily
lives. A result of this has been for the composition of our diets to shift in a way that may not be
optimal. We set out to find varying dietary components of a large number of different countries across
the globe. We hope to find significant correlations between these dietary components and one of the most
devastating categories of disease: diabetes and cardiovascular disease. There is a growing epidemic of
these diseases occurring in the developed and developing world, and we hope to gain some further insight
into the factors that are contributing to their rise. We believe they can largely be explained by the
consumption habits within each country.
I. Introduction
Within the past couple centuries, there has been a dramatic shift around the globe toward
industrialization and urbanization. While this shift has led to an increase in production capacity and
efficiency, it has also altered our lifestyles in many ways. We have adopted a diet that is higher in
calories, yet lower in nutrition. At the same time, technological advances have allowed us to live a
drastically more sedentary lifestyle: jobs that require sitting for 8 hours per day, entertainment
technologies such as television that promote long bouts of sitting, and transportation that now allows us to
travel around the world with near-zero physical exertion. This coupling of less nutritive food and a more
inactive lifestyle is linked to the rise in obesity and the non-communicable diseases that coincide with it.
These diseases account for a large portion of our country’s GDP every year through our
spending on healthcare. Also, animal husbandry has become one of the leading industries in terms of
pollution. Whether it’s through the methane emissions from the animals or through fertilization and the
use of pesticides, the effects they have on our planet are apparent. As more countries begin to move from
“undeveloped” to “developed” nations, these effects are only going to increase. One possible solution
would be to adopt a diet that is more conducive to a healthy lifestyle.
We posit that there will be a link in the composition of a country’s diet and the country’s
diabetes/cardiovascular disease deaths. The independent variables we will use include: calories from
protein, calories from carbohydrates, daily alcohol consumption, average daily sugar and sweetener
consumption, average daily fruit intake, average daily vegetable oil, healthcare spending per capita
(yearly), and the human development index. We also used dummy variables to describe location:
Dsub1 equals West, Central, North, and East Asia; Dsub2 equals South Asia, Southeast Asia, and Oceania;
Dsub3 equals North America and Europe; and Dsub4 equals South America and Central America. When all
of these equal zero, the region is the Caribbean or Africa. We will run a multiple
linear regression with the diabetes and cardiovascular disease death rate as our dependent variable.
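The dummy coding described above can be sketched as follows. This is a minimal illustration, not the authors' actual spreadsheet or code: the region label strings and the helper name `encode_region` are assumptions, while the Dsub1 to Dsub4 groupings and the Caribbean/Africa baseline come from the text.

```python
# Region groupings as described in the text; the exact label spellings are assumed.
REGION_DUMMIES = {
    "Dsub1": {"West Asia", "Central Asia", "North Asia", "East Asia"},
    "Dsub2": {"South Asia", "Southeast Asia", "Oceania"},
    "Dsub3": {"North America", "Europe"},
    "Dsub4": {"South America", "Central America"},
}
# Baseline category (all dummies equal zero): the Caribbean and Africa.


def encode_region(region):
    """Return the 0/1 value of each dummy variable for one country's region."""
    return {name: int(region in members) for name, members in REGION_DUMMIES.items()}


print(encode_region("East Asia"))  # Dsub1 is 1, the rest are 0
print(encode_region("Africa"))     # every dummy is 0 (baseline)
```

Leaving one region group out as the baseline avoids perfect collinearity with the intercept, so each dummy coefficient is read as a shift relative to the Caribbean and Africa.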
II. Review of related studies
The two studies that were most related to ours were “Globalization of Diabetes” by Frank Hu, and
“Prevention of Coronary Heart Disease by Diet and Lifestyle” by Daan Kromhout, Alessandro Menotti,
Hugo Kesteloot, and Susana Sans. Both of these studies covered the relation of risks of disease, one being
diabetes and the other being heart disease, to diet composition. Frank Hu’s study focused more on the
effects that occurred in certain regions.
Frank Hu’s study found that diabetes was now a global crisis that threatened all nations. Specifically,
he noted that developing nations face the largest issues. Similar to our view, Hu found that diabetes was
“driven by rapid urbanization, nutrition transition, and an increase in sedentary lifestyles.” He also found
that Asia is the region most affected by diabetes. This is because of Asia’s large population and the rapid
economic development that they have experienced. Asia accounts for 60% of the diabetic population in
the world. In his study, Hu found that in 1980 less than 1% of Chinese adults had diabetes, but
by 2008 nearly 10% of all Chinese adults had the disease. Hu’s risk factors for diabetes are:
smoking (especially in excess), drinking (especially in excess), daily intake of refined carbs (the more you
consume, the more likely you are to develop type 2 diabetes), and physical activity. Hu’s research is
supported by Ramachandran et al. and their research into diabetes rates in Asia. Hu’s evidence shows
that many cases of type 2 diabetes can be prevented by a healthy diet and daily exercise. He does
make the point that these lifestyle choices require a shift in our “food, built, and social environment.”
The other study, “Prevention of Coronary Heart Disease by Diet and Lifestyle,” had implications similar
to Hu’s, but for heart disease. Its authors found that the most important factors were diet,
smoking, alcohol consumption, and physical exercise. One of the differences between our study and this study
was that they included salt and fish consumption. This study also puts a large emphasis on the amount of
trans fat in people’s diets. The recommendations that the authors make are: Don’t smoke, drink in
moderation if at all, be somewhat active, eat more than 400 grams of vegetables and fruits a day, limit salt
consumption, and eat fish once a week.
III. Hypothesis
We have created our hypothesis based on intuitive beliefs about diet composition and its relation to
diabetes/cardiovascular disease, and the studies we looked into: “Prevention of Coronary Heart Disease
by Diet and Lifestyle” (Kromhout) and “Globalization of Diabetes” (Hu). We decided to test Diabetes
and Cardiovascular disease compared to regions of the world and their respective diet composition. We
wanted to see the deaths from these diseases compared to diet and the development of a country. The
independent variables we used were: calories from protein, calories from carbohydrates, daily alcohol
consumption, average daily sugars and sweeteners, average daily fruit intake, average daily vegetable oil,
healthcare spending per capita (yearly), and the human development index.
We assume first that a healthier, more plant-based diet will lead to a decrease in deaths from diabetes
or cardiovascular disease. This is because these diets are much lower in processed sugars and trans
fat. Secondly, we assume that citizens of developing nations or already developed nations will have a
higher risk of dying from diabetes or heart disease. This is because of the transition to less nutritious
foods that developing countries experience and also because of an increase in jobs that require long hours
of sitting. It should also be noted that television and the internet play a role here. Thirdly, though
developing and developed nations promote diets and lifestyles that cause these health complications, they also have more
advanced medical technology. These medical advances reduce the risk of death from one of these
diseases since doctors can catch the symptoms early and put preventive measures in place to stop or slow
the disease. We assume that higher healthcare spending will reduce the risk citizens have of dying from
diabetes and cardiovascular disease. We hypothesize:
RE.1: Higher healthcare spending, average daily fruit intake, and calories from protein will
decrease the risk for dying from diabetes/cardiovascular disease.
RE.2: Higher daily alcohol consumption and daily sugar/sweetener intake will increase the risk
for dying from diabetes/cardiovascular disease.
RE.3: The more developed a country is, the more risk there is for dying from
diabetes/cardiovascular disease.
RE.4: In accord with both Hu’s and Ramachandran et al.’s studies, we suspect that regions in Asia
will have increased risk for developing diabetes/cardiovascular disease.
IV. Sample Description
The data we compiled came from a few different online databases. These include Chartsbin.com,
WHO.int, Worldbank.org, and FAOstat.com. Carb calories, protein calories, and fat calories were in
units of calories per day. Healthcare spending was in units of current U.S. dollars per year. The Human
Development Index was on a scale from 0 - 1, with the higher values indicating greater human
development. The remaining variables (alcohol, sugars and sweeteners, fruits, vegetables, vegetable oils,
and meat consumption) were in units of grams/capita/day. Our dependent variable, diabetes and
cardiovascular disease deaths, was the death rate from these diseases out of 100,000 people. One
interesting thing to note was that the observations for the deaths due to these diseases were from 2008,
while the independent variables are from 2006. While we could not find the observations we were
looking for all in the same year, we think that the later dependent variable would be better predicted by
diet composition in the years before. This is due to the fact that diabetes is not a disease that is contracted
suddenly, but rather results after continuous years of unhealthy habits. The mean, minimum value,
maximum value, range, and standard deviation of our variables are represented in Figure 3-1.
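The summary statistics listed above can be computed as in the following minimal sketch. The function name `summarize` and the sample values are hypothetical and are not taken from the data set.

```python
import statistics


def summarize(values):
    """Return mean, minimum, maximum, range, and sample standard deviation."""
    values = list(values)
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "std_dev": statistics.stdev(values),  # sample (n - 1) standard deviation
    }


# Made-up Human Development Index values on the 0 - 1 scale, for illustration only.
hdi_sample = [0.35, 0.52, 0.71, 0.88, 0.94]
print(summarize(hdi_sample))
```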
V. Regression Methodology
We started out the model using variables that were far too correlated with each other. These were the
shares of per capita caloric intake in the average diet: the percent of the diet composed of
carbohydrates, the percent composed of fats, and the percent composed of protein. Because they were
proportions of the diet (and together made up the majority of the total diet), any one of the variables
could be largely explained by the other two. This led us to then search for other variables
that would help to explain our data more precisely. The variables that we found covered a wide array of
more-specific parts of a country’s diet, as well as others such as the per capita healthcare spending and the
human development index. Also, we took the proportions of the diet and multiplied them by the total
average calories consumed in each country, and included them within the model. We hoped for this to
help correct the original collinearity problems that we had with them in the previous model. The range of
the average calories per day of the countries was 1920, and we thought that the actual caloric intake (not
percentage) due to the variables would help to represent their impact on diabetes and cardiovascular
disease rates more accurately.
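The collinearity problem and the conversion to absolute calories can be illustrated with a small sketch. The two sample rows are invented, and the sketch assumes the three shares sum to exactly one, whereas the text only says they make up the majority of the diet.

```python
# Hypothetical rows: (carb share, fat share, protein share, total kcal per day).
# These numbers are invented for illustration, not taken from the data set.
SAMPLE = {
    "Country A": (0.70, 0.18, 0.12, 2100),
    "Country B": (0.50, 0.35, 0.15, 3400),
}


def shares_to_calories(carb_share, fat_share, protein_share, total_kcal):
    """Convert diet proportions into absolute calories per day."""
    return {
        "carb_kcal": carb_share * total_kcal,
        "fat_kcal": fat_share * total_kcal,
        "protein_kcal": protein_share * total_kcal,
    }


for name, (carb, fat, protein, total) in SAMPLE.items():
    # When the shares sum to one, protein is fully determined by the other two.
    implied_protein = 1.0 - carb - fat
    kcal = shares_to_calories(carb, fat, protein, total)
    print(name, round(implied_protein, 2), kcal)
```

In calorie units the three variables still add up to total calories, so they are less tightly tied together only because total intake differs from country to country.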
Here, we want to note that our methodology looked more akin to “guess and check.” We started with a
few variables that we thought would be significant, and did not receive the results we had hoped for. This
led us to search for an increasing number of independent variables to help explain some of the data that
we had, with each one helping to fill in some of the missing information. Ultimately, we settled on our
final model (which is represented in Figure 1-3) due to the significance of all the independent variables
and what we thought was a very high adjusted R² for what we were modeling.
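For reference, adjusted R² penalizes the ordinary R² for the number of predictors. The sketch below uses the standard formula. The R² value of 0.80 is purely hypothetical, and the sketch assumes all 147 countries entered the regression with 12 predictors (8 variables plus 4 regional dummies).

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical R^2; n = 147 and k = 12 are assumptions based on the text.
print(adjusted_r2(0.80, n_obs=147, n_predictors=12))
```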
While what follows is not quite the process that we originally followed (since we added
variables roughly one at a time and then retested), it accurately represents what we did and
illustrates the process of including nearly all of the variables we gathered in a “full” model. From
there, we used the skills we acquired in class to help us achieve a model that was the best linear unbiased
estimator.
After our original regression, we found that some of our variables were still very correlated with
each other. These included GDP, which was correlated with the human development index and healthcare
spending, as well as the variable for fat calories, which we thought was correlated with meat consumption
and protein calories. Here is the model that we used:
Figure 1-1 shows the regression results we ended up with.
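A pairwise correlation check of the kind described above (for example, GDP against the human development index) could look like the following sketch. The function name and the sample numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented values: GDP per capita (US$) and HDI for five hypothetical countries.
gdp = [900, 3500, 9000, 28000, 45000]
hdi = [0.40, 0.58, 0.72, 0.87, 0.93]
print(round(pearson_r(gdp, hdi), 3))
```

A coefficient close to 1 or -1 between two predictors suggests that one of them adds little independent information to the model.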
The significance of some of the variables was not in line with what we had originally thought (the
significance of sugars and sweeteners for example), so we knew that we were missing data that would
help to explain this. In acquiring data on diabetes and cardiovascular disease rates, we noticed that the
concentrations of the diseases were focused within specific regions. This led us to include the dummy
variables for around 10 separate regions. Because many of the regions that we included contained only a
couple observations, their prediction coefficients were not very meaningful. We then decided to narrow
down the regions to 5. We did this on the basis of location and the values of the coefficients that the
model predicted (which was in line with the locations). Here is the model that we used:
The results of the regression are represented in Figure 1-2.
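To show how a regional dummy enters a multiple linear regression, here is a minimal ordinary least squares sketch that solves the normal equations directly. It uses a toy data set with one continuous predictor and one dummy. The data and function names are hypothetical, and the regression software the authors actually used is not stated in the text.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """Fit y = b0 + b1*x1 + ... by least squares; returns [b0, b1, ...]."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Toy data generated exactly from y = 2 + 3*x + 5*d, where d is a 0/1 dummy.
rows = [(1, 0), (2, 0), (3, 1), (4, 1), (5, 0), (6, 1)]
y = [2 + 3 * x + 5 * d for x, d in rows]
coef = ols(rows, y)
print([round(c, 6) for c in coef])
```

The coefficient on the dummy is the estimated shift in the dependent variable for countries in that region relative to the baseline region, holding the other predictors fixed.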
We then decided to take out meat consumption and vegetables based on the significance level of
their t-stats. We believed meat consumption was insignificant due to the fact that it was partially
explained by other variables as well as the variety of meats that are consumed depending on the
region. We thought that a large portion of this variety and the effect it has on diabetes and cardiovascular
disease was being better explained within our dummy variables. We thought that a similar explanation
holds for the low significance that we found for vegetables: those vegetables that would cause increased
rates of the diseases would be partially explained by sugars, and those that would decrease the disease
rates were being explained by vegetable oils. We also thought that these were being better explained in
accord with our regional dummy variables. To confirm that we should take these variables out of the
model, we performed a partial F-test of whether the coefficients for meat consumption and vegetables were
jointly significant. With an F-statistic of .1186, we failed to reject the null hypothesis
(𝜷1 = 𝜷7 = 0) at a significance level of 0.1%. 𝜷1 and 𝜷7 proved to be statistically insignificant.
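The partial F-statistic compares the residual sum of squares of the restricted model (without the tested variables) with that of the unrestricted model. The sketch below uses the standard formula. The sums of squares and the predictor count are hypothetical, since the text reports only the resulting statistic.

```python
def partial_f(ssr_restricted, ssr_unrestricted, q, n_obs, k_unrestricted):
    """F = ((SSR_r - SSR_u) / q) / (SSR_u / (n - k - 1)).

    q is the number of coefficients set to zero under the null hypothesis,
    and k_unrestricted is the number of predictors in the full model.
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n_obs - k_unrestricted - 1)
    return numerator / denominator


# Hypothetical inputs: dropping q = 2 variables barely raises the residual sum of squares.
f_stat = partial_f(1010.0, 1000.0, q=2, n_obs=147, k_unrestricted=14)
print(round(f_stat, 4))
```

A small statistic means the dropped variables explain little additional variation. The value is compared against the critical value of the F distribution with q and n - k - 1 degrees of freedom.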
Here is a representation of our final population model:
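Written out with the variables named in the text, a model of this form can be sketched as below. The variable abbreviations and coefficient subscripts are illustrative labels only, not the numbering used in the regression output.

```latex
\begin{aligned}
\text{Deaths}_i ={}& \beta_0
  + \beta_{P}\,\text{ProteinCal}_i
  + \beta_{C}\,\text{CarbCal}_i
  + \beta_{A}\,\text{Alcohol}_i
  + \beta_{S}\,\text{Sugar}_i \\
 &+ \beta_{F}\,\text{Fruit}_i
  + \beta_{O}\,\text{VegOil}_i
  + \beta_{H}\,\text{HealthSpend}_i
  + \beta_{D}\,\text{HDI}_i \\
 &+ \delta_1\,\text{Dsub1}_i
  + \delta_2\,\text{Dsub2}_i
  + \delta_3\,\text{Dsub3}_i
  + \delta_4\,\text{Dsub4}_i
  + \varepsilon_i
\end{aligned}
```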
Our final model’s regression results are represented in Figure 1-3. The following is our prediction
equation for diabetes and cardiovascular disease deaths based on our independent variables’ coefficients:
We then noticed what looked to be a violation of the homoscedasticity assumption, suggested by
the cone shape of the residual plots of healthcare spending and alcohol consumption (Figures 2-1 and
2-4). We tried to alter the dependent variable by using the square root as well as the natural log (Figures
2-2 and 2-3). The violation seemed to remain. After further consideration, we are convinced that this is not
due to the range of residual values changing based on the x value, but rather due to a larger number of
observations in the lower range of these independent variables. We came to this conclusion because the
deviation of residual values seems to remain fairly consistent across the independent variable’s values,
and it was only the concentration of plotted points that began to thin out due to fewer observations at the
higher end of each independent variable (alcohol and healthcare spending).
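The visual argument above can be checked informally by splitting the range of an independent variable into equal-width bins and comparing both the number of observations and the spread of the residuals in each bin. This is a rough sketch with invented numbers, not a formal test such as Breusch-Pagan, and it assumes the variable is not constant.

```python
import statistics


def spread_by_bin(x, residuals, n_bins=2):
    """Return (count, sample std dev of residuals) for equal-width bins of x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins  # assumes hi > lo
    bins = [[] for _ in range(n_bins)]
    for xi, ri in zip(x, residuals):
        idx = min(int((xi - lo) / width), n_bins - 1)  # keep the maximum in the last bin
        bins[idx].append(ri)
    return [(len(b), statistics.stdev(b) if len(b) > 1 else None) for b in bins]


# Invented healthcare spending values (US$ per capita) and regression residuals.
spending = [100, 150, 200, 250, 300, 400, 500, 3000, 5000]
residuals = [-40, 35, -30, 42, -38, 33, -36, 39, -37]
for count, spread in spread_by_bin(spending, residuals):
    print(count, spread)
```

If the counts differ sharply between bins while the standard deviations stay comparable, the cone shape is more likely a product of sparse observations at the high end than of changing residual variance.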
VI. Test Results
Our test results were very much in line with our hypothesis. Carbohydrate calories, alcohol, and
sugar/sweetener consumption were shown to be positively correlated to deaths from diabetes and
cardiovascular disease. The positive coefficients on carbohydrate calories and alcohol are plausibly
explained by their effect on blood glucose and insulin levels. Also, if alcohol consumption is high, it
may indicate that many citizens of these regions are not taking good care of their bodies or
exercising much. Per gram of consumption, sugars and sweeteners were shown to have the largest effect
on disease, which was expected due to the direct relationship between sugars and diabetes.
The variables that were negatively correlated to deaths from these diseases were as expected: protein
calories, fruits, vegetable oils, healthcare, and the human development index. Calories from proteins may
have a negative correlation because foods high in protein are usually fairly healthy. Also, seeing as how
the heart is another muscle, protein helps make it stronger. Fruits and vegetable oils have this effect
because they are healthy foods low in fat and processed sugar. Being that the dependent variable was
deaths due to these diseases, and not just disease rates, we expected the variables of healthcare spending
and the human development index to be extremely significant. This was confirmed in the regression
model that showed these individual variables to have a p-value of less than .001.
Our dummy variables were also in accord with what we had expected. The dummy for the relatively
developed nations of North America and Europe was shown to be correlated with deaths from these
diseases, but its coefficient is smaller than that of the developing countries of Asia (excluding South and
Southeast Asia). This is because the developing countries of Asia have had the rapid urbanization and
transition in diet that Frank Hu discusses in his article. According to Hu’s study, this is also partially
explained by the genetics of people in certain regions. South and Southeast Asia have a negative
coefficient most likely because they have more fish in their diet, since these are coastal regions, and don’t
lead such sedentary lifestyles. As our related studies stated, fish has a negative effect on deaths from
diabetes/cardiovascular disease.
VII. Conclusion
Our study was based around the correlation between deaths from diabetes/cardiovascular disease and
national diet composition and the level of the country’s development. We looked into some related
studies done by Frank Hu and a compilation of other authors. These studies showed that there was major
correlation between diabetes/cardiovascular disease and diet, region, and lifestyle. We then collected data
on diet composition, healthcare spending, and human development for 147 countries. We entered the
data and experimented with the regression model until we arrived at our final model, in which predicted deaths from
diabetes/cardiovascular disease is the dependent variable and the independent variables are: calories from
protein, calories from carbs, daily alcohol consumption per capita, daily sugar and sweetener consumption
per capita, daily fruit consumption per capita, daily vegetable oil consumption per capita, yearly
healthcare spending per capita, the human development index, and regional dummy variables. From this
we found a few variables that were extremely important with regard to preventing deaths.
The human development index was extremely significant and had a coefficient of -782.983,
meaning the more developed a country was, the fewer deaths from these diseases it would
have. This is most likely because undeveloped countries lack the medical advancements and
technologies to detect and treat these diseases. But these technologies can be counterbalanced by rapid
economic development, urbanization, and the sedentary lifestyle changes that come with developing and
developed countries. This is why our dummy variables for North America/Europe and
West/Central/North/East Asia have an increasing effect on deaths from these diseases, and our dummy
variables for South Asia/Southeast Asia/Oceania and South/Central America have a decreasing effect on
deaths from said diseases. Citizens of these two more-developed regions also have more disposable
income to spend, more home entertainment (which increases sedentary lifestyles), and more unhealthy
food choices (fast-food, rich foods, and processed foods). Alcohol consumption, calories from
carbohydrates, and sugar consumption also have an increasing effect on deaths from
diabetes/cardiovascular disease. Dietary components that decrease the number of deaths from these diseases, on the
other hand, are calories from proteins, fruits, and vegetable oils. The effects of these foods are fairly
common knowledge and have been thoroughly studied. Our findings are similar to the two studies we
listed earlier: 1) Abstain from alcohol or drink it in moderation, 2) Eat healthier foods (more proteins,
fruits, and vegetables/vegetable oils), 3) Abstain from refined carbohydrates and sugars/sweeteners.
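To make the size of the human development index coefficient concrete: because the index runs from 0 to 1, a one-unit change spans the whole scale, so a smaller step is easier to interpret. The sketch below applies the reported coefficient of -782.983 to a 0.1 increase in the index, holding the other variables fixed. The function name is hypothetical.

```python
HDI_COEFFICIENT = -782.983  # coefficient reported in the text


def predicted_change(hdi_change, coefficient=HDI_COEFFICIENT):
    """Predicted change in deaths per 100,000 for a given change in the index."""
    return coefficient * hdi_change


print(round(predicted_change(0.1), 1))  # about -78.3 deaths per 100,000
```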
There were a few variables that we would have liked to include but could not find
data on. In order to help explain the rest of the variation within our model, we could add variables such
as salt consumption, fish consumption, and smoking (tobacco). Our research showed these
to have an effect on the disease rates we were modeling, and they would likely have increased the
explanatory power of our model.
Figure 1-1: Our initial model’s regression output.
Figure 1-2: Model after including the dummy variables.
Figure 1-3: Our final regression results.
Figure 2-1: Residual plot of healthcare spending vs diabetes and cardiovascular disease rates
Figure 2-2: Residual plot of healthcare spending vs square root of diabetes and cardiovascular disease rates
Figure 2-3: Residual plot of healthcare spending vs the natural log of diabetes and cardiovascular disease rates
Figure 2-4: Alcohol consumption vs diabetes and cardiovascular disease rates
Works Cited
Hu, Frank B. "Globalization of Diabetes." Globalization of Diabetes. American Diabetes Association, June-July 2011. Web. 13 Dec. 2015.
Kromhout, Daan, Alessandro Menotti, Hugo Kesteloot, and Susana Sans. "Prevention of Coronary Heart Disease by Diet and Lifestyle." Circulation. American Heart Association, Fall 2002. Web. 13 Dec. 2015.
Ramachandran, Ambady, Ronald Ching Wan Ma, and Chamukuttan Snehalatha. The Lancet 375.9712 (Jan 30-Feb 5, 2010): 408-18.
ChartsBin.com - Visualize Your Data. ChartsBin.com, n.d. Web. 4 Dec. 2015.
World Health Organization. WHO, n.d. Web. 4 Dec. 2015.
World Bank Group. The World Bank Group, n.d. Web. 4 Dec. 2015.