economic forecasting - semester project

Diet and Development Linked to Cardiovascular Disease and Diabetes

Nicholas Rothrauff and Sam Lawson

December 2015

Economic Forecasting

Abstract

As countries become more developed, there is less need to worry about where our next meal will come

from. This translates into less attention being paid to what it is that we consume throughout our daily

lives. A result of this has been for the composition of our diets to shift in a way that may not be

optimal. We set out to find varying dietary components of a large number of different countries across

the globe. We hope to find significant correlations between these dietary components and one of the most

devastating categories of disease: diabetes and cardiovascular disease. There is a growing epidemic of

these diseases occurring in the developed and developing world, and we hope to gain some further insight

into the factors that are contributing to their rise. We believe they can largely be explained by the

consumption habits within each country.

I. Introduction

Within the past couple centuries, there has been a dramatic shift around the globe toward

industrialization and urbanization. While this shift has led to an increase in production capacity and

efficiency, it has also altered our lifestyles in many ways. We have adopted a diet that is higher in

calories, yet lower in nutrition. At the same time, technological advances have allowed us to live a

drastically more sedentary lifestyle: jobs that require sitting for 8 hours per day, entertainment

technologies such as television that promote long bouts of sitting, and transportation that now allows us to

travel around the world with near-zero physical exertion. This coupling of less nutritive food and a more

inactive lifestyle is linked to the rise in obesity and the non-communicable diseases that coincide with it.

Through healthcare spending, these diseases account for a large portion of our country’s GDP every

year. Also, animal husbandry has become one of the leading industries in terms of

pollution. Whether it’s through the methane emissions from the animals or through fertilization and the

use of pesticides, the effects they have on our planet are apparent. As more countries begin to move from

“undeveloped” to “developed” nations, these effects are only going to increase. One possible solution

would be to adopt a diet that is more conducive to a healthy lifestyle.

We posit that there will be a link in the composition of a country’s diet and the country’s

diabetes/cardiovascular disease deaths. The independent variables we will use include: calories from

protein, calories from carbohydrates, daily alcohol consumption, average daily sugar and sweetener

consumption, average daily fruit intake, average daily vegetable oil, healthcare spending per capita

(yearly), and the human development index. We also used dummy variables to describe location:

Dsub1 equals West, Central, North, and East Asia; Dsub2 equals South Asia, Southeast Asia, and

Oceania; Dsub3 equals North America and Europe; and Dsub4 equals South America and Central America.

When all of these equal zero, the region is the Caribbean or Africa. We will run a multiple

linear regression with diabetes and cardiovascular disease deaths as our dependent variable.
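A minimal sketch of this regional coding scheme in Python follows. The region label strings are assumptions made for illustration (the paper gives the groupings but not the labels used in the data set), and the Caribbean and Africa serve as the baseline where all four dummies are zero.

```python
# Hypothetical region labels mapped to the dummy that switches on for them.
# None marks the baseline group (Caribbean and Africa), where all dummies are 0.
REGION_TO_DUMMY = {
    "West Asia": "Dsub1", "Central Asia": "Dsub1",
    "North Asia": "Dsub1", "East Asia": "Dsub1",
    "South Asia": "Dsub2", "Southeast Asia": "Dsub2", "Oceania": "Dsub2",
    "North America": "Dsub3", "Europe": "Dsub3",
    "South America": "Dsub4", "Central America": "Dsub4",
    "Caribbean": None, "Africa": None,
}
DUMMIES = ("Dsub1", "Dsub2", "Dsub3", "Dsub4")


def encode_region(region):
    """Return the four 0/1 dummy values for one country's region."""
    active = REGION_TO_DUMMY[region]  # raises KeyError for an unknown label
    return {name: int(name == active) for name in DUMMIES}


europe = encode_region("Europe")
africa = encode_region("Africa")
```

Using four dummies for five groups avoids the dummy variable trap: with an intercept in the model, a fifth dummy would be perfectly collinear with the other four.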

II. Review of related studies

The two studies that were most related to ours were “Globalization of Diabetes” by Frank Hu, and

“Prevention of Coronary Heart Disease by Diet and Lifestyle” by Daan Kromhout, Alessandro Menotti,

Hugo Kesteloot, and Susana Sans. Both of these studies covered the relation of risks of disease, one being

diabetes and the other being heart disease, to diet composition. Frank Hu’s study focused more on the

effects that occurred in certain regions.

Frank Hu’s study found that diabetes was now a global crisis that threatened all nations. Specifically,

he noted that developing nations face the largest issues. Similar to our view, Hu found that diabetes was

“driven by rapid urbanization, nutrition transition, and an increase in sedentary lifestyles.” He also found

that Asia is the region most affected by diabetes. This is because of Asia’s large population and the rapid

economic development that they have experienced. Asia accounts for 60% of the diabetic population in

the world. In his study, Hu found that in 1980 less than 1% of Chinese adults had contracted diabetes, but

by 2008 nearly 10% of all Chinese adults had the disease. Hu’s factors for contracting diabetes are:

smoking (especially in excess), drinking (especially in excess), daily intake of refined carbs (the more you

intake, the more likely you are to contract type 2 diabetes), and physical activity. Hu’s research is

supported by Ramachandran et al. and their research into diabetes rates in Asia. Hu’s evidence shows

that many instances of type 2 diabetes can be prevented by a healthy diet and daily exercise. He does

make the point that these lifestyle choices require a shift in our “food, built, and social environment.”

The other study, “Prevention of Coronary Heart Disease by Diet and Lifestyle,” had implications similar

to Hu’s, but for heart disease. The authors found that the most important factors were diet,

smoking, alcohol consumption, and physical exercise. One difference between our study and theirs

is that they included salt and fish consumption. Their study also puts a large emphasis on the amount of

trans fat in people’s diets. The recommendations that the authors make are: Don’t smoke, drink in

moderation if at all, be somewhat active, eat more than 400 grams of vegetables and fruits a day, limit salt

consumption, and eat fish once a week.

III. Hypothesis

We have created our hypothesis based on intuitive beliefs about diet composition and its relation to

diabetes/cardiovascular disease, and the studies we looked into: “Prevention of Coronary Heart Disease

by Diet and Lifestyle” (Kromhout) and “Globalization of Diabetes” (Hu). We decided to test Diabetes

and Cardiovascular disease compared to regions of the world and their respective diet composition. We

wanted to see the deaths from these diseases compared to diet and the development of a country. The

independent variables we used were calories from protein, calories from carbohydrates, daily alcohol

consumption, average daily sugars and sweeteners, average daily fruit intake, average daily vegetable oil,

healthcare spending per capita (yearly), and the human development index.

We assume first that a healthier, more plant-based diet will lead to a decrease in deaths from diabetes

or cardiovascular disease. This is because these diets are much lower in processed sugars and trans

fat. Secondly, we assume that citizens of developing nations or already developed nations will have a

higher risk of dying from diabetes or heart disease. This is because of the transition to less nutritious

foods that developing countries experience and also because of an increase in jobs that require long hours

of sitting. It should also be noted that television and the internet play a role here. Thirdly, though

more developed nations promote diets and lifestyles that cause these health complications, they also have more

advanced medical technology. These medical advances reduce the risk of death from one of these

diseases, since doctors can catch the symptoms early and put preventative measures in place to stop or slow

the disease. We assume that higher healthcare spending will reduce the risk citizens have of dying from

diabetes and cardiovascular disease. We hypothesize:

RE.1: Higher healthcare spending, average daily fruit intake, and calories from protein will

decrease the risk for dying from diabetes/cardiovascular disease.

RE.2: Higher daily alcohol consumption and daily sugar/sweetener intake will increase the risk

for dying from diabetes/cardiovascular disease.

RE.3: The more developed a country is, the more risk there is for dying from

diabetes/cardiovascular disease.

RE.4: In accord with both Hu’s and Ramachandran et al.’s studies, we suspect that regions in Asia

will have increased risk for developing diabetes/cardiovascular disease.

IV. Sample Description

The data we have compiled was from a few different online databases. These include Chartsbin.com,

WHO.int, Worldbank.org, and FAOstat.com. Carb calories, protein calories, and fat calories were in

units of calories per day. Healthcare spending was in units of current U.S. dollars per year. The Human

Development Index was on a scale from 0 - 1, with the higher values indicating greater human

development. The remaining variables (alcohol, sugars and sweeteners, fruits, vegetables, vegetable oils,

and meat consumption) were in units of grams/capita/day. Our dependent variable, diabetes and

cardiovascular disease deaths, was the death rate from these diseases per 100,000 people. One

interesting thing to note was that the observations for the deaths due to these diseases were from 2008,

while the independent variables are from 2006. While we could not find the observations we were

looking for all in the same year, we think that the later dependent variable would be better predicted by

diet composition in the years before. This is due to the fact that diabetes is not a disease that is contracted

suddenly, but rather results after continuous years of unhealthy habits. The mean, minimum value,

maximum value, range, and standard deviation of our variables are represented in Figure 3-1.

V. Regression Methodology

We started the model with variables that were far too highly correlated with one another. These were the

shares of per capita caloric intake in the average diet: the percent of the

diet composed of carbohydrates, the percent composed of fats, and the percent composed of

protein. Because they were proportions of the diet (and made up the majority of the total diet), any one of the

variables could be almost entirely explained by the other two. This led us to search for other variables

that would help to explain our data more precisely. The variables that we found covered a wide array of

more specific parts of a country’s diet, as well as others such as per capita healthcare spending and the

human development index. Also, we took the proportions of the diet and multiplied them by the total

average calories consumed in each country, and included them within the model. We hoped for this to

help correct the original collinearity problems that we had with them in the previous model. The range of

the average calories per day of the countries was 1920, and we thought that the actual caloric intake (not

percentage) due to the variables would help to represent their impact on diabetes and cardiovascular

disease rates more accurately.
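The conversion from diet shares to calories can be sketched as follows. The two countries and all of their numbers are hypothetical, and the shares are assumed to sum to exactly 1 for simplicity (the paper notes only that they make up the majority of the diet).

```python
def macro_calories(total_kcal, carb_share, fat_share, protein_share):
    """Convert diet shares (fractions of total calories) into calories per day."""
    return {
        "carb_kcal": total_kcal * carb_share,
        "fat_kcal": total_kcal * fat_share,
        "protein_kcal": total_kcal * protein_share,
    }


# Two hypothetical countries with identical shares but different total intake.
low = macro_calories(2000, 0.60, 0.25, 0.15)
high = macro_calories(3500, 0.60, 0.25, 0.15)

# As shares, any one variable is pinned down by the other two ...
implied_protein_share = 1.0 - 0.60 - 0.25

# ... but as calories, the two countries differ on every variable.
carb_gap = high["carb_kcal"] - low["carb_kcal"]
```

Measured as shares, the two countries would be indistinguishable in the regression. Measured as calories, they differ on every macronutrient, which is the extra variation the rescaled variables were meant to capture.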

Here, we want to note that our methodology looked more akin to “guess and check.” We started with a

few variables that we thought would be significant, and did not receive the results we had hoped for. This

led us to search for an increasing number of independent variables to help explain some of the data that

we had, with each one helping to fill in some of the missing information. Ultimately, we settled on our

final model (which is represented in Figure 1-3) due to the significance of all the independent variables

and what we thought was a very high adjusted R² for what we were modeling.
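For reference, adjusted R² penalizes R² for the number of predictors. The sketch below uses n = 147 countries (the figure given in the conclusion) and k = 12 predictors (the eight continuous variables plus four regional dummies of the final model); the R² value of 0.70 is purely hypothetical and is not the paper's result.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    df_resid = n_obs - n_predictors - 1
    if df_resid <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / df_resid


# Hypothetical R^2 of 0.70 with n = 147 and k = 12.
example = adjusted_r2(0.70, 147, 12)
```

Because the penalty grows with k, adding a variable raises adjusted R² only if it improves the fit by more than would be expected by chance.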

What follows is not quite the process that we originally followed (we added

variables roughly one at a time and then retested), but it accurately represents our work while

illustrating the process of including nearly all of the variables we gathered in a “full” model. From

there, we used the skills we acquired in class to arrive at a model that was the best linear unbiased

estimator.
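The coefficients of a multiple linear regression are estimated by ordinary least squares. The sketch below solves the normal equations in plain Python on a small made-up data set; it illustrates the technique only, and it is an assumption that it resembles whatever statistical software produced the paper's output.

```python
def ols(X, y):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + b1*x1 + ..."""
    rows = [[1.0] + list(obs) for obs in X]  # prepend the intercept column
    p = len(rows[0])
    # Build the augmented normal-equation system [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in rows) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for k in range(p):
            if k != col:
                factor = A[k][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] for i in range(p)]


# Made-up data generated exactly from y = 2 + 3*x1 - 1*x2.
X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [2 + 3 * x1 - x2 for x1, x2 in X]
coefficients = ols(X, y)
```

Perfectly collinear predictors, such as diet shares that sum to one alongside an intercept, make the system singular, which is why the function raises an error in that case.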

After our original regression, we found that some of our variables were still highly correlated with

one another. These included GDP, which was correlated with the human development index and healthcare

spending, as well as the variable for fat calories, which we thought was correlated with meat consumption

and protein calories. Here is the model that we used:

Figure 1-1 shows the regression results we ended up with.

The significance of some of the variables was not in line with what we had originally thought (the

significance of sugars and sweeteners for example), so we knew that we were missing data that would

help to explain this. In acquiring data on diabetes and cardiovascular disease rates, we noticed that the

concentrations of the diseases were focused within specific regions. This led us to include the dummy

variables for around 10 separate regions. Because many of the regions that we included contained only a

couple of observations, their coefficients were not very meaningful. We then decided to narrow

down the regions to 5. We did this on the basis of location and the values of the coefficients that the

model estimated (which were in line with the locations). Here is the model that we used:

The results of the regression are represented in Figure 1-2.

We then decided to take out meat consumption and vegetables based on the significance level of

their t-stats. We believed meat consumption was insignificant due to the fact that it was partially

explained by other variables as well as the variety of meats that are consumed depending on the

region. We thought that a large portion of this variety and the effect it has on diabetes and cardiovascular

disease was being better explained within our dummy variables. We thought that a similar explanation

holds for the low significance that we found for vegetables: those vegetables that would cause increased

rates of the diseases would be partially explained by sugars, and those that would decrease the disease

rates were being explained by vegetable oils. We also thought that these were being better explained in

accord with our regional dummy variables. To confirm that we should take these variables out of the

model, we performed a partial F-test of whether the coefficients for meat consumption and vegetables were

jointly significant. With an F-statistic of 0.1186, we failed to reject the null hypothesis

(𝜷1 = 𝜷7 = 0) at a significance level of 0.1%. 𝜷1 and 𝜷7 proved to be statistically insignificant.
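The partial F-statistic compares the error sum of squares of the reduced model (without the tested variables) to that of the full model. In the sketch below the two SSE values are hypothetical, and the count of 14 predictors in the full model is an assumption (the 12 final predictors plus meat consumption and vegetables); only n = 147 and the two dropped variables come from the paper.

```python
def partial_f(sse_reduced, sse_full, n_dropped, n_obs, n_full_predictors):
    """F = [(SSE_reduced - SSE_full) / q] / [SSE_full / (n - k - 1)]."""
    df_full = n_obs - n_full_predictors - 1
    numerator = (sse_reduced - sse_full) / n_dropped
    denominator = sse_full / df_full
    return numerator / denominator


# Hypothetical SSE values: dropping two variables barely worsens the fit.
f_stat = partial_f(
    sse_reduced=1002.0,
    sse_full=1000.0,
    n_dropped=2,
    n_obs=147,
    n_full_predictors=14,
)
```

An F-statistic far below 1 indicates that the dropped variables explain less additional variation than would be expected from noise alone, so the null hypothesis that their coefficients are zero is not rejected.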

Here is a representation of our final population model:

Our final model’s regression results are represented in Figure 1-3. The following is our prediction

equation for diabetes and cardiovascular disease deaths based on our independent variables’ coefficients:

We then noticed what looked to be a violation of the assumption of homoscedasticity, given

the cone shape of the residual plots for healthcare spending and alcohol consumption (Figures 2-1 and 2-4).

We tried transforming the dependent variable using the square root as well as the natural log (Figures 2-2

and 2-3). The violation seemed to remain. After further consideration, we are convinced that this is not

due to the range of residual values changing based on the x value, but rather due to a larger number of

observations in the lower range of these independent variables. We came to this conclusion because the

deviation of residual values seems to remain fairly consistent across the independent variable’s values,

and it was only the concentration of plotted points that began to thin out due to fewer observations at the

higher end of each independent variable (alcohol and healthcare spending).
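One rough way to check this reading numerically, rather than by eye, is to compare the spread of the residuals in equal-sized groups ordered by the independent variable. The sketch below is not something the paper reports doing, and the data are made up; splitting by rank rather than by value keeps the same number of points in each group, so sparse observations at the high end do not distort the comparison.

```python
from statistics import pstdev


def spread_by_half(x, residuals):
    """Residual standard deviation for the lower and upper half of x, by rank."""
    pairs = sorted(zip(x, residuals))  # order observations by x
    mid = len(pairs) // 2
    lower = [r for _, r in pairs[:mid]]
    upper = [r for _, r in pairs[mid:]]
    return pstdev(lower), pstdev(upper)


# Made-up data: x is bunched at low values and sparse at high values,
# but the residuals have the same spread in both halves.
x = [1, 2, 3, 4, 10, 20, 40, 80]
residuals = [2, -2, 1, -1, 2, -2, 1, -1]
lower_sd, upper_sd = spread_by_half(x, residuals)
```

Similar spreads in the two halves are consistent with constant error variance. A markedly larger spread in one half would instead support the original concern about heteroscedasticity.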

VI. Test Results

Our test results were largely in line with our hypotheses. Carbohydrate calories, alcohol, and

sugar/sweetener consumption were shown to be positively correlated with deaths from diabetes and

cardiovascular disease. The positive effect of carbohydrate calories and alcohol is likely

due to their effect on blood glucose and insulin levels. Also, if alcohol consumption is high, it

may be that many citizens of these regions are not taking good care of their bodies or

exercising much. Per gram of consumption, sugars and sweeteners were shown to have the largest effect

on disease, which was expected given the direct relationship between sugars and diabetes.

The variables that were negatively correlated with deaths from these diseases were as expected: protein

calories, fruits, vegetable oils, healthcare spending, and the human development index. Calories from protein may

have a negative correlation because foods high in protein are usually fairly healthy. Also, since

the heart is a muscle, protein may help keep it strong. Fruits and vegetable oils likely have this effect

because they are low in processed sugar and trans fat. Because the dependent variable was

deaths due to these diseases, and not just disease rates, we expected the variables of healthcare spending

and the human development index to be extremely significant. This was confirmed in the regression

model that showed these individual variables to have a p-value of less than .001.

Our dummy variables were also in accord with what we had expected. The dummy for the relatively

developed nations of North America and Europe was shown to be positively correlated with deaths from these

diseases, but its coefficient is not as large as that of the developing countries of Asia (excluding South and

Southeast Asia). This is likely because the developing countries of Asia have undergone the rapid urbanization and

transition in diet that Frank Hu describes in his article. According to Hu’s study, this is also partially

explained by the genetics of people in certain regions. South and Southeast Asia have a negative

coefficient, most likely because these coastal regions have more fish in their diet and don’t

lead such sedentary lifestyles. As our related studies stated, fish consumption is associated with fewer deaths from

diabetes/cardiovascular disease.

VII. Conclusion

Our study was based around the correlation between deaths from diabetes/cardiovascular disease and

national diet composition and the level of the country’s development. We looked into some related

studies by Frank Hu and by Kromhout and colleagues. These studies showed that there was a strong

correlation between diabetes/cardiovascular disease and diet, region, and lifestyle. We then collected data

on diet composition, healthcare spending, and human development for 147 countries. We entered the

data and adjusted the regression model until we arrived at our final model, in which deaths from

diabetes/cardiovascular disease is the dependent variable and the independent variables are: calories from

protein, calories from carbs, daily alcohol consumption per capita, daily sugar and sweetener consumption

per capita, daily fruit consumption per capita, daily vegetable oil consumption per capita, yearly

healthcare spending per capita, the human development index, and regional dummy variables. From this

we found a few variables that were extremely important with regard to preventing deaths.

The human development index was extremely significant and had a coefficient of -782.983,

meaning the more developed a country was, the fewer deaths from these diseases it would

have. Because the index runs from 0 to 1, a 0.1 increase corresponds to roughly 78 fewer deaths per 100,000 people,

holding the other variables fixed. This is most likely because less developed countries lack the medical advancements and

technologies to detect and treat these diseases. But these technologies can be counterbalanced by the rapid

economic development, urbanization, and sedentary lifestyle changes that come with developing and

developed countries. This is likely why our dummy variables for North America/Europe and

West/Central/North/East Asia have an increasing effect on deaths from these diseases, and our dummy

variables for South Asia/Southeast Asia/Oceania and South/Central America have a decreasing effect on

deaths from these diseases. Citizens of the two more developed regions also have more disposable

income to spend, more home entertainment (which increases sedentary lifestyles), and more unhealthy

food choices (fast-food, rich foods, and processed foods). Alcohol consumption, calories from

carbohydrates, and sugar consumption also have an increasing effect on deaths from

diabetes/cardiovascular disease. Dietary components that decrease the number of deaths from these diseases, on the

other hand, are calories from protein, fruits, and vegetable oils. The effects of these foods are fairly

common knowledge and have been thoroughly studied. Our findings are similar to those of the two studies we

listed earlier: 1) abstain from alcohol or drink it in moderation, 2) eat healthier foods (more proteins,

fruits, and vegetables/vegetable oils), and 3) abstain from refined carbohydrates and sugars/sweeteners.

There were a few variables that we would have liked to include but were not able to find

data on. In order to help explain the rest of the variation within our model, we could add variables such

as salt consumption, fish consumption, and smoking (tobacco) to the model. Our research showed these

to have an effect on the disease rates we were modeling, and they would likely have increased the

explanatory power of our model.

Figure 1-1: Our initial model’s regression output.

Figure 1-2: Model after including the dummy variables.

Figure 1-3: Our final regression results.

Figure 2-1: Residual plot of healthcare spending vs diabetes and cardiovascular disease rates

Figure 2-2: Residual plot of healthcare spending vs square root of diabetes and cardiovascular disease rates

Figure 2-3: Residual plot of healthcare spending vs the natural log of diabetes and cardiovascular disease rates

Figure 2-4: Alcohol consumption vs diabetes and cardiovascular disease rates

Works Cited

Hu, Frank B. "Globalization of Diabetes." Globalization of Diabetes. American Diabetes Association, June-July 2011. Web. 13 Dec. 2015.

Kromhout, Daan, Alessandro Menotti, Hugo Kesteloot, and Susana Sans. "Prevention of Coronary Heart Disease by Diet and Lifestyle." Circulation. American Heart Association, Fall 2002. Web. 13 Dec. 2015.

Ramachandran, Ambady, Ronald Ching Wan Ma, and Chamukuttan Snehalatha. The Lancet 375.9712 (Jan 30-Feb 5, 2010): 408-18.

ChartsBin.com - Visualize Your Data. ChartsBin.com, n.d. Web. 4 Dec. 2015.

World Health Organization. WHO, n.d. Web. 4 Dec. 2015.

World Bank Group. The World Bank Group, n.d. Web. 4 Dec. 2015.

http://search.proquest.com/citedreferences/MSTAR_199057942?accountid=14709
