
  • 7/27/2019 United States Obesity Rates

    1/20

    United States Obesity Rates

    Emmy Masangcay

    Economics 284 (Econometrics)

    Professor David Lewis

    May 16, 2013


Introduction

The obesity epidemic in the United States has only grown in recent years. Between 1980 and 2000, obesity rates nearly doubled among adults.1 Currently more than 60 million adults, or over 30% of the American adult population, are obese. Below are four figures from the Centers for Disease Control and Prevention (CDC) showing the rising percentage of obesity in each state across several time periods.2

Obesity is a dangerous health condition, and medical professionals have consistently acknowledged that the percentage of Americans classified as obese is intolerably high, especially since the United States has one of the highest obesity rates in the world. Obesity is defined using body mass index (BMI), a measure of weight relative to height.3 An adult is classified as obese if his or her BMI is greater than or equal to 30.
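As a small illustration of this definition, the sketch below computes BMI in its standard metric form (weight in kilograms divided by the square of height in meters) and applies the adult cutoff of 30. The function names are hypothetical and chosen only for this example.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def is_obese_adult(weight_kg: float, height_m: float) -> bool:
    """Adult obesity threshold used in the text: BMI greater than or equal to 30."""
    return bmi(weight_kg, height_m) >= 30.0


# A 1.75 m adult weighing 95 kg versus one weighing 80 kg.
print(round(bmi(95, 1.75), 1), is_obese_adult(95, 1.75), is_obese_adult(80, 1.75))
```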

Obesity is a topic relevant to nearly everyone in America. People who are obese are at high risk of developing Type 2 diabetes, which is increasingly being diagnosed among young people. If young people develop both obesity and Type 2 diabetes, they are at a much higher risk of suffering serious complications of these diseases as adults, such as kidney disease, blindness, and amputation.4 Furthermore, obesity-related costs place a heavy burden on affected families and on the economy. For example, the direct health costs attributable to obesity were estimated at $52 billion in 1995 and $75 billion in 2003.5 Treating obesity, along with related diseases such as diabetes, stroke, and heart disease, is extremely expensive. Yet although obesity is a serious problem that now affects nearly a third of American adults, its causes are not well understood.

Figure A.1: Percentage of Obesity per State in 2010

Figure A.2: Percentage of Obesity per State in 2005

Figure A.3: Percentage of Obesity per State in 2000

Figure A.4: Percentage of Obesity per State in 1995

1 "Prevalence of Obesity in the United States, 2009-2010," NCHS Data Brief, Centers for Disease Control and Prevention (CDC), http://www.cdc.gov/nchs/data/databriefs/db82.pdf (accessed 12 April 2013).
2 "Overweight and Obesity," CDC Home, Centers for Disease Control and Prevention (CDC), http://www.cdc.gov/obesity/data/adult.html (accessed 5 April 2013).
3 Jennifer Petrelli and Kathleen Wolin, Obesity (Santa Barbara, California: Greenwood Press, 2009), 19.

Public health advocates have argued that eating unhealthy foods, particularly fast food, causes obesity because of its high calorie content and generous portion sizes. The number of fast food restaurants, like obesity rates, has been rising for quite some time, which may suggest a relationship between the two trends. The rise in obesity rates and in fast food restaurant density is shown below.6

Figure A.5: Obesity Rates and Restaurant Density, 1960-2005

While many factors contribute to obesity, a common intuition concerns the consumption of unhealthy foods, mainly fast food. In 1970, Americans spent about $6 billion on fast food; in 2000, they spent more than $110 billion.7 The biggest fast food restaurant chain in the world is McDonald's, which operated about one thousand restaurants in 1968; by 2000, McDonald's had about twenty-eight thousand restaurants worldwide, and that number has only increased since then.8 With these facts in mind, my study aims to determine whether obesity rates in America are rising specifically because of the number of fast food restaurants. My study also seeks to capture whether a lack of healthier foods, such as fruits and vegetables, in people's diets has contributed to high obesity rates.

4 "Overweight and Obesity," CDC Home, Centers for Disease Control and Prevention (CDC), http://www.cdc.gov/obesity/data/adult.html (accessed 5 April 2013).
5 Ibid.
6 Michael Anderson and David Matsa, "Are Restaurants Really Supersizing America?," American Economic Journal, January 2011, http://are.berkeley.edu/~mlanderson/pdf/Anderson%20and%20Matsa%202011.pdf (accessed 1 April 2013).
7 Eric Schlosser, Fast Food Nation (Boston, Massachusetts: Houghton Mifflin Harcourt, 2001), 2.

A substantial body of past literature has been helpful throughout this project. While the sample sizes and setups have differed greatly from my model, the general research topic has been similar. Two sources were particularly useful. The first was "The Effect of Fast Food Restaurants on Obesity and Weight Gain" by Janet Currie, Stefano DellaVigna, Enrico Moretti, and Vikram Pathania, published in the American Economic Journal. These authors explored how changes in the supply of fast food restaurants affect the weight outcomes of 3 million children and 3 million pregnant women. They found that among first-year high school students, a fast food restaurant within 0.1 miles of a school leads to a 5.2% increase in obesity rates.9 Among pregnant women, a fast food restaurant within 0.5 miles of residence results in a 1.6 percent increase in the probability of gaining over 20 kilos.10

While that study focuses on the exact proximity of a fast food restaurant to an individual rather than on the general concentration of fast food restaurants per capita, the question it explored was very similar to mine and provided much guidance throughout the project. Another useful source was "Are Restaurants Really Supersizing America?" by Michael Anderson and David Matsa, also published in the American Economic Journal. The authors found no causal link between restaurant consumption and obesity, mainly because consumers usually offset the calories from fast food meals by eating less at other times.11 As these two studies show, past results on this topic have been inconsistent. Nevertheless, the common intuition among residents and health policy advocates is that greater availability of restaurants increases the obesity rate. My study explores this idea, in addition to how a lack of healthier foods in people's diets has contributed to obesity rates.

Conceptual Framework

While fast food is clearly unhealthy because of its lack of nutrients and its high calorie content, it is not clear whether changes in the number of fast food restaurants per capita affect health. On the one hand, more fast food restaurants per capita make it more convenient for a family to get food. This can lead people to buy unhealthier food that is closer to them, because the reduced travel and food costs make it cheaper to do so. Furthermore, easier access to fast food could tempt consumers with self-control problems or those who do not have time in their day to cook a healthier meal. On the other hand, more fast food restaurants per capita may simply lead people to substitute away from unhealthy foods already eaten at home, without significantly changing the total amount of unhealthy food consumed, leaving obesity rates largely unaffected. Although both fast food restaurants and obesity have increased over time, as Figure A.5 demonstrates, this co-movement suggests a relationship between the two but does not prove one. An argument can be made in either direction: that the number of fast food restaurants per capita affects the obesity rate, or that it has no effect at all.

My intuition before the start of this project (and before reading any prior literature) was that more fast food restaurants per capita are likely to increase the obesity rate, for the reasons above. However, I expected this effect to depend heavily on average per capita income, since people presumably spend more on healthier foods when they can afford to (i.e., when per capita income is higher).

8 Ibid.
9 Janet Currie, Stefano DellaVigna, Enrico Moretti, and Vikram Pathania, "The Effect of Fast Food Restaurants on Obesity and Weight Gain," American Economic Journal, February 2009, http://www.nber.org/papers/w14721.pdf?new_window=1 (accessed 10 March 2013).
10 Ibid.
11 Michael Anderson and David Matsa, "Are Restaurants Really Supersizing America?," American Economic Journal, January 2011, http://are.berkeley.edu/~mlanderson/pdf/Anderson%20and%20Matsa%202011.pdf (accessed 1 April 2013).

One point to keep in mind is that fast food managers are most likely to open new restaurants in areas where they expect high demand. Higher demand for unhealthy food is likely correlated with a higher risk of obesity in that area, compared with areas that have no fast food restaurants. This means that unobserved factors contributing to obesity may be correlated with the number of fast food restaurants per capita, which would lead to an overestimate of the effect of fast food restaurants on the obesity rate. Aside from this, other factors affect the obesity rate but cannot be included in the model, either because they are difficult to quantify or because data on them were not available. Two important examples of such unobservable factors are family obesity history and dietary preferences.
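The placement argument above can be illustrated with a small simulation. This is a hypothetical sketch, not the paper's data: an unobserved local demand for fast food is assumed to raise both restaurant density and obesity, and the true effect of ffr is set to 0.2 by assumption. A simple regression of obes on ffr alone then overstates that effect, while partialling out the (here, artificially observed) demand variable, via the Frisch-Waugh-Lovell result, recovers it.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x with an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=20000, true_effect=0.2, seed=1):
    rng = random.Random(seed)
    demand = [rng.gauss(0, 1) for _ in range(n)]          # hypothetical unobserved taste for fast food
    ffr = [0.5 * d + rng.gauss(0, 1) for d in demand]       # restaurants open where demand is high
    obes = [true_effect * f + 1.0 * d + rng.gauss(0, 1)    # demand also raises obesity directly
            for f, d in zip(ffr, demand)]
    naive = slope(ffr, obes)                                 # demand omitted from the regression
    # Frisch-Waugh-Lovell: regress obes on the part of ffr not explained by demand.
    controlled = slope(residuals(ffr, demand), obes)
    return naive, controlled


naive, controlled = simulate()
print(f"omitting demand: {naive:.2f}, controlling for demand: {controlled:.2f}")
```

In this setup the omitted-variable bias equals the demand effect times cov(ffr, demand)/var(ffr), which is positive, matching the overestimation described in the text.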

To explore whether obesity rates are rising mainly because of the growing number of fast food restaurants and a lack of healthy foods in people's diets, I chose obesity rates in two different years, 2011 and 2007, as my dependent variable. There is nothing special about these particular years; the data I needed were simply available for both periods. There are eight variables in total (one dependent and seven independent), all of which are explained below. The variable most important to the main question of my project is ffr, the number of fast food restaurants per 1,000 residents. The second most important is fruveg, the percentage of people in each state who eat the CDC-recommended 5 or more fruits and vegetables daily. These two variables are therefore the main focus of the project.

Data Description

The variables I used for this project, along with their definitions and sources, are summarized below.

Table A: Variables, Definitions, and Sources

Variable | Name | Definition | Units of Measurement | Source
y | obes | Obesity % per state | Percentage (e.g., 0.15 means the obesity rate is 15% in a particular state) | Centers for Disease Control and Prevention (CDC)
x1 | pcinc | Average per capita income per state | Dollars (e.g., 27,000 means the average per capita income is $27,000) | United States Census Bureau
x2 | age | Average age per state | Years (e.g., 37 means the average age in a state is 37 years) | United States Census Bureau
x3 | black | Black % per state (note 12) | Percentage (e.g., 0.05 means the black population is 5% in a particular state) | United States Census Bureau
x4 | ffr | Number of fast food restaurants per 1,000 residents per state | Restaurants per 1,000 residents (e.g., 0.800 means there are 0.800 fast food restaurants per 1,000 residents in a state) | United States Department of Agriculture
x5 | gen | Number of males per 100 females per state | Males per 100 females (e.g., 102 means there are 102 males per 100 females in a state) | United States Census Bureau
x6 | leisinact | % of residents inactive in their leisure time per state (note 13) | Percentage (e.g., 0.15 means 15% of the state's population is inactive in leisure time) | Centers for Disease Control and Prevention (CDC)
x7 | fruveg | % of people who consume 5+ fruits/vegetables combined daily per state | Percentage (e.g., 0.10 means 10% of the state's population consumes 5+ fruits/vegetables daily) | Centers for Disease Control and Prevention (CDC)

The units of observation are the 50 U.S. states (n = 50).

The data for my 50 observations, as summarized above, come from the Centers for Disease Control and Prevention (CDC), the United States Census Bureau, and the United States Department of Agriculture. The CDC is a government agency under the United States Department of Health and Human Services that works to protect public health and safety, so much of my information on obesity and health statistics comes from it. The United States Census Bureau is responsible for producing and releasing information about American residents and the economy. The United States Department of Agriculture is responsible for matters regarding food, farming, agriculture, forestry, natural resources, etc. Because all of the data come from government agencies, they are as current and reliable as I could obtain.

My main model was originally meant to cover 2011. However, the same data were available for 2007, so I was able to create three different models: a 2011 model, a 2007 model, and a difference model (2007's data subtracted from 2011's).

    Based on the variables described above, the following summaries were derived:

12 The variable black, as defined by the United States Census Bureau, includes residents who identify as Black alone or in combination with another race.
13 leisinact covers residents who, when surveyed, reported no leisure-time physical activity or exercise within the past month, such as running, walking, gardening, golf, or volleyball.


    Stata Table A: 2011 Summary

    Stata Table B: 2007 Summary

    Stata Table C: Difference Model (2007 subtracted from 2011) Summary

As seen above, ffr increased substantially between 2007 and 2011, while fruveg decreased sizably. The average obesity rate was also higher in 2011.

Although this information does not directly answer the question at hand, the reader may find it interesting to know which states have the minimum and maximum values of each variable. Table B below summarizes that information.

Table B: Minimum and Maximum Values with State Specifications

Variable | Minimum 2011 | Minimum 2007 | Maximum 2011 | Maximum 2007
obes | Colorado | Colorado | Mississippi | Mississippi
pcinc | Mississippi | Mississippi | Connecticut | Connecticut
age | Utah | Utah | Maine | Maine
black | Montana | Montana | Mississippi | Mississippi
ffr | Utah | Wisconsin | Vermont | Mississippi
gen | Rhode Island | Mississippi | Alaska | Alaska
leisinact | Minnesota | Minnesota | Mississippi | Mississippi
fruveg | Mississippi | Oklahoma | Vermont | Vermont


I also added regional dummy variables to my model because obesity rates have typically been found to be higher in the South than in the West, Northeast, and Midwest. The dummy variables estimate the ceteris paribus differences in the obesity rate between the regional groups. They are described below.

Table C: Dummy Variables

Dummy Variable | States
South (1 if the state is in the South, 0 if not) | Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, West Virginia, Delaware, Alabama, Kentucky, Mississippi, Tennessee, Arkansas, Louisiana, Oklahoma, and Texas
West (1 if the state is in the West, 0 if not) | Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, and Wyoming
Midwest (mw) (1 if the state is in the Midwest, 0 if not) | Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, and Wisconsin
Northeast (ne) (1 if the state is in the Northeast, 0 if not) | Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont, New Jersey, New York, and Pennsylvania
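One way to build these regional dummies, with the South as the omitted reference group (as in the later regressions), is sketched below. The state lists are copied from Table C; the function name is hypothetical. The check that every state falls in exactly one region is also what makes a reference group necessary.

```python
REGIONS = {
    "South": ["Florida", "Georgia", "Maryland", "North Carolina", "South Carolina",
              "Virginia", "West Virginia", "Delaware", "Alabama", "Kentucky",
              "Mississippi", "Tennessee", "Arkansas", "Louisiana", "Oklahoma", "Texas"],
    "West": ["Alaska", "Arizona", "California", "Colorado", "Hawaii", "Idaho", "Montana",
             "Nevada", "New Mexico", "Oregon", "Utah", "Washington", "Wyoming"],
    "Midwest": ["Illinois", "Indiana", "Iowa", "Kansas", "Michigan", "Minnesota",
                "Missouri", "Nebraska", "North Dakota", "Ohio", "South Dakota", "Wisconsin"],
    "Northeast": ["Connecticut", "Maine", "Massachusetts", "New Hampshire", "Rhode Island",
                  "Vermont", "New Jersey", "New York", "Pennsylvania"],
}

REFERENCE = "South"  # omitted category; its level is absorbed by the intercept


def region_dummies(state):
    """Return 0/1 dummies for West, Midwest, and Northeast, with the South as the base group."""
    region = next((r for r, states in REGIONS.items() if state in states), None)
    if region is None:
        raise ValueError(f"unknown state: {state}")
    return {name: int(region == name) for name in REGIONS if name != REFERENCE}


all_states = [s for states in REGIONS.values() for s in states]
assert len(all_states) == len(set(all_states)) == 50  # each state appears in exactly one region
print(region_dummies("Texas"), region_dummies("Utah"))
```

A Southern state gets zeros on all three included dummies, so each dummy coefficient measures that region's difference from the South, holding the other regressors fixed.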

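Econometric Model/Estimation Methods

My regression model takes the following form:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + u $$

With the specific variables I used, the regression takes the following form:

$$ obes = \beta_0 + \beta_1\, pcinc + \beta_2\, age + \beta_3\, black + \beta_4\, ffr + \beta_5\, gen + \beta_6\, leisinact + \beta_7\, fruveg + u $$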

In analyses conducted later, I add regional dummy variables, interactions, and squared terms. The model above is the most basic model used in my project and applies to the 2011, 2007, and difference models.

I kept my model in level-level form. In this specification, $\Delta y = \beta_1 \Delta x$, so if x changes by one unit, y changes by $\beta_1$ units, holding the other variables fixed. I chose this specification because, as explained later, the difference model is my primary model of interest throughout this paper. Because the differenced variables take negative values, using logs would not have been possible. Therefore, I used a level-level specification for all of my regressions.

Throughout this project, I add quadratic and interaction terms for further analysis of my main question. The most important variable in my model with regard to my question is ffr. The second most important is fruveg, since it captures the percentage of people eating the recommended healthy amount of fruits and vegetables daily. For this reason, I later square these variables to analyze whether the marginal effects of ffr and fruveg on obesity rates are increasing or decreasing. As for interactions, I interact ffr with pcinc so that the partial effect of fast food restaurants per 1,000 residents on the obesity rate can depend on per capita income. This allows me to see whether the effect of fast food restaurants on obesity differs between poorer and richer areas. I also interact fruveg with age to analyze whether the effect of fruveg on obesity differs between states with younger and older average ages.
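Once squared and interaction terms enter the model, the coefficient on ffr or fruveg alone no longer measures its effect on obes. As a sketch, suppose (hypothetically; the labels $\beta_8$ to $\beta_{11}$ are placeholders, not estimates from this paper) that the extended model adds $ffr^2$, $fruveg^2$, $ffr \cdot pcinc$, and $fruveg \cdot age$ to the basic equation. Differentiating gives the partial effects:

```latex
\begin{aligned}
obes &= \beta_0 + \beta_1\, pcinc + \beta_2\, age + \beta_3\, black + \beta_4\, ffr + \beta_5\, gen + \beta_6\, leisinact + \beta_7\, fruveg \\
     &\quad + \beta_8\, ffr^2 + \beta_9\, fruveg^2 + \beta_{10}\, (ffr \cdot pcinc) + \beta_{11}\, (fruveg \cdot age) + u \\[4pt]
\frac{\partial\, obes}{\partial\, ffr} &= \beta_4 + 2\beta_8\, ffr + \beta_{10}\, pcinc \\
\frac{\partial\, obes}{\partial\, fruveg} &= \beta_7 + 2\beta_9\, fruveg + \beta_{11}\, age
\end{aligned}
```

Under this assumed specification, the sign of $\beta_8$ tells whether the marginal effect of ffr rises or falls as ffr grows, and a nonzero $\beta_{10}$ means that effect differs between poorer and richer states. Such partial effects are usually reported at meaningful values, for example the sample means of ffr and pcinc.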

Some of the challenges in this project relate to the multiple linear regression (MLR) assumptions, which are discussed below.

Assumption MLR.3 (No Perfect Collinearity): In the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables. This assumption allows the independent variables to be correlated; they just cannot be perfectly correlated. A simple and common way for two independent variables to be perfectly correlated is for one to be a constant multiple of the other. Although this has not been an issue in my model, if it were, I could address perfect collinearity by dropping one of the variables.
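A standard way MLR.3 could fail in this project is the dummy variable trap. Because every state belongs to exactly one of the four regions in Table C, including all four regional dummies together with the intercept makes the intercept column an exact linear combination of the dummy columns. The sketch below uses a hypothetical handful of states to show that exact relationship, and shows that omitting one dummy (the South, as in this paper) breaks it.

```python
# A few states from Table C, labeled by region (illustrative subset only).
sample = [("Texas", "South"), ("Utah", "West"), ("Ohio", "Midwest"),
          ("Maine", "Northeast"), ("Georgia", "South")]
REGIONS = ["South", "West", "Midwest", "Northeast"]


def design_row(region, drop=None):
    """Intercept followed by one 0/1 dummy per region, optionally omitting `drop`."""
    return [1] + [int(region == r) for r in REGIONS if r != drop]


full = [design_row(reg) for _, reg in sample]
# With all four dummies, the intercept equals the sum of the dummies in every row:
# an exact linear relationship among the regressors, which violates MLR.3.
trap = all(row[0] == sum(row[1:]) for row in full)

reduced = [design_row(reg, drop="South") for _, reg in sample]
# With the South omitted, Southern rows have all dummies equal to 0, so this identity no longer holds.
still_trapped = all(row[0] == sum(row[1:]) for row in reduced)
print(trap, still_trapped)
```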

Assumption MLR.4 (Zero Conditional Mean): The error u has an expected value of zero given any values of the independent variables. In other words, $E(u \mid x_1, x_2, \ldots, x_k) = 0$. This assumption has proved to be a challenge for my project mainly because there are omitted factors correlated with $x_1, x_2, \ldots, x_k$, which causes Assumption MLR.4 to be violated. This is due to data limitations or to factors that cannot be quantified and therefore could not be included. Examples of omitted variables I have not been able to include are genetic obesity history (e.g., how many family members were obese before an individual became obese?), preferences, the percentage of income spent on fast food, gym membership numbers, average weekly hours spent working out, etc. While I have spent much time trying to find information on these factors, much of this data has yet to be collected (or, if collected, has not been published). Some of these factors are almost certainly correlated with variables included in my regression. For example, preferences about what types of foods to eat and the percentage of income spent on fast food are most likely partly dependent on income. Although I have tried to find as much quantifiable data as possible, Assumption MLR.4 is violated in my project.

Assumption MLR.5 (Homoskedasticity): The error u has the same variance given any values of the explanatory variables. In other words, $\mathrm{Var}(u \mid x_1, \ldots, x_k) = \sigma^2$. This assumption means that the variance of the error term u, conditional on the explanatory variables, is the same for all combinations of values of the explanatory variables. If it fails, the model exhibits heteroskedasticity, meaning that the variance of u given the explanatory variables is not constant. Testing for this is important because if MLR.5 fails, the usual t statistics and F statistics no longer have t and F distributions, so hypothesis tests based on them are not valid.


    Stata Table D: Heteroskedasticity Test on the Difference Model

Above is the test for heteroskedasticity for the difference model (my main model, as explained later), without dummy variables or squared terms. Stata reports an F statistic of 1.32. The 5% critical value is 2.34, so because the F statistic is less than the critical value, we fail to reject the null hypothesis of homoskedasticity at the 5% significance level. This means there is no evidence that MLR.5 is violated, so the usual standard errors and t tests can reasonably be used without heteroskedasticity-robust corrections. The t tests in the Results section therefore rely on the usual standard errors.
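Stata's output is not reproduced in this text, so the sketch below only illustrates how an F-form heteroskedasticity test of this kind works, assuming it is the Breusch-Pagan test: estimate the model by OLS, regress the squared residuals on the same regressors, and compute the overall-significance F statistic of that auxiliary regression, F = (R^2/k)/((1 - R^2)/(n - k - 1)). It runs on simulated data, not the paper's; the data-generating process and function names are hypothetical, and a small normal-equations solver keeps it to the standard library.

```python
import random


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] +
         [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(X, y, b):
    """R-squared of the fitted values X b for the dependent variable y."""
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    my = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - ssr / sst


def bp_f_test(X, y):
    """Breusch-Pagan F statistic; each row of X starts with a 1 for the intercept."""
    n, k = len(X), len(X[0]) - 1  # k = number of slope regressors
    b = ols(X, y)
    u2 = [(yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y)]
    r2 = r_squared(X, u2, ols(X, u2))  # auxiliary regression of squared residuals
    return (r2 / k) / ((1 - r2) / (n - k - 1))


rng = random.Random(0)
x = [rng.uniform(1, 3) for _ in range(2000)]
X = [[1.0, xi] for xi in x]
y_homo = [1 + 0.5 * xi + rng.gauss(0, 1) for xi in x]     # constant error variance
y_hetero = [1 + 0.5 * xi + rng.gauss(0, xi) for xi in x]  # error s.d. grows with x
print(round(bp_f_test(X, y_homo), 2), round(bp_f_test(X, y_hetero), 2))
```

A large statistic relative to the F critical value leads to rejecting homoskedasticity; a small one, like the 1.32 reported above, gives no evidence against it.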

Because this project involves no time series analysis, serial correlation and the other time series assumptions are not an issue.

Results

Parameter Estimates

Below are the parameter estimates for my three models.

Table D: Model Coefficients (* marks a coefficient with an unexpected sign)

Model | constant | pcinc | age | black | ffr | gen | leisinact | fruveg
2011 | 19.7763 | -0.0001 | 0.2431 | 0.0736 | 0.0285 | 0.0161 | 0.2875 | -0.4362
2007 | 13.7836 | -0.0001 | 0.1688 | 0.0983 | -4.8077* | 0.1101 | 0.2115 | -0.1189
2011-2007 | 2.3667 | -0.00004 | 0.0301 | -0.1283* | 0.6963 | 0.1776 | 0.0768 | 0.0293*

For the most part, the coefficients have the signs I expected, aside from the three coefficients flagged in Table D. In the 2007 model, the coefficient on ffr is negative, which says that an increase of 1 fast food restaurant per 1,000 residents decreases the obesity rate by 4.81%. However, as the mean of ffr is only 0.1189 (Stata Table C), an increase of 1 in ffr is unrealistic. It would be more appropriate to say, for example, that an increase in ffr of 0.1 would imply a decrease in the obesity rate of 0.48077%. Regardless, the negative sign on ffr was unexpected. In the 2011 and 2011-2007 models, the estimate on ffr is positive, but in 2011 its effect is very small: if ffr increased by 0.1 in 2011, the obesity rate would increase by only 0.00285%. The coefficient is much larger in the difference model.

The estimates on fruveg are small as well. Because fruveg measures the percentage of people in each state who consume 5+ fruits/vegetables combined daily, it is expressed as a proportion, so its values lie between 0 and 1. A change of 1, for example, would represent a 100 percentage point increase in the share of people who consume the recommended daily fruit and vegetable intake, whereas a change of 0.05 would represent a 5 percentage point increase. The 2011 model has the largest fruveg coefficient in absolute value; in that case, if the share of residents who consume 5+ fruits and vegetables daily increased by 5 percentage points, the obesity rate would decrease by 0.02181%. I anticipated the coefficient to be negative in all cases, but the difference model shows otherwise. Based on health studies, doctor recommendations, and common intuition, eating healthier (ceteris paribus) should reduce the obesity rate. This unexpected finding was not limited to fruveg: the coefficient on black in the difference model was also negative. This was unexpected because obesity rates are typically much higher in minority populations, especially among those identified as black.
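The back-of-the-envelope interpretations in this paragraph all follow the level-level rule, change in obes = coefficient × change in x. The snippet below reproduces them from the Table D estimates; only the coefficients used in the text are included, and the function name is hypothetical.

```python
# Selected coefficients from Table D (level-level models).
coef = {
    ("2007", "ffr"): -4.8077,
    ("2011", "ffr"): 0.0285,
    ("2011-2007", "ffr"): 0.6963,
    ("2011", "fruveg"): -0.4362,
}


def predicted_change(model, var, dx):
    """Level-level interpretation: change in obes = coefficient * change in x."""
    return coef[(model, var)] * dx


print(predicted_change("2007", "ffr", 0.1))         # ffr up by 0.1 restaurants per 1,000 residents
print(predicted_change("2011", "ffr", 0.1))
print(predicted_change("2011-2007", "ffr", 0.1))
print(predicted_change("2011", "fruveg", 0.05))     # fruveg up by 0.05 (five percentage points)
```

Scaling the change in x to a realistic size (0.1 for ffr, 0.05 for fruveg) is what turns the raw coefficients into the small effects quoted in the text.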

The coefficients may carry unanticipated signs because of omitted variable bias. Several unmeasured factors are most likely correlated with ffr and fruveg. For example, dietary preferences are probably related to both. Food preferences can dictate whether fast food chains build a restaurant in an area and whether people eat more or fewer fruits and vegetables. Because preferences are excluded from this model, some of the coefficients may be biased, possibly including those with unanticipated signs. Other omitted factors include family obesity history, time spent working out, ability to cook, etc. To limit some of this bias, it is most appropriate to use the difference model for the remainder of the project. Even so, while the difference model is preferable, the unanticipated signs on black and fruveg in that model can be discounted to an extent because, as explained later, none of these variables is statistically significant.

Explaining the Difference Model

Taken at face value, the 2007 regression in the previous section implies that an increase in ffr lowers the obesity rate. Although this is possible, it is not what I expected, and this regression most likely suffers from omitted variable problems. To deal with the omitted variables, I could have tried to control for more factors, but because appropriate data for many of them were hard to find or to quantify, I was unable to do so. An alternative is to use a difference model. This means dividing the unobserved factors that affect obes into two types: those that are constant between 2007 and 2011 and those that vary over time. This can be expressed in the model below:

$$ obes_{it} = \beta_0 + \delta_0\, d11_t + \beta_1\, pcinc_{it} + \beta_2\, age_{it} + \beta_3\, black_{it} + \beta_4\, ffr_{it} + \beta_5\, gen_{it} + \beta_6\, leisinact_{it} + \beta_7\, fruveg_{it} + a_i + u_{it} $$

Above, i indexes the states and t indexes the time period (2007 or 2011), and $d11_t$ is a year dummy equal to 1 in 2011 and 0 in 2007, which allows the intercept to differ between the two years. The last two terms, $a_i + u_{it}$, represent the unobserved factors: $a_i$ captures all unobserved, time-constant factors that affect obes, while $u_{it}$ is the time-varying error, representing unobserved factors that change over time and affect obes. The term $a_i$ is typically called the unobserved effect, or fixed effect, while $u_{it}$ is typically called the idiosyncratic error. When I estimate the difference model by subtracting 2007's data from 2011's, I am able to eliminate some of the omitted variable bias in my regression. This is demonstrated below:

$$ obes_{i,2011} - obes_{i,2007} = (\beta_0 - \beta_0) + \delta_0 (1 - 0) + \beta_1 (pcinc_{i,2011} - pcinc_{i,2007}) + \beta_2 (age_{i,2011} - age_{i,2007}) + \beta_3 (black_{i,2011} - black_{i,2007}) + \beta_4 (ffr_{i,2011} - ffr_{i,2007}) + \beta_5 (gen_{i,2011} - gen_{i,2007}) + \beta_6 (leisinact_{i,2011} - leisinact_{i,2007}) + \beta_7 (fruveg_{i,2011} - fruveg_{i,2007}) + (a_i - a_i) + (u_{i,2011} - u_{i,2007}) $$

Since $\beta_0 - \beta_0 = 0$ and $a_i - a_i = 0$, this simplifies to the following, where $\Delta$ denotes the 2011 value minus the 2007 value:

$$ \Delta obes_i = \delta_0 + \beta_1 \Delta pcinc_i + \beta_2 \Delta age_i + \beta_3 \Delta black_i + \beta_4 \Delta ffr_i + \beta_5 \Delta gen_i + \beta_6 \Delta leisinact_i + \beta_7 \Delta fruveg_i + \Delta u_i $$

The constant in the differenced equation, $\delta_0$, is the change in the intercept between 2007 and 2011.

The term $a_i$ cancels out because it does not change over time. The difference model therefore accounts for at least some of the unobserved factors that are correlated with ffr. As a result, it is no longer necessary to assume that ffr is uncorrelated with $a_i$, because $a_i$ is no longer in the model. It is important to realize that while the difference model addresses a fair number of omitted variables, it does not address all of them. The table below shows some (not all) examples of omitted variables that are included in $a_i$ (and are therefore accounted for) and in $u_{it}$ (and are still unaccounted for). Some variables included in $a_i$ are not exactly constant over time but are roughly constant.
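To see why differencing helps, the simulation below (hypothetical numbers, not the paper's data) gives each simulated unit a fixed effect a_i that is correlated with ffr, plus a common shift in obes between the two years. Pooling both years and regressing obes on ffr is biased, while regressing the 2011-minus-2007 change in obes on the change in ffr recovers the true effect, set to 0.5 by assumption. Many more units than the paper's 50 states are used so that sampling noise does not obscure the pattern.

```python
import random


def slope(x, y):
    """OLS slope of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def simulate(n_units=5000, beta=0.5, seed=2):
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n_units)]            # time-constant unobserved effect a_i
    ffr = {t: [0.8 * ai + rng.gauss(0, 1) for ai in a] for t in (2007, 2011)}
    obes = {t: [1.0 * (t == 2011)                            # common shift between years
                + beta * f + 2.0 * ai + rng.gauss(0, 1)
                for f, ai in zip(ffr[t], a)] for t in (2007, 2011)}
    # Pooled OLS on both years leaves a_i in the error, and a_i is correlated with ffr.
    pooled = slope(ffr[2007] + ffr[2011], obes[2007] + obes[2011])
    # First differences: a_i drops out of obes_2011 - obes_2007.
    d_ffr = [f1 - f0 for f1, f0 in zip(ffr[2011], ffr[2007])]
    d_obes = [o1 - o0 for o1, o0 in zip(obes[2011], obes[2007])]
    differenced = slope(d_ffr, d_obes)
    return pooled, differenced


pooled, differenced = simulate()
print(f"pooled OLS: {pooled:.2f}, first-differenced: {differenced:.2f}")
```

The intercept kept in the differenced regression absorbs the common year shift; it plays the same role as the constant in the estimated difference model.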

Table E: Examples of Unobservable Data

Included in a_i | Included in u_it
Climate, education, statewide standardized processes for making fast food, mountains and parks per state, state perceptions toward obesity, etc. | Gym membership, amount of time spent working out, family obesity history, changes in technology, preferences, etc.

Because the difference model eliminates a considerable amount of unobserved, time-constant heterogeneity, and is therefore likely less biased than the two single-year models, it is the model I use for the remainder of the project. The individual 2007 and 2011 models are no longer needed.

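The estimated equation for the difference model, excluding dummy variables, squared terms, and interactions, is:

$$ \widehat{\Delta obes} = 2.3667 - 0.00004\, \Delta pcinc + 0.0301\, \Delta age + 0.6963\, \Delta ffr - 0.1283\, \Delta black + 0.1776\, \Delta gen + 0.0768\, \Delta leisinact + 0.0293\, \Delta fruveg $$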
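The estimated difference equation can be evaluated directly. The sketch below encodes the difference-model coefficients from Table D (with the pcinc coefficient taken as -0.00004, as reported there) and returns the fitted change in the obesity rate for given 2007-to-2011 changes in the regressors; any change not supplied is treated as zero. The function name is hypothetical.

```python
# Difference-model (2011 minus 2007) estimates from Table D.
INTERCEPT = 2.3667
COEF = {"pcinc": -0.00004, "age": 0.0301, "black": -0.1283, "ffr": 0.6963,
        "gen": 0.1776, "leisinact": 0.0768, "fruveg": 0.0293}


def predicted_change_in_obes(**changes):
    """Fitted change in obes for the given changes in the regressors (others held at zero)."""
    unknown = set(changes) - set(COEF)
    if unknown:
        raise KeyError(f"unknown regressors: {sorted(unknown)}")
    return INTERCEPT + sum(COEF[name] * dx for name, dx in changes.items())


# Example: a state whose ffr rose by 0.1 and per capita income by $1,000, all else unchanged.
print(round(predicted_change_in_obes(ffr=0.1, pcinc=1000), 4))
```

With every change set to zero, the prediction is just the constant, which is the part of the 2007-to-2011 change in obes not explained by the regressors.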


T-Statistics and P-Values

The following table shows all t statistics and p-values for the differenced model.

    Stata Table E: Regression with T-Statistics and P-Values

The t statistic is used to test the null hypothesis that a single coefficient equals zero against an alternative. For a two-sided alternative, H0 is rejected in favor of H1 at the 5% significance level if the absolute value of the t statistic exceeds the critical value. We can test the null hypothesis that ffr has no linear effect on obes:

$H_0: \beta_4 = 0$

$H_1: \beta_4 \neq 0$

With n - k - 1 = 50 - 7 - 1 = 42 degrees of freedom, the two-sided 5% critical value is about 2.02 (the value 1.684 corresponds to a two-sided 10% test, or a one-sided 5% test, with 40 degrees of freedom). The t statistic is 0.43. Because the t statistic is smaller than either critical value, we fail to reject the null hypothesis that ffr has no linear effect on obes.

Likewise, we can test the null hypothesis that fruveg has no linear effect on obes:

$H_0: \beta_7 = 0$

$H_1: \beta_7 \neq 0$

The critical value is the same as in the last test, but the t statistic is 0.27. As in the last test, because the t statistic is smaller than the critical value, we fail to reject the null hypothesis that fruveg has no linear effect on obes.

    The p-value summarizes the strength of the empirical evidence against the null hypothesis. It is the probability, computed assuming the null hypothesis is true, of observing a t-statistic at least as extreme as the one actually obtained. Small p-values are therefore evidence against the null hypothesis. Large p-values mean the data provide little evidence against it, though they do not prove that the null is true. As Stata Table E shows, none of the variables is statistically significant at any conventional level, so we cannot conclude that any of them has a nonzero effect on the obesity rate. When the regional dummy variables are added to the model, with the South as the reference group, none of these variables is significant at any conventional level either.
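    The decision rule and p-values described above can be reproduced outside Stata. The sketch below is a minimal, standard-library Python version. It assumes no statistics package is available, so it integrates the Student t density numerically with Simpson's rule; this is a numerical shortcut, not how Stata computes these values. The script does three things:

    - it recovers the 1.684 critical value used in the text;
    - it computes the corresponding two-sided 5% value;
    - it computes approximate two-sided p-values for the t-statistics of 0.43 (ffr) and 0.27 (fruveg), taking n - k - 1 = 42 degrees of freedom as in the F-test.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry of the density about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def two_sided_p_value(t_stat: float, df: int) -> float:
    """Probability under H0 of a t-statistic at least as far from zero as t_stat."""
    return 2 * (1 - t_cdf(abs(t_stat), df))

def critical_value(prob: float, df: int) -> float:
    """c such that P(T <= c) = prob, found by bisection on [0, 50] (assumes prob > 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print("one-sided 5%, 40 df:", round(critical_value(0.95, 40), 3))
print("two-sided 5%, 40 df:", round(critical_value(0.975, 40), 3))
for name, t in (("ffr", 0.43), ("fruveg", 0.27)):
    print(name, "two-sided p-value (42 df):", round(two_sided_p_value(t, 42), 3))
```

    Both p-values come out well above 0.05, which matches the decision to not reject either null hypothesis.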

    F-Tests and Joint Significance

    Joint insignificance occurs when we fail to reject the null hypothesis that a group of coefficients are all zero. It can justify dropping those variables from a model, so an F-test can be used to determine whether certain variables are worth keeping. In the difference model, $R^2$ is 0.0672 when all of the independent variables are included and 0.0618 when ffr and fruveg are dropped. The F-statistic can be calculated from these values:

    $$F = \frac{(R^2_{UR} - R^2_{R})/q}{(1 - R^2_{UR})/(n - k - 1)} = \frac{(0.0672 - 0.0618)/2}{(1 - 0.0672)/(50 - 7 - 1)} = \frac{0.0054/2}{0.9328/42} = \frac{0.0027}{0.0222095} \approx 0.12157$$
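    The F-statistic arithmetic above can be checked with a short Python sketch. The first function is the $R^2$ form of the F statistic used in the text. The second goes beyond what the paper reports. For exactly q = 2 restrictions, the F(2, d) tail probability has the closed form (1 + 2F/d)^(-d/2), so a p-value can be computed without a statistics library. That shortcut does not apply for other values of q.

```python
def r_squared_f_stat(r2_ur: float, r2_r: float, q: int, n: int, k: int) -> float:
    """R-squared form of the F statistic for q exclusion restrictions."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))

def f_p_value_q2(f_stat: float, df_denom: int) -> float:
    """P(F > f_stat) for an F(2, df_denom) distribution; valid only when q = 2."""
    return (1 + 2 * f_stat / df_denom) ** (-df_denom / 2)

# Values from the text: unrestricted vs. restricted (ffr and fruveg dropped) model.
f = r_squared_f_stat(r2_ur=0.0672, r2_r=0.0618, q=2, n=50, k=7)
p = f_p_value_q2(f, df_denom=50 - 7 - 1)
print(round(f, 5), round(p, 3))
```

    The p-value comes out at roughly 0.89, consistent with failing to reject H0 at any conventional level.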

    The F-statistic is very small, so we fail to reject $H_0\colon \beta_4 = \beta_7 = 0$ in favor of H1 at any conventional significance level. This means that ffr and fruveg are jointly insignificant, so dropping them from the model would be justified. However, their insignificance has more to do with the precision of the estimates than with causality, so we do not necessarily have to drop them. Dropping these variables would move them into the error term. That could bias the coefficients on the remaining independent variables if those variables are correlated with ffr or fruveg. Just because ffr and fruveg are insignificant does not mean that the tests regarding these variables stop here. Rather, it is important to keep these results in mind throughout the rest of this project.

    Dummy Variables

    Although dummy variables do not directly answer the question at hand, they can be useful because they capture qualitative factors that are of interest to the project. In this case, regional dummy variables can be used to highlight differences in the obesity rate among the South, Northeast, Midwest, and West. Table C shows which states were assigned to each region. The model takes the following form:

    $$obes = \beta_0 + \beta_1 pcinc + \beta_2 age + \beta_3 black + \beta_4 ffr + \beta_5 gen + \beta_6 leisinact + \beta_7 fruveg + \delta_1 ne + \delta_2 mw + \delta_3 west$$

    The South is the reference category, so all regional comparisons are made against it. It was chosen because the South typically has the highest obesity rate. Estimating this model in Stata gives the following equation:

    $$\widehat{obes} = 4.7126 - 0.0001\,pcinc + 0.3429\,age - 0.3634\,black + 0.4863\,ffr + 0.2378\,gen + 0.2422\,leisinact + 0.1161\,fruveg - 0.4337\,ne - 0.9549\,mw - 2.1843\,west$$
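    A small sketch makes the role of the reference category concrete. It hard-codes the three regional coefficients from the estimated equation above. Their negative signs follow the text's reading that each region's obesity rate is lower than the South's. The sketch then computes each region's predicted shift relative to the South, holding the other regressors fixed. The function names are hypothetical.

```python
# Regional dummy coefficients from the estimated equation; the South is the omitted
# (reference) group, so its shift is zero by construction.
REGION_COEFFS = {"ne": -0.4337, "mw": -0.9549, "west": -2.1843}

def region_dummies(region: str) -> dict:
    """Return the 0/1 values of (ne, mw, west) for a region; the South gets all zeros."""
    if region != "south" and region not in REGION_COEFFS:
        raise ValueError(f"unknown region: {region}")
    return {name: int(name == region) for name in REGION_COEFFS}

def shift_vs_south(region: str) -> float:
    """Predicted difference in obes relative to the South, other regressors held fixed."""
    dummies = region_dummies(region)
    return sum(REGION_COEFFS[name] * value for name, value in dummies.items())

for region in ("south", "ne", "mw", "west"):
    print(region, shift_vs_south(region))
```

    Omitting one category is what keeps the dummies from being perfectly collinear with the intercept. Each remaining coefficient is a difference from that omitted group.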

    Relative to the South, the Northeast's obesity rate is estimated to be 0.4337 percentage points lower and the Midwest's 0.9549 percentage points lower, holding the other variables fixed. The estimated difference is larger for the West, whose obesity rate is predicted to be 2.1843 percentage points lower than the South's. The p-values on ne and mw are 0.579 and 0.165, respectively. We therefore fail to reject, at any conventional significance level, the null hypothesis that these regions have the same obesity level as the South. However, the null is rejected for west, since its p-value is 0.001. This indicates that there are regional differences in the obesity rate, but their specific causes are unknown. They could include factors such as weather, climate, and the nature of each region relative to the South. For example, the West probably has the most favorable year-round climate for outdoor exercise (think of states such as California, Colorado, and Oregon), especially compared with the very hot and humid states in the South. Furthermore, there may be more mountains and parks in the western states than in other areas, so more people can exercise outdoors through hiking and similar activities. There could be other factors as well. This explanation is just one theory that could help explain the regional differences in the obesity rate, especially between the West and the South.

    Quadratics

    Quadratics are useful for this project because they allow the marginal effect of a variable on obes to increase or decrease (diminish) as that variable changes. I added squared terms for ffr and fruveg, since those are the variables most relevant to the question at hand.

    Stata Table F: Quadratic Model

    The estimated equation shows that both ffr and fruveg have an increasing marginal effect on obes.

    The turning point of a quadratic $\hat{y} = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2$ (when $\hat\beta_1$ and $\hat\beta_2$ have opposite signs) can be calculated as $x^* = \left|\hat\beta_1 / (2\hat\beta_2)\right|$. For ffr, the effect on obes is negative before the turning point and positive after it. The turning point of ffr is calculated below:

    $$ffr^* = \frac{0.7676178}{2 \times 5.032455} = 0.0762667$$

    The model implies that the effect of ffr on obes is zero when ffr equals 0.0762667. When ffr < 0.0762667, ffr has a negative effect on obes. When ffr > 0.0762667, obes increases as ffr increases. The relationship for ffr is U-shaped, since the coefficient on ffr is negative while the coefficient on its squared term is positive.

    For fruveg, the coefficients on both the level and the squared term are positive. The fitted curve is therefore upward sloping for all nonnegative values of fruveg, and there is no turning point to calculate in that range. In other words, the model never predicts that the obesity rate will fall as fruveg increases. This is probably due to omitted variable bias. It is possible, but most likely not the case, that the obesity rate rises as more people eat fruits and vegetables. Differencing eliminated the time-constant unobserved factors, but important information is still unaccounted for. That missing information may be distorting estimates such as the coefficients on fruveg and fruveg_sq. However, it is interesting to note that fruveg and its squared term are both statistically significant at the 5% level in this model. Therefore, we can reject the null hypothesis that the coefficient on fruveg is zero, and likewise for the coefficient on fruveg_sq.

    Thirty-two states have an ffr value of 0.0762667 or greater. Beyond that point, additional ffr has an increasing effect on obes, as illustrated below:

    $$\Delta\widehat{obes} \approx \left[-0.76761878 + 2(5.032455)\,ffr\right]\Delta ffr = (-0.76761878 + 10.06491\,ffr)\,\Delta ffr$$

    Evaluated at ffr = 0.08, the slope is $-0.76761878 + 10.06491(0.08) \approx 0.0376$. An increase in ffr from 0.08 to 0.09 therefore raises the predicted obesity rate by roughly $0.0376 \times 0.01 \approx 0.0004$ percentage points. At ffr = 0.09, the slope is $-0.76761878 + 10.06491(0.09) \approx 0.1382$, so the next increase, from 0.09 to 0.10, has a larger effect. The marginal effect of ffr thus grows as ffr moves beyond the turning point of 0.0762667. More than half of the states have an ffr value at or above the turning point, so the model predicts that the obesity rate would increase with any additional fast food restaurants built in these areas. However, ffr and ffr_sq have high p-values, so neither is individually statistically different from zero at any conventional level. We therefore fail to reject the null hypothesis that either coefficient is zero. This is consistent with the findings from the earlier tests.
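    The turning point and the changing slope of the ffr quadratic can be checked numerically. This sketch uses the ffr coefficients as they appear in the marginal-effect formula above: -0.76761878 on ffr and 5.032455 on ffr_sq. The text also reports the first coefficient as 0.7676178 in absolute value, which gives virtually the same turning point. The sketch computes x* = -b1/(2·b2) and the slope b1 + 2·b2·ffr.

```python
# Estimated coefficients on ffr and ffr_sq (Stata Table F), as used in the text.
B_FFR = -0.76761878
B_FFR_SQ = 5.032455

def turning_point(b1: float, b2: float) -> float:
    """x* = -b1 / (2*b2): the point where the slope of b1*x + b2*x**2 is zero."""
    return -b1 / (2 * b2)

def slope(x: float, b1: float = B_FFR, b2: float = B_FFR_SQ) -> float:
    """Marginal effect d(obes)/d(ffr) = b1 + 2*b2*x of the fitted quadratic."""
    return b1 + 2 * b2 * x

x_star = turning_point(B_FFR, B_FFR_SQ)
print("turning point:", round(x_star, 5))
for x in (0.07, 0.08, 0.09, 0.10):
    # Negative slope below x_star, positive and rising above it.
    print(x, round(slope(x), 4))
```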

    Interactions

    I interacted ffr with pcinc and fruveg with age. The Stata results are as follows:

    Stata Table G: Interaction ffr*pcinc

    In this model, the partial effect of ffr on obes (holding all other variables fixed) is

    $$\frac{\Delta obes}{\Delta ffr} = \beta_4 + \beta_8\,pcinc$$

    If $\beta_8 > 0$, an additional fast food restaurant per capita is associated with a larger increase in the obesity rate in areas with higher per capita income. In that case there is an interaction effect between ffr and pcinc.

    At the mean pcinc of 3,604.6 (Stata Table C), the estimated partial effect of ffr on the obesity rate is $3.248847 - 0.000704(3{,}604.6) = 0.7112086$. For a state with the average per capita income, increasing ffr by 1 increases the obesity rate by 0.7112 percentage points. At the maximum pcinc of 12,390, the partial effect is -5.473713 (a negative effect). At the minimum of -3,516, it is 5.724111 (a positive effect). This illustrates a dramatic difference in the effect of fast food restaurants across per capita income levels: ffr has a negative effect on obes at higher income levels but a positive effect at lower income levels. However, neither t-statistic is significant, so we cannot reject $H_0\colon \beta_4 = 0$ or $H_0\colon \beta_8 = 0$.
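    The partial-effect calculations above are easy to reproduce. This sketch hard-codes the two estimates used in the text: 3.248847 on ffr and -0.000704 on ffr*pcinc, from Stata Table G. It evaluates b4 + b8·pcinc at the mean, maximum, and minimum of pcinc.

```python
B4 = 3.248847    # estimated coefficient on ffr (Stata Table G)
B8 = -0.000704   # estimated coefficient on ffr*pcinc (Stata Table G)

def partial_effect_ffr(pcinc: float) -> float:
    """Estimated d(obes)/d(ffr) = b4 + b8 * pcinc, holding other variables fixed."""
    return B4 + B8 * pcinc

# Mean, maximum, and minimum of pcinc as reported in the text.
for label, pcinc in (("mean", 3604.6), ("max", 12390), ("min", -3516)):
    print(label, round(partial_effect_ffr(pcinc), 6))
```

    The sign of the estimated effect flips between the mean and the maximum of pcinc. This is the contrast the text describes, although the underlying estimates remain statistically insignificant.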

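    Similarly, the partial effect of pcinc on the obesity rate is $\beta_1 + \beta_8\,ffr$, which requires choosing a value for ffr. Returning to ffr, we can test the null hypothesis that ffr has no effect on the obesity rate at the mean of pcinc:

    $$H_0\colon \beta_4 + 3{,}604.6\,\beta_8 = 0 \qquad H_1\colon \beta_4 + 3{,}604.6\,\beta_8 > 0$$

    Let $\theta = \beta_4 + 3{,}604.6\,\beta_8$, so that $\beta_4 = \theta - 3{,}604.6\,\beta_8$. Substituting this into the model gives:

    $$\begin{aligned}
    obes &= \beta_0 + \beta_1 pcinc + \beta_2 age + \beta_3 black + \beta_4 ffr + \beta_5 gen + \beta_6 leisinact + \beta_7 fruveg + \beta_8\, ffr \cdot pcinc \\
    obes &= \beta_0 + \beta_1 pcinc + \beta_2 age + \beta_3 black + [\theta - 3{,}604.6\,\beta_8]\, ffr + \beta_5 gen + \beta_6 leisinact + \beta_7 fruveg + \beta_8\, ffr \cdot pcinc \\
    obes &= \beta_0 + \beta_1 pcinc + \beta_2 age + \beta_3 black + \theta\, ffr + \beta_5 gen + \beta_6 leisinact + \beta_7 fruveg + \beta_8\, ffr\,[pcinc - 3{,}604.6]
    \end{aligned}$$

    The relevant test is $H_0\colon \theta = 0$.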
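    The substitution above only relabels parameters; it does not change any fitted value. The sketch below checks that numerically. It uses the estimates from Stata Table G and randomly drawn, hypothetical (ffr, pcinc) values, and it shows that θ equals b4 + b8·3,604.6, about 0.711. This is why the coefficient on ffr in the reparameterized regression can be read directly as the partial effect at the mean income.

```python
import math
import random

B4 = 3.248847         # estimated coefficient on ffr (Stata Table G)
B8 = -0.000704        # estimated coefficient on ffr*pcinc (Stata Table G)
MEAN_PCINC = 3604.6   # mean of pcinc in the differenced data

def original_terms(b4: float, b8: float, ffr: float, pcinc: float) -> float:
    """ffr-related part of the fitted value in the original interaction model."""
    return b4 * ffr + b8 * ffr * pcinc

def reparameterized_terms(theta: float, b8: float, ffr: float, pcinc: float) -> float:
    """Same part after substituting b4 = theta - b8 * MEAN_PCINC (centered interaction)."""
    return theta * ffr + b8 * ffr * (pcinc - MEAN_PCINC)

theta = B4 + B8 * MEAN_PCINC  # partial effect of ffr at the mean of pcinc

rng = random.Random(0)
for _ in range(1000):
    # Hypothetical (ffr, pcinc) pairs spanning roughly the range discussed in the text.
    ffr, pcinc = rng.uniform(0.0, 0.2), rng.uniform(-3516, 12390)
    assert math.isclose(original_terms(B4, B8, ffr, pcinc),
                        reparameterized_terms(theta, B8, ffr, pcinc), abs_tol=1e-9)

print(round(theta, 7))  # coefficient on ffr after reparameterization
```

    In Stata the same number and its standard error could also be obtained from a linear-combination test after the original regression, rather than by re-running the model.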

    Stata Table H: Interaction ffr*pcinc Test

    The t-statistic on ffr in this regression is the one used to test the null hypothesis that ffr has no effect on the obesity rate. It is 0.44, which is less than the 5% critical value, so we fail to reject the null hypothesis at the 5% level. At the average per capita income, ffr therefore has a statistically insignificant effect on the obesity rate. As a side note, the coefficient on ffr, which now estimates θ, is 0.711. This is the same as the partial effect of ffr on obes calculated above.

    The other interaction is between fruveg and age. The partial effect of fruveg on obes (holding all other variables fixed) is

    $$\frac{\Delta obes}{\Delta fruveg} = \beta_7 + \beta_8\,age$$

    If $\beta_8 > 0$, an increase in the percentage of people who eat the recommended amount of fruits and vegetables daily would be associated with a larger increase in the obesity rate where the population is older. The mean of age in the differenced data set is -0.7308. At that mean, the estimated partial effect of fruveg on the obesity rate is -5.759770242 (based on the coefficients -5.335453 and -0.5806202; table not shown). This implies that when fruveg increases by 0.10 (10 percentage points), the obesity rate decreases by about 0.5760 percentage points. At the minimum age value of -1.81, the estimated partial effect of fruveg is -6.386375562. At the maximum of 0.45, it is -5.07417391. As the results show, the effect of fruveg on obes does not vary much across age levels.

    We can also test the null hypothesis that fruveg has no effect on the obesity rate. At the mean age, the null and alternative hypotheses are:

    $$H_0\colon \beta_7 - 0.7308\,\beta_8 = 0 \qquad H_1\colon \beta_7 - 0.7308\,\beta_8 > 0$$

    Let $\theta = \beta_7 - 0.7308\,\beta_8$, so that $\beta_7 = \theta + 0.7308\,\beta_8$. Substituting this into the model gives:

    $$\begin{aligned}
    obes &= \beta_0 + \beta_1 pcinc + \beta_2 age + \beta_3 black + \beta_4 ffr + \beta_5 gen + \beta_6 leisinact + \beta_7 fruveg + \beta_8\, fruveg \cdot age \\
    obes &= \beta_0 + \beta_1 pcinc + \beta_2 age + \beta_3 black + \beta_4 ffr + \beta_5 gen + \beta_6 leisinact + [\theta + 0.7308\,\beta_8]\, fruveg + \beta_8\, fruveg \cdot age \\
    obes &= \beta_0 + \beta_1 pcinc + \beta_2 age + \beta_3 black + \beta_4 ffr + \beta_5 gen + \beta_6 leisinact + \theta\, fruveg + \beta_8\, fruveg\,[age + 0.7308]
    \end{aligned}$$

    Here 0.7308 is added rather than subtracted because the mean of age is negative: subtracting -0.7308 is the same as adding 0.7308. The relevant test is $H_0\colon \theta = 0$.

    Stata Table I: Interaction fruveg*age Test

    The t-statistic to focus on is the one associated with fruveg. It is less than the critical value, so we fail to reject the null hypothesis at the 5% level. At the average age, fruveg therefore has a statistically insignificant effect on the obesity rate. The coefficient on fruveg is now negative, suggesting that as the share of people who eat the recommended amount of fruits and vegetables increases, the obesity rate decreases. However, this interpretation should be taken lightly because the estimate is statistically insignificant.

    Conclusion

    I originally had three different models for my project: the 2007, 2011, and difference models. The difference model was the most compelling because it eliminated some omitted variable bias. Omitted variable bias is a challenge for any project, especially given the number of unobserved variables in mine, so I decided it would be best to use the difference model for all of my final results.

    The overall question was whether fast food restaurants and a lack of healthier foods in people's diets have been the driving force behind rising obesity rates. My results provide no statistically significant evidence that these factors contribute to the obesity rate. The variable ffr was the most important variable because it measured essentially what my guiding question asked. It was insignificant at all conventional levels in every regression. The fruveg variable was important too, but less so than ffr. It was significant only when the quadratic terms (ffr_sq and fruveg_sq) were added to the equation. Additionally, the F-test showed that dropping the ffr and fruveg terms from the regression is justifiable, further emphasizing how little these variables explained about obes.

    Omitted variable bias was a major challenge throughout my project. Table E lists a few examples of factors that may affect the obesity rate but that I was unable to capture. I could not include these factors mostly because of a lack of information or the difficulty of quantifying them. For example, changes in technology may greatly affect the obesity rate. New technology can change how food is processed in fast food restaurants and how fruits and vegetables are grown, which may affect how healthy or unhealthy the final product is. However, technology would be extremely hard to measure and insert into a data set. Other factors, such as the amount of time spent working out, are much easier to quantify, but little to no data on them have been published for the United States. Quite a number of variables that could greatly influence my results are still missing, so I do not think that my model was able to accurately answer my guiding question. This leads to a discussion of possible ways to improve the project in the future.

    There are a number of ways to extend and improve my project in the future.

    First, including more observable data in my set would be a major improvement. Specifically, including information on the availability of organic and locally grown foods could be interesting, as it would contrast with the results concerning fast food restaurants.

    Second, choosing different years could be very useful. The years I chose, 2007 and 2011, were arbitrary: they were the only periods for which the data I was searching for were available. If I could choose the years, I would like to study the mid-1990s, when the obesity rate started to increase rapidly in America. Studying that period might make the reason(s) for the sudden increase in obesity rates more apparent.

    Third, I would find it very intriguing to do the same study in another country. Although my research concerns obesity rates in America, it could be interesting to analyze whether the same results appear in other parts of the world. America has had the highest obesity rate for quite a number of years, so looking into other countries' obesity factors may give insight into what sets America apart in terms of health and wellbeing. It would also be interesting to collect the same data used for my project at the individual level rather than as aggregate state data.

    The ways to improve my project extend well beyond those listed above. If I were to do this project again with more time, however, these are the areas I would like to look at. The results of my project surprised me; they were not what I expected. I thoroughly enjoyed working on this project, but I still cannot help but wonder how the results would have differed if I had been able to account for all of the variables that I would have liked to include.

    Attached at the back of this paper are the data from 2007, 2011, and the difference model.

    Works Cited

    Anderson, Michael, and David Matsa. 2011. "Are Restaurants Really Supersizing America?" American Economic Journal. <http://are.berkeley.edu/~mlanderson/pdf/Anderson%20and%20Matsa%202011.pdf>.

    Centers for Disease Control and Prevention (CDC). "Prevalence of Obesity in the United States, 2009-2010." <http://www.cdc.gov/nchs/data/databriefs/db82.pdf>.

    Centers for Disease Control and Prevention (CDC). "Overweight and Obesity." <http://www.cdc.gov/obesity/data/adult.html>.

    Currie, Janet, Stefano DellaVigna, Enrico Moretti, and Vikram Pathania. 2009. "The Effect of Fast Food Restaurants on Obesity and Weight Gain." American Economic Journal. <http://www.nber.org/papers/w14721.pdf?new_window=1>.

    Petrelli, Jennifer, and Kathleen Wolin. 2009. Obesity. Santa Barbara, California: Greenwood Press.

    Schlosser, Eric. 2001. Fast Food Nation. Boston, Massachusetts: Houghton Mifflin Harcourt.