ecn 410 final project paper, james wiltbank, nathan waters


Upload: nathan-waters

Post on 14-Apr-2017



James Wiltbank, Nathan Waters
Applied Regression Analysis TTH 7:30

WILL PRAY FOR WORK: HOW THE ECONOMY AFFECTS RELIGIOUS ACTIVITY

Introduction

Our research will attempt to answer the following question: How does the economy affect the amount of time that people spend doing religious activities?

Motivation

As we were thinking about this project and looking at different data sources, we came across the American Time Use Survey, whose data extract system is maintained in part at the University of Minnesota in Minneapolis. The survey interviews thousands of people per year to find out how the average person spends their time across an extensive list of activities (anything from brushing your teeth to participating in rodeos). As we perused the data, we thought it would be interesting to examine the relationship between the state of the economy and time spent doing religious activities.

Theory

In order to assess the state of the economy for the time period of interest, we decided to use two different metrics of economic performance. The first variable we chose was GDP growth. Strong GDP growth corresponds to more opportunities for individuals and a general sense of confidence in the economic future of a region. We chose this metric because, even though most people don’t pore over GDP growth data on a regular basis, the statistic is generally indicative of the attitudes that the public has regarding the economy. Our expectation is that poor GDP growth (and thus negative attitudes about the state of the economy) will increase the amount of time people spend doing religious activities. The other measure that we chose is the unemployment rate. The unemployment rate has an even more profound impact on individuals, as it directly affects the amount of time and resources available within a household. We believe that the loss of household funds that results from a layoff, coupled with the increase in available free time, will cause the unemployment rate to have a greater effect on the dependent variable than GDP growth. In addition, we expect a positive correlation between unemployment and time spent doing religious activities.

Data

As indicated above, we obtained our time use data from the American Time Use Survey. Its data retrieval website is located at www.atusdata.org. The survey is conducted yearly, and the available data cover the years 2003 to 2013. Initially we thought this would be a concern because it would give us only the national mean time spent for these eleven years (i.e., only 11 observations). This problem can be solved, however, by using the annual means for each state, which are available on the website. The raw data that we obtained are in terms of how many minutes per day an individual spends doing religious activities. For the 11 years between 2003 and 2013, the survey had a response rate of between 50 and 60%, resulting in between 11,000 and 13,000 individuals responding. The summary data for time spent doing religious activities are the following:

Variable     Obs    Mean        Std. Dev.   Min    Max
relact       561    12.5832     6.396749    0      42.46939

Again, each observation is the mean for a given state during a given year. A minimum of zero may therefore seem counterintuitive. How is it possible that the mean for an entire state is zero? Upon examining the data we found five values of zero, four for Vermont and one for Wyoming. According to U.S. Census data, these are the two least populous states in the U.S. It is conceivable that the number of respondents from these states was very low or even zero during these years, resulting in overall means of zero.
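The collapse from individual responses to state-year means described above can be sketched as follows. This is only an illustration under stated assumptions: the records are hypothetical, the function name `state_year_means` is ours, and the mean is a simple unweighted average, whereas the survey's own extract system may apply sampling weights.

```python
from collections import defaultdict

def state_year_means(records):
    """Average minutes per day within each (state, year) cell.

    records: iterable of (state, year, minutes) tuples, one per respondent.
    Returns a dict mapping (state, year) to the unweighted mean.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (state, year) -> [sum, count]
    for state, year, minutes in records:
        cell = totals[(state, year)]
        cell[0] += minutes
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical respondents, not taken from the survey.
records = [
    ("UT", 2008, 30.0),
    ("UT", 2008, 0.0),
    ("UT", 2008, 60.0),
    ("VT", 2008, 0.0),  # a small cell where nobody reports religious activity
    ("VT", 2008, 0.0),
]
means = state_year_means(records)
print(means)
```

With only two hypothetical Vermont respondents, both reporting zero minutes, the state-year mean is exactly zero, which is the kind of small-cell outcome discussed above. In this sketch a state-year with no respondents at all would simply be absent from the result rather than recorded as zero.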

The unemployment data that we used for this project is from the Bureau of Labor Statistics. Their website is www.bls.gov. In order to run a proper regression, we had to find specialized unemployment data that matched our time use data. This means that both of these data sets are divided by state at an annual frequency. In addition, the data we obtained from the BLS defines the unemployment rate as the total number of persons who have actively sought work in the past four weeks but do not have current employment, divided by the total number of participants in the labor force. Thus, this figure is a percentage. The summary data for unemployment is the following:

Variable     Obs    Mean        Std. Dev.   Min    Max
unem         561    6.309269    2.131369    2.6    13.8

We obtained data on GDP growth from the Bureau of Economic Analysis website at www.bea.gov. We had to format this data in the same manner as the unemployment data, that is, by state and year. As a result, each of our data sets pools cross-sectional and time series data to form panel data. The raw data that we obtained was total GDP in dollars. We decided to convert these figures into the percentage change from the previous period. For example, the value of gdpgrowth for Kentucky in 2007 is defined as $(GDP_{KY,2007} - GDP_{KY,2006}) / GDP_{KY,2006}$, multiplied by 100 so that it is expressed as a percentage. The GDP figures we used were in chained 2009 dollars, so inflation is not an issue. The summary data for GDP growth are the following:

Variable     Obs    Mean        Std. Dev.   Min    Max
gdpgrowth    561    1.842246    2.740102    -9.3   20.3
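The conversion from GDP levels to gdpgrowth described above can be sketched as follows. The GDP levels are hypothetical placeholders and the function name `pct_growth` is ours; only the formula (change from the previous year divided by the previous year's level, times 100) comes from the text.

```python
def pct_growth(levels):
    """Percentage change from the previous year.

    levels: dict mapping year -> real GDP level for one state.
    Returns a dict mapping year -> growth in percent, for every year
    whose previous year is also present.
    """
    growth = {}
    for year in sorted(levels):
        prev = levels.get(year - 1)
        if prev is not None:
            growth[year] = 100.0 * (levels[year] - prev) / prev
    return growth

# Hypothetical real GDP levels for one state (arbitrary units).
hypothetical_gdp = {2005: 150.0, 2006: 160.0, 2007: 164.0}
g = pct_growth(hypothetical_gdp)
print(g)
```

Note that the first year of any series has no growth figure, so a growth value for the first sample year requires the GDP level from the year before it.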

Model


The model that we will estimate to answer our research question has the following form:

$relact_{it} = \beta_0 + \beta_1\,gdpgrowth_{it} + \beta_2\,unem_{it} + e_{it}$

Here relact_it is the average amount of time (in minutes) that an individual in state i spends per day doing religious activities in year t, gdpgrowth_it is the annual growth rate of GDP in state i in year t, and unem_it is the corresponding unemployment rate. e_it is a stochastic error term. Our inclination is that a poor economy will increase the amount of time people spend doing religious activities and vice versa, so we hypothesize that β1 will have a negative sign and β2 will have a positive sign.
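As an illustration of how a model of this form is estimated, the sketch below fits ordinary least squares by solving the normal equations in plain Python. It is not the authors' procedure (the paper's estimates come from Stata), the function name `ols` is ours, and the data are hypothetical values constructed to lie exactly on a known plane so the recovered coefficients can be checked.

```python
def ols(y, columns):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    y: list of outcomes; columns: list of regressor lists (no constant).
    Returns [b0, b1, ..., bk], where b0 is the intercept.
    """
    n = len(y)
    # Design matrix rows: a leading 1 for the intercept, then each regressor.
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented system [X'X | X'y].
    aug = [
        [sum(r[a] * r[b] for r in rows) for b in range(p)]
        + [sum(r[a] * yi for r, yi in zip(rows, y))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

# Hypothetical data generated exactly from relact = 9.0 + 0.1*g + 0.5*u.
gdpgrowth = [2.0, -1.0, 3.5, 0.5, 1.0]
unem = [5.0, 9.0, 4.0, 7.0, 6.5]
relact = [9.0 + 0.1 * g + 0.5 * u for g, u in zip(gdpgrowth, unem)]

coefficients = ols(relact, [gdpgrowth, unem])
print(coefficients)
```

Because the toy outcomes contain no noise, the fitted intercept and slopes should match the values used to generate them up to floating-point error.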

Notes on functional form

Initially we considered using the logs of the independent variables as a way to model the data more accurately. However, after running a regression with these log variables we found them both to be individually insignificant. We then performed F-tests on regressions with only the log variables, only the non-log variables, and both, and found each set to be jointly significant. The values of R² changed very little across these regressions, so we decided to use the non-log variables to avoid the confusion of interpreting the coefficients as a percentage change of a percentage. We also considered including lags of each independent variable, since there may be a delayed effect on the dependent variable. However, after running a regression with the lags included we found the lags to be statistically insignificant, with very small coefficient estimates. We then ran an F-test and found that the whole model was jointly insignificant, so we decided to exclude the lagged variables from our model. Lastly, we considered including a time trend since GDP growth might have a trend, but we found the trend to be insignificant and thus omitted it.
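For panel data, a lagged regressor has to be built within each state so that one state's value is never carried into another state's row. A minimal sketch of that construction is shown below; the panel values are hypothetical and the helper name `add_lag` is ours.

```python
def add_lag(panel, var):
    """One-year lag of `var` within each state.

    panel: dict mapping (state, year) -> dict of variable values.
    Returns a dict mapping (state, year) -> the same state's value of
    `var` in the previous year, or None when that year is not in the panel.
    """
    lagged = {}
    for (state, year) in panel:
        prev = panel.get((state, year - 1))
        lagged[(state, year)] = prev[var] if prev is not None else None
    return lagged

# Hypothetical state-year unemployment rates (percent).
panel = {
    ("KY", 2006): {"unem": 5.8},
    ("KY", 2007): {"unem": 5.4},
    ("UT", 2006): {"unem": 3.0},
    ("UT", 2007): {"unem": 2.6},
}
unem_lag = add_lag(panel, "unem")
print(unem_lag)
```

The first year of each state has no lag available, so including lags shortens the usable sample by one year per state.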

Results

After running a regression of the model described above we obtained the following results:

Interpretation

Based on the results in the table above, we find that the unemployment rate is positively and significantly associated with the amount of time that individuals spend doing religious activities. For every one-percentage-point increase in the unemployment rate, we would expect an increase of 0.46 minutes per day in the mean amount of time spent doing religious activities, holding everything else constant. This of course might simply reflect the fact that unemployed individuals have more free time to devote to religious pursuits. Since the p-value for unem is 0.001, this result is statistically significant at almost any conventional significance level. For every one-percentage-point increase in GDP growth, the point estimate implies an increase of 0.03 minutes per day spent on religious activities (again holding everything else constant). However, with a p-value of 0.764 this estimate is not statistically significant, so we cannot draw any conclusions about the effect of GDP growth on our dependent variable. In the very unlikely case of zero GDP growth and zero unemployment, we would expect the mean amount of time spent on religious activities to be 9.59 minutes per day. The value of R² is not shown in the table above, but it was very low (about 0.03).
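The interpretation above can be checked with a short calculation that plugs the rounded point estimates quoted in the text (intercept 9.59, 0.03 on gdpgrowth, 0.46 on unem) into the fitted equation. The function name `predicted_relact` is ours, and because the coefficients are rounded the outputs are only approximate.

```python
# Rounded point estimates quoted in the Interpretation section.
INTERCEPT = 9.59
B_GDPGROWTH = 0.03
B_UNEM = 0.46

def predicted_relact(gdpgrowth, unem):
    """Predicted mean minutes per day of religious activity."""
    return INTERCEPT + B_GDPGROWTH * gdpgrowth + B_UNEM * unem

# Zero growth and zero unemployment reproduces the intercept.
baseline = predicted_relact(0.0, 0.0)

# Prediction at the sample means reported in the summary tables.
at_means = predicted_relact(1.842246, 6.309269)

# Effect of a one-percentage-point rise in unemployment, growth held fixed.
effect = predicted_relact(2.0, 7.0) - predicted_relact(2.0, 6.0)

print(baseline, at_means, effect)
```

Evaluated at the sample means of gdpgrowth and unem, the prediction is about 12.55 minutes, close to the sample mean of relact (12.58); the small gap is consistent with the coefficients having been rounded.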

Hypothesis Tests

Our model predicts that the unemployment rate will have a positive effect on the amount of time spent doing religious activities, so our hypothesis test is as follows:

$H_0: \beta_2 \le 0, \quad H_A: \beta_2 > 0$

Since the p-value for unem is 0.000, we can reject the null hypothesis and conclude that the sign of β2 is positive. We also hypothesized that GDP growth would have a negative effect on relact, so our hypothesis test is this:

$H_0: \beta_1 \ge 0, \quad H_A: \beta_1 < 0$

Our results show a positive coefficient on gdpgrowth, which would appear to contradict our hypothesis, but since the estimate is not statistically significant we cannot draw any conclusions about the effect of GDP growth.
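Because both hypotheses are one-sided, the relevant p-values differ from the two-sided ones that regression software usually prints. The sketch below shows the standard conversion, assuming the p-values quoted in the Interpretation section (0.001 for unem, 0.764 for gdpgrowth) are two-sided and come from a test statistic that is symmetric around zero. The function name `one_sided_p` is ours.

```python
def one_sided_p(two_sided_p, estimate, alternative):
    """Convert a two-sided p-value into a one-sided one.

    alternative: "greater" for H_A: beta > 0, "less" for H_A: beta < 0.
    If the estimate's sign agrees with the alternative, the one-sided
    p-value is half the two-sided value; otherwise it is one minus that half.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = two_sided_p / 2.0
    agrees = estimate > 0 if alternative == "greater" else estimate < 0
    return half if agrees else 1.0 - half

# unem: positive estimate, alternative beta2 > 0.
p_unem = one_sided_p(0.001, 0.46, "greater")

# gdpgrowth: positive estimate, but the alternative is beta1 < 0.
p_gdp = one_sided_p(0.764, 0.03, "less")

print(p_unem, p_gdp)
```

Under these assumptions the one-sided p-value for gdpgrowth is well above any conventional threshold, since the estimate has the opposite sign from the alternative hypothesis.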

Outliers

The removal of outliers from our regression had a substantial effect on our results. Using the guideline of 3(k + 1)/n to classify outlying data points, we found that our data contained 49 outliers. A large proportion of these came from two states, South Carolina and Utah, with 8 and 7 of the years surveyed, respectively, being classified as outliers. Once we removed these data points from our regression we obtained the following results:


Both of the coefficient estimates increase (substantially in the case of gdpgrowth), as do the t-statistics for both variables. In the case of gdpgrowth, the t-statistic rises so much that we can now say our estimate is moderately statistically significant. This changes our hypothesis test as well. With the outliers removed, the positive and now moderately significant estimate means we still cannot reject the null hypothesis that β1 is greater than or equal to zero, which contradicts our initial expectation of a negative sign. Our value for R² again changed very little, remaining at about 0.03.
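The 3(k + 1)/n guideline used in this section can be made concrete with a small calculation. The sketch assumes the guideline is the common leverage (hat-value) cutoff, which the text does not state explicitly; k = 2 regressors and n = 561 observations come from the paper, while the leverage values and function names are hypothetical.

```python
def leverage_cutoff(k, n):
    """Cutoff 3(k + 1)/n for k regressors and n observations."""
    return 3.0 * (k + 1) / n

def flag_high_leverage(leverages, k):
    """Indices of observations whose leverage exceeds the cutoff."""
    cutoff = leverage_cutoff(k, len(leverages))
    return [i for i, h in enumerate(leverages) if h > cutoff]

# Cutoff implied by the paper's model: two regressors, 561 observations.
paper_cutoff = leverage_cutoff(k=2, n=561)

# Hypothetical leverage values for a 10-observation toy regression
# (they sum to k + 1 = 3, as hat values do).
hypothetical = [0.10, 0.25, 0.95, 0.30, 0.20, 0.40, 0.35, 0.15, 0.18, 0.12]
flagged = flag_high_leverage(hypothetical, k=2)

print(round(paper_cutoff, 4), flagged)
```

For the paper's dimensions the cutoff is 9/561, or roughly 0.016, so any state-year observation above that value would be classified as outlying under this reading of the guideline.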

Multicollinearity

In order to test for multicollinearity we first looked at the simple correlation coefficient between unem and gdpgrowth. We obtained a value of -0.3867, whose absolute value is well below the benchmark of 0.8. This provides evidence that multicollinearity should not be an issue. Further evidence is provided by the Variance Inflation Factor (VIF). Our figure of 1.18 is well below the benchmark of 5, implying a low level of multicollinearity. From these results we conclude that multicollinearity should not be inflating the standard errors of our regression.
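With only two regressors, the VIF follows directly from their simple correlation r as 1 / (1 - r²), so the two diagnostics reported above can be cross-checked against each other. The function name below is ours; the correlation is the value reported in the text.

```python
def vif_two_regressors(r):
    """VIF when the model has exactly two regressors with correlation r."""
    return 1.0 / (1.0 - r * r)

vif = vif_two_regressors(-0.3867)
print(round(vif, 2))
```

The result rounds to 1.18, which agrees with the VIF reported in the text.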

Omitted Variables

We conducted a Ramsey RESET test to detect omitted variable bias:

The p-value of 0.3188 means that we cannot reject the null hypothesis that our model has no omitted variables; the data appear to be consistent with that null.

Heteroskedasticity

To detect non-constant variance in our regression we performed a White Test in Stata:

The p-value of 0.5968 means that we cannot reject the null hypothesis of homoskedasticity. In addition, we plotted the residuals against the mean of our dependent variable:

[Figure: plot of residuals (vertical axis, ticks from -10 to 30) against (mean) relact (horizontal axis, ticks from 0 to 40)]

The variance appears to be constant, implying that heteroskedasticity should not affect our regression results.

Stationarity

In order to detect non-stationarity in our regression we performed a Harris-Tzavalis test:

The Harris-Tzavalis test is most useful for panel data where there are a large number of cross-sections and relatively few time periods, which is true of our data. The p-value of 0.0000 means that we reject the null hypothesis that our data contains unit roots, providing very strong evidence that our data is stationary.

Serial Correlation

In order to detect serial correlation in our data we performed a Wooldridge test:

The p-value of 0.6586 means that we cannot reject the null hypothesis of no first-order autocorrelation; our data appear to be consistent with the absence of serial correlation.

Bibliography

Sandra L. Hofferth, Sarah M. Flood, and Matthew Sobek. 2013. American Time Use Survey Data Extract System: Version 2.4 [Machine-readable database]. Maryland Population Research Center, University of Maryland, College Park, Maryland, and Minnesota Population Center, University of Minnesota, Minneapolis, Minnesota.

U.S. Bureau of Economic Analysis, “Table 1.1.5 Gross Domestic Product” http://bea.gov/iTable/iTable.cfm?ReqID=9&step=1#reqid=9&step=3&isuri=1&903=5 (accessed March 7, 2015).

U.S. Bureau of Labor Statistics, “Vintage Data: Seasonally Adjusted Total Nonfarm” www.bls.gov/ces/cesvin00.xlsx (accessed March 7, 2015).