Analysis of Dangerous by Design

HYPOTHESIS TESTING John-Mark Palacios URP6200 Planimetrics Dr. Prosperi


Post on 14-Jul-2015


Page 1: Analysis of Dangerous by Design

HYPOTHESIS TESTING

John-Mark Palacios

URP6200 – Planimetrics

Dr. Prosperi


TABLE OF CONTENTS

I. INTRODUCTION

II. ANALYSIS

III. T-TEST

IV. F-TEST

V. SCATTERPLOT / CORRELATION / REGRESSION

VI. CONCLUSIONS


I. INTRODUCTION

The Dangerous by Design report, which highlighted cities across the United

States that were the most dangerous for pedestrians, rocked the state of Florida by

ranking its four largest metropolitan areas as the top 4 cities for pedestrian

fatalities. Analytical readers of this report might wonder whether the state was

doing such a horrible job at protecting pedestrians, or if there are other factors

such as demographics at play. Besides the number of pedestrians killed and the

population, this report only looked at the American Community Survey's Journey

to Work data as a proxy for pedestrian presence on the roadways. While few

people walk to work, many more people walk for leisure or to run errands. We

pulled in data from other sources to see how walkability, density, and non-work

walking trips compare and correlate with the pedestrian fatality data.

Walkscore.com is known for its ability to calculate a score for individual

addresses, but it has performed limited analyses of larger areas such as

neighborhoods and cities. The Dangerous by Design report used metropolitan

areas as the cases. Since Walkscore did not respond to our request for a

Walkscore for metropolitan areas, we used their listing of walkscore by cities. The

largest city in the metropolitan area was used to represent the walkscore for the

metropolitan area. While this may not be entirely representative of the overall

area, most American cities follow similar patterns of development and any

discrepancy might be expected to be consistent between cities. Included on

Walkscore's list of cities is a top ten list of most walkable cities, so we used this to

create a dichotomous variable of whether a city was on this list.

In order to provide a variable that encompassed the larger metropolitan area, we

used population density obtained from the 2010 census for urbanized areas. While

the pedestrian fatality report may have been performed at the metropolitan

statistical area level, this uses county lines as the boundaries and generally

includes vast swaths of rural lands. Density is more accurately measured within

the contiguous urbanized area, excluding the low-density census blocks that go

from suburban to rural.

The Centers for Disease Control collected data on physical activity for various

metropolitan areas across the country. While this data is not as extensive as the

American Community Survey's, it does provide a more absolute measure of

people walking in these areas. This data was available for fewer metropolitan

areas, so a subset of the cases was used when comparing with this physical

activity data.

II. ANALYSIS

The original Dangerous by Design report included the following variables:

1. Ranking, based on Pedestrian Danger Index

2. Metro Area

3. Total Pedestrian Deaths between 2000 and 2009


4. Average annual pedestrian deaths per 100,000 residents (2000-2009)

5. Percent of workers walking to work between 2005 and 2009

6. Pedestrian Danger Index (PDI), calculated with the above variables,

#4 x 100,000 / #5

We have added the following:

7. Walkscore for central city, a value between 0 and 100

8. Population Density of the urbanized area

9. Whether central city is on the "Most Walkable" list

(yes/no represented as 1/0)

10. Name of Central City

11. State of Central City

The above data had 52 cases. The physical activity data, available for only 36 of

the 52 urban areas, included the following variables:

12. Percent of respondents who reported walking as one of the two most

frequent leisure time activities they participated in within the past month.

13. Percent of respondents walking at least 5 times a week, 30 minutes per

session, within the past month.

The CDC physical activity data was based on surveys with at least 500 respondents.

The following tests were conducted to analyze these variables:

T-test looking for significance between the central city being on the "Most Walkable" list and the average annual pedestrian deaths

F-test looking at differences in annual pedestrian deaths by state of central city

F-test looking at differences in any walking by state of central city

Scatterplot and correlation coefficient between annual pedestrian deaths and walkscore

Correlation coefficients among PDI, walkscore, annual pedestrian deaths, population density, percent walking any time, and percent walking 5 times a week

Scatterplot for walkscore vs. annual pedestrian deaths

Scatterplot for percent of people walking any time vs. annual pedestrian deaths

Regression analysis using those variables with the highest correlation coefficients

III. T - TESTS - Tests of difference

"Most Walkable" List and the Average Annual Pedestrian Deaths Per 100,000

1. Ho: There is no difference between the average annual pedestrian deaths per

100,000 for those cities on the "Most Walkable" cities list and those not on the

list.


H1: There is a difference between the average annual pedestrian deaths per 100,000 for

those cities on the "Most Walkable" cities list and those not on the list.

2. t, with 50 df.

3. .05

4. t = -0.376, Pr(t) = .708

5. Since pr(t) is > .05, we fail to reject the null hypothesis.

* The average annual deaths per 100,000 for those cities on the "Most Walkable"

list is 1.59, compared to 1.67 for those off the list. The difference is very small

and there is no evidence to show that it is significant.

T test                         On list        Not on list
Mean                           1.588888889    1.672093023
Variance                       0.361111111    0.364916944
Observations                   9              43
Pooled Variance                0.36430801
Hypothesized Mean Difference   0
Observed Mean Difference       -0.083204134
Df                             50
t Stat                         -0.376066248
P (T<=t) one-tail              0.354229321
t Critical one-tail            1.675905025
P (T<=t) two-tail              0.708458641
t Critical two-tail            2.008559112

Table 1. T-test.
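
The t statistic in Table 1 can be checked from the summary statistics alone. The following is a minimal sketch, assuming the equal-variance (pooled) form of the two-sample t-test that the table's "Pooled Variance" row suggests. The function name pooled_t is ours for illustration, and the critical value is copied from Table 1 rather than computed.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    # Weight each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, pooled_var, df


# Means, variances, and counts as reported in Table 1.
t_stat, pooled_var, df = pooled_t(
    1.588888889, 0.361111111, 9,    # cities on the "Most Walkable" list
    1.672093023, 0.364916944, 43,   # cities not on the list
)

T_CRITICAL_TWO_TAIL = 2.008559112   # copied from Table 1 (50 df, 0.05)
reject_null = abs(t_stat) > T_CRITICAL_TWO_TAIL

print(f"pooled variance = {pooled_var:.6f}, df = {df}, t = {t_stat:.4f}")
print("reject Ho" if reject_null else "fail to reject Ho")
```

Comparing |t| with the two-tailed critical value leads to the same decision as comparing pr(t) with .05.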

IV. F - TEST - Test of difference

Test 1: Annual Pedestrian Deaths among cities in different states

1. Ho: There is no difference in the average number of annual pedestrian deaths

among the cities of different states.

H1: At least one state's average number of annual pedestrian deaths differs

from the others.

2. F, with (30, 21) df.

3. .05

4. F = 6.141; pr (F) = .0000306


5. Since pr(F) is < .05, we reject the null hypothesis and accept the alternative

hypothesis.

* At least one of the means is different. Inspection of the averages shows that

Florida has a very high number of pedestrian deaths, averaging over 3 deaths

per year per 100,000 people.

Table 2. Anova: Single Factor (Test 1)

SUMMARY

Groups Count Sum Average Variance

AL 1 1.2 1.2 #DIV/0!

AZ 2 4.6 2.3 0

CA 6 11.7 1.95 0.115

CO 1 1.6 1.6 #DIV/0!

CT 1 1.2 1.2 #DIV/0!

DC 1 1.7 1.7 #DIV/0!

FL 4 12.2 3.05 0.096666667

GA 1 1.6 1.6 #DIV/0!

IL 1 1.4 1.4 #DIV/0!

IN 1 1.1 1.1 #DIV/0!

KY 1 1.6 1.6 #DIV/0!

LA 1 2.4 2.4 #DIV/0!

MA 1 1.1 1.1 #DIV/0!

MD 1 1.8 1.8 #DIV/0!

MI 1 1.8 1.8 #DIV/0!

MN 1 0.8 0.8 #DIV/0!

MO 2 2.6 1.3 0.02

NC 2 3.1 1.55 0.045

NV 1 2.5 2.5 #DIV/0!

NY 3 4.5 1.5 0.13

OH 3 2.5 0.833333333 0.023333333

OK 1 1.4 1.4 #DIV/0!

OR 1 1.2 1.2 #DIV/0!

PA 2 2.8 1.4 0.18

RI 1 1.2 1.2 #DIV/0!

TN 2 3.5 1.75 0.245

TX 4 7.1 1.775 0.0425

UT 1 1.3 1.3 #DIV/0!

VA 2 2.4 1.2 0.08

WA 1 1.2 1.2 #DIV/0!

WI 1 1.1 1.1 #DIV/0!


ANOVA

Source of Variation SS df MS F P-value F critical

Between Groups 16.39775641 30 0.54659188 6.140934188 0.00003058 2.0102483

Within Groups 1.869166667 21 0.089007937

Total 18.26692308 51
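
The ANOVA figures in Table 2 can be rebuilt from the group summary. The sketch below assumes the standard single-factor decomposition, in which the within-groups sum of squares is the sum of (count - 1) x variance over the groups. States with a single city contribute nothing to it, so only the 11 states with two or more cities appear. The variable names are ours, and the total sum of squares is copied from Table 2.

```python
# (count, variance) for each state with more than one city in Table 2.
# Single-city states have no within-group variation and are omitted.
multi_city_groups = {
    "AZ": (2, 0.0),
    "CA": (6, 0.115),
    "FL": (4, 0.096666667),
    "MO": (2, 0.02),
    "NC": (2, 0.045),
    "NY": (3, 0.13),
    "OH": (3, 0.023333333),
    "PA": (2, 0.18),
    "TN": (2, 0.245),
    "TX": (4, 0.0425),
    "VA": (2, 0.08),
}

N_GROUPS = 31          # states (including DC) listed in Table 2
N_CASES = 52           # metropolitan areas
SS_TOTAL = 18.26692308  # copied from Table 2

ss_within = sum((n - 1) * var for n, var in multi_city_groups.values())
ss_between = SS_TOTAL - ss_within

df_between = N_GROUPS - 1
df_within = N_CASES - N_GROUPS

# F is the ratio of the between-groups to the within-groups mean square.
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"SS within = {ss_within:.6f}, SS between = {ss_between:.6f}")
print(f"F({df_between}, {df_within}) = {f_stat:.4f}")
```

Because 20 of the 31 groups hold a single city, the within-groups estimate rests on only 11 states, which is worth keeping in mind when reading the F value.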

Test 2: Differences in Walking for Physical Activity by State

1. Ho: There is no difference in the average percentage of people walking among

the cities of different states.

H1: At least one state's average percentage of people walking differs from

the others.

2. F, with (28, 7) df.

3. .05

4. F = 0.927; pr (F) = .597

5. Since pr(F) is > .05, we fail to reject the null hypothesis.

* There is no evidence to show that there is a difference in the average

percentage of people walking among different states.


Table 3. Anova: Single Factor (Test 2)

SUMMARY

Groups Count Sum Average Variance

AZ 2 70.7 35.35 10.125

CA 1 38.5 38.5 #DIV/0!

CO 1 42.1 42.1 #DIV/0!

CT 1 40.1 40.1 #DIV/0!

DC 1 40.2 40.2 #DIV/0!

FL 3 101 33.66666667 40.82333333

GA 1 41 41 #DIV/0!

IL 1 36.3 36.3 #DIV/0!

IN 1 41.9 41.9 #DIV/0!

KY 1 36.3 36.3 #DIV/0!

LA 1 32.9 32.9 #DIV/0!

MA 1 41.8 41.8 #DIV/0!

MD 1 39.3 39.3 #DIV/0!

MI 1 40.6 40.6 #DIV/0!

MN 1 37.7 37.7 #DIV/0!

MO 2 75.9 37.95 0.005

NC 1 41 41 #DIV/0!

NV 1 37.5 37.5 #DIV/0!

NY 1 37.8 37.8 #DIV/0!

OH 1 41.4 41.4 #DIV/0!

OK 1 38.5 38.5 #DIV/0!

OR 1 45.1 45.1 #DIV/0!

PA 2 87.9 43.95 0.125

RI 1 40.1 40.1 #DIV/0!

TN 2 79.9 39.95 25.205

TX 2 78.1 39.05 4.805

UT 1 45.1 45.1 #DIV/0!

WA 1 48.5 48.5 #DIV/0!

WI 1 44.2 44.2 #DIV/0!

ANOVA

Source of Variation SS df MS F P-value F critical

Between Groups 452.0983333 28 16.14636905 0.927102274 0.597200543 3.385786974

Within Groups 121.9116667 7 17.41595238

Total 574.01 35


V. SCATTERPLOT/CORRELATION/ REGRESSION - Test of relationship

Scatterplot – Graph

Figure 1. Plot of Walkscore (X axis) vs. Annual Deaths per 100,000 (Y axis)

[Figure 1 image: series "Walkscore for central city"; axis ranges 0 to 100 and 0 to 4]

Since Figure 1 tilts slightly down to the right, it appears that deaths decrease as

walkability increases.

Figure 2. Plot of %Walking (X axis) vs. Annual Deaths per 100,000 (Y axis)

[Figure 2 image: series "% any walking in the past month"; axis ranges 0 to 60 and 0 to 4]


Figure 2 tilts downward to the right, implying that there is a tendency for deaths

to increase as walking decreases.

CORRELATION:

Correlations

Variable                                                    (1)           (2)           (3)           (4)          (5)
(1) Avg. annual pedestrian deaths per 100,000 (2000-2009)   1
(2) Percent of workers walking to work (2005-2009)          -0.224210565  1
(3) PDI                                                     0.820076533   -0.653196727  1
(4) Walkscore for central city                              -0.167115655  0.774097237   -0.519648515  1
(5) Population Density                                      0.280170762   0.383542089   -0.089631027  0.443313978  1

N = 52, 50 df

R value required for a two-tailed test with 0.05 significance: 0.273

Table 4. Correlation among variables.

Correlations

Variable                                                    (1)           (2)           (3)           (4)           (5)          (6)           (7)          (8)
(1) Total pedestrian deaths (2000-2009)                     1
(2) Avg. annual pedestrian deaths per 100,000 (2000-2009)   0.33079027    1
(3) Percent of workers walking to work (2005-2009)          0.433803879   -0.218693502  1
(4) PDI                                                     0.041281599   0.823614244   -0.670029555  1
(5) Walkscore for central city                              0.427692706   -0.197427318  0.768320164   -0.585401773  1
(6) Population Density                                      0.739550019   0.321139316   0.321555001   -0.025273485  0.42224739   1
(7) % any walking in the past month                         -0.300887411  -0.617188715  0.276923962   -0.56131522   0.120225571  -0.244147474  1
(8) % walk at least 5 times per week, 30 min.               -0.001619811  -0.152417931  0.475813507   -0.374771483  0.439175416  0.034473505   0.458083911  1

N = 36, 34 df

R value required for a two-tailed test with 0.05 significance: 0.33

Table 5. Correlation among variables in the subset of the data.
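
The threshold R values quoted under Tables 4 and 5 follow from the t distribution. As a minimal sketch, assuming the usual relation r = t / sqrt(t^2 + df) between a critical t and the matching critical correlation, the two thresholds can be reproduced as follows. The t value for 50 df is taken from Table 1; the value for 34 df (2.0322) is an assumption taken from a standard t table, not from this report.

```python
import math


def critical_r(t_critical, df):
    """Smallest |r| that is significant, given the two-tailed critical t."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


r_full = critical_r(2.008559112, 50)  # t critical from Table 1
r_subset = critical_r(2.0322, 34)     # assumed standard t-table value

# Correlations with average annual deaths per 100,000, from Table 4.
table4_vs_deaths = {
    "Percent of workers walking to work": -0.224210565,
    "PDI": 0.820076533,
    "Walkscore for central city": -0.167115655,
    "Population Density": 0.280170762,
}
significant = {
    name: abs(r) > r_full for name, r in table4_vs_deaths.items()
}

print(f"critical r (50 df) = {r_full:.3f}, (34 df) = {r_subset:.3f}")
for name, flag in significant.items():
    print(f"{name}: {'significant' if flag else 'not significant'}")
```

With these thresholds, Population Density only narrowly clears the bar in the full dataset, while Walkscore and percent walking to work do not.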

REGRESSION:

The Walkscore variable was removed from the final regression model because its

p-value was 0.47, greater than 0.05. Table 6 shows one regression model that endeavors to

account for pedestrian deaths. It takes the form Annual Deaths = 1.566 - 23.394 x (% walk

to work) + 0.000217 x (Population Density). Note that the R^2 value is rather low for this

model (0.207), however, implying that it does not account well for the variability.

SUMMARY OUTPUT

Response Variable: Avg. annual pedestrian deaths per 100,000 (2000-2009)

Regression Statistics
Multiple R       0.455491226
R^2              0.207472257
Standard Error   0.543553
Adjusted R^2     0.175124186
Observations     52

ANOVA
             df   SS            MS            F             Significance of F
Regression   2    3.789879765   1.894939882   6.413744314   0.003356252
Residual     49   14.47704331   0.295449864
Total        51   18.26692308

                                                  Coefficients   Standard Error   t-Statistics   p-Value       Lower 95%      Upper 95%
Intercept                                         1.566014419    0.23349281       6.706906403    1.89E-08      1.096793051    2.035235787
Percent of workers walking to work (2005-2009)    -23.39400507   8.284347583      -2.823880196   0.006841321   -40.04202483   -6.745985316
Population Density                                0.000217314    6.97E-05         3.117594818    0.003049027   7.72E-05

Table 6. Regression model using the full dataset.
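
To show how the Table 6 model would be applied, here is a minimal sketch that evaluates the fitted equation. It assumes the walk-to-work variable is entered as a fraction (0.02 for 2%), which is our inference from the size of its coefficient and is not stated in the report. The function name and the example inputs are hypothetical and do not describe any city in the dataset.

```python
# Coefficients copied from Table 6.
INTERCEPT = 1.566014419
COEF_WALK_TO_WORK = -23.39400507
COEF_DENSITY = 0.000217314


def predicted_deaths(walk_to_work_share, population_density):
    """Predicted average annual pedestrian deaths per 100,000.

    walk_to_work_share is assumed to be a fraction (0.02 means 2%);
    population_density is in the same units as the report's data.
    """
    return (
        INTERCEPT
        + COEF_WALK_TO_WORK * walk_to_work_share
        + COEF_DENSITY * population_density
    )


# Hypothetical metropolitan area: 2% walk to work, density of 3,000.
example = predicted_deaths(0.02, 3000)
print(f"predicted annual deaths per 100,000: {example:.2f}")
```

Given the low R^2 of this model, any single prediction of this kind should be read as a rough indication only.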

Using the subset of the data with fewer metropolitan areas that also included a percent

walking for leisure variable, we were able to create a model that accounted for about 42%

of the variability in annual deaths per 100,000 (adjusted R^2 = 0.417). This time we kept

the Walkscore, because removing it reduces the R value. See Table 7. The form of the

relationship is Annual deaths = 4.85 - 0.0769 x (% walking) + 0.000170 x (Population

Density) - 0.011 x (Walkscore)

SUMMARY OUTPUT

Response Variable: Avg. annual pedestrian deaths per 100,000 (2000-2009)

Regression Statistics
Multiple R       0.683032844
R^2              0.466533865
Standard Error   0.467002493
Adjusted R^2     0.416521415
Observations     36

ANOVA
             df   SS            MS            F             Significance of F
Regression   3    6.103299702   2.034433234   9.328354528   0.000140149
Residual     32   6.97892252    0.218091329
Total        35   13.08222222

                                   Coefficients   Standard Error   t-Statistics   p-Value       Lower 95%      Upper 95%
Intercept                          4.854104449    0.873203664      5.558960236    3.91E-06      3.075446789    6.632762108
% any walking in the past month    -0.076929247   0.020782218      -3.701686094   0.000803202   -0.11926124    -0.034597254
Population Density                 0.000169983    8.28E-05         2.052129648    0.048413212   1.26E-06       0.000338707
Walkscore for central city         -0.011052871   0.006100955      -1.811662484   0.079433331   -0.023480109   0.001374367

Table 7. Regression model using the smaller data set.
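
The adjusted R^2 and F values reported in Tables 6 and 7 can both be derived from R^2, the number of observations, and the number of predictors. The sketch below uses the standard formulas for an ordinary least squares model with an intercept. The function names are ours, and the inputs are copied from the two tables.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 for a model with an intercept."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def overall_f(r2, n_obs, n_predictors):
    """Overall F statistic of the regression, computed from R^2."""
    df_residual = n_obs - n_predictors - 1
    return (r2 / n_predictors) / ((1 - r2) / df_residual)


# (R^2, observations, predictors) copied from Tables 6 and 7.
models = {
    "Table 6 (full dataset)": (0.207472257, 52, 2),
    "Table 7 (subset)": (0.466533865, 36, 3),
}

results = {
    name: (adjusted_r2(*args), overall_f(*args))
    for name, args in models.items()
}

for name, (adj, f_stat) in results.items():
    print(f"{name}: adjusted R^2 = {adj:.4f}, F = {f_stat:.3f}")
```

The adjustment penalizes the extra predictor in the second model, yet its adjusted R^2 remains well above that of the first.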

VI. CONCLUSIONS

The following discusses the conclusions of each test:

1. T – Test

Results of the T-test show no significant difference in annual deaths

between cities on the "Most Walkable" list and cities off the list.

2. F – Test

Results of the first F-test show that average annual pedestrian deaths differ

among states, and inspection of the state averages shows that Florida has a

disproportionately high number. The second F-test shows no significant

difference in the percentage of people walking from state to state.

3. Scatterplot / Correlation / Regression

Scatterplot, Correlation, and Regression tests show that Walkscore,

Population Density, and the proportion of walking (commuters or

residents), all relate to annual pedestrian deaths. The second regression

model seems to have the better fit, exchanging % walking to work for %

walking in general, and utilizing population density and Walkscore. It is

surprising that Population Density has a positive coefficient, however. Our

expectation, especially since density contributes to a higher Walkscore,

was that with higher densities would come fewer pedestrian fatalities. The

reality is that this impact is very slightly positive.