analysis of dangerous by design
HYPOTHESIS TESTING
John-Mark Palacios
URP6200 – Planimetrics
Dr. Prosperi
TABLE OF CONTENTS
I. INTRODUCTION
II. ANALYSIS
III. T - TEST
IV. F - TEST
V. SCATTERPLOT / CORRELATION / REGRESSION
VI. CONCLUSIONS
I. INTRODUCTION
The Dangerous by Design report, which highlighted cities across the United
States that were the most dangerous for pedestrians, rocked the state of Florida by
ranking its four largest metropolitan areas as the top 4 cities for pedestrian
fatalities. Analytical readers of this report might wonder whether the state was
doing such a horrible job at protecting pedestrians, or if there are other factors
such as demographics at play. Besides the number of pedestrians killed and the
population, this report only looked at the American Community Survey's Journey
to Work data as a proxy for pedestrian presence on the roadways. While few
people walk to work, many more people walk for leisure or to run errands. We
pulled in data from other sources to see how walkability, density, and non-work
walking trips compare and correlate with the pedestrian fatality data.
Walkscore.com is known for its ability to calculate a score for individual
addresses, but it has performed limited analyses of larger areas such as
neighborhoods and cities. The Dangerous by Design report used metropolitan
areas as the cases. Since Walkscore did not respond to our request for a
Walkscore for metropolitan areas, we used their listing of walkscore by cities. The
largest city in the metropolitan area was used to represent the walkscore for the
metropolitan area. While this may not be entirely representative of the overall
area, most American cities follow similar patterns of development and any
discrepancy might be expected to be consistent between cities. Included on
Walkscore's list of cities is a top ten list of most walkable cities, so we used this to
create a dichotomous variable of whether a city was on this list.
In order to provide a variable that encompassed the larger metropolitan area, we
used population density obtained from the 2010 census for urbanized areas. While
the pedestrian fatality report may have been performed at the metropolitan
statistical area level, this uses county lines as the boundaries and generally
includes vast swaths of rural lands. Density is more accurately measured within
the contiguous urbanized area, excluding the low-density census blocks that go
from suburban to rural.
The Centers for Disease Control collected data on physical activity for various
metropolitan areas across the country. While this data is not as extensive as the
American Community Survey's, it does provide a more absolute measure of
people walking in these areas. This data was available for fewer metropolitan
areas, so a subset of the cases was used when comparing with this physical
activity data.
II. ANALYSIS
The original Dangerous by Design report included the following variables:
1. Ranking, based on Pedestrian Danger Index
2. Metro Area
3. Total Pedestrian Deaths between 2000 and 2009
4. Average annual pedestrian deaths per 100,000 residents (2000-2009)
5. Percent of workers walking to work between 2005 and 2009
6. Pedestrian Danger Index (PDI), calculated with the above variables,
#4 x 100,000 / #5
We have added the following:
7. Walkscore for central city, a value between 0 and 100
8. Population Density of the urbanized area
9. Whether central city is on the "Most Walkable" list
(yes/no represented as 1/0)
10. Name of Central City
11. State of Central City
The above data had 52 cases. The physical activity data, available for only 36 of
the 52 urban areas, included the following variables:
12. Percent of respondents who reported walking as one of the two most
frequent leisure time activities they participated in within the past month.
13. Percent of respondents walking at least 5 times a week, 30 minutes per
session, within the past month.
The CDC physical activity data were based on surveys with at least 500 respondents.
The following tests were conducted to analyze these variables:
- T-test looking for significance between the central city being on the "Most
  Walkable" list and the average annual pedestrian deaths
- F-test looking at differences in annual pedestrian deaths by state of central
  city
- F-test looking at differences in any walking by state of central city
- Scatterplot and correlation coefficient between annual pedestrian deaths
  and walkscore
- Correlation coefficients among PDI, walkscore, annual pedestrian deaths,
  population density, percent walking any time, and percent walking 5 times
  a week
- Scatterplot for walkscore vs. annual pedestrian deaths
- Scatterplot for percent of people walking any time vs. annual pedestrian
  deaths
- Regression analysis using those variables with the highest correlation
  coefficients
III. T - TESTS - Tests of difference
"Most Walkable" List and the Average Annual Pedestrian Deaths Per 100,000
1. Ho: There is no difference between the average annual pedestrian deaths per
100,000 for those cities on the "Most Walkable" cities list and those not on the
list.
H1: There is a difference between the average annual pedestrian deaths per
100,000 for those cities on the "Most Walkable" cities list and those not on the list.
2. t, with 50 df.
3. .05
4. t = -0.376, Pr(t) = .708
5. Since Pr(t) is > .05, we fail to reject the null hypothesis.
* The average annual deaths per 100,000 for those cities on the "Most Walkable"
list is 1.59, compared to 1.67 for those off the list. The difference is very small
and there is no evidence to show that it is significant.
T test                          on list         not on list
Mean                            1.588888889     1.672093023
Variance                        0.361111111     0.364916944
Observations                    9               43
Pooled Variance                 0.36430801
Hypothesized Mean Difference    0
Observed Mean Difference        -0.083204134
Df                              50
t Stat                          -0.376066248
P (T<=t) one-tail               0.354229321
t Critical one-tail             1.675905025
P (T<=t) two-tail               0.708458641
t Critical two-tail             2.008559112
Table 1. T-test.
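The t statistic in Table 1 can be reproduced from the summary statistics alone. The following is a minimal sketch, assuming an equal-variance (pooled) two-sample t-test, which is what the "Pooled Variance" row suggests. The function name `pooled_t` is illustrative and not part of the original analysis. The decision compares |t| with the two-tail critical value reported in Table 1 instead of recomputing a p-value.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Equal-variance two-sample t-test computed from summary statistics.

    Returns (t statistic, degrees of freedom, pooled variance).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df, pooled_var

# Means, variances, and counts copied from Table 1 (on list vs. not on list).
t_stat, df, pooled_var = pooled_t(1.588888889, 0.361111111, 9,
                                  1.672093023, 0.364916944, 43)

T_CRITICAL_TWO_TAIL = 2.008559112  # two-tail critical value reported in Table 1
reject_null = abs(t_stat) > T_CRITICAL_TWO_TAIL

print(round(t_stat, 3), df, round(pooled_var, 4), reject_null)
```

Because |t| is well below the critical value, the sketch reaches the same decision as step 5: fail to reject the null hypothesis.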
IV. F - TEST - Test of difference
Test 1: Annual Pedestrian Deaths among cities in different states
1. Ho: There is no difference in the average number of annual pedestrian deaths
among the cities of different states.
H1: At least one state's average number of annual pedestrian deaths differs
from that of the other states.
2. F, with 30 and 21 df.
3. .05
4. F = 6.141; pr (F) = .0000306
5. Since pr(F) is < .05, we reject the null hypothesis and accept the alternative
hypothesis.
* At least one of the means is different. Inspection of the averages shows that
Florida has a very high number of pedestrian deaths, averaging over 3 deaths
per year per 100,000 people.
Table 2. Anova: Single Factor (Test 1)
SUMMARY
Groups Count Sum Average Variance
AL 1 1.2 1.2 #DIV/0!
AZ 2 4.6 2.3 0
CA 6 11.7 1.95 0.115
CO 1 1.6 1.6 #DIV/0!
CT 1 1.2 1.2 #DIV/0!
DC 1 1.7 1.7 #DIV/0!
FL 4 12.2 3.05 0.096666667
GA 1 1.6 1.6 #DIV/0!
IL 1 1.4 1.4 #DIV/0!
IN 1 1.1 1.1 #DIV/0!
KY 1 1.6 1.6 #DIV/0!
LA 1 2.4 2.4 #DIV/0!
MA 1 1.1 1.1 #DIV/0!
MD 1 1.8 1.8 #DIV/0!
MI 1 1.8 1.8 #DIV/0!
MN 1 0.8 0.8 #DIV/0!
MO 2 2.6 1.3 0.02
NC 2 3.1 1.55 0.045
NV 1 2.5 2.5 #DIV/0!
NY 3 4.5 1.5 0.13
OH 3 2.5 0.833333333 0.023333333
OK 1 1.4 1.4 #DIV/0!
OR 1 1.2 1.2 #DIV/0!
PA 2 2.8 1.4 0.18
RI 1 1.2 1.2 #DIV/0!
TN 2 3.5 1.75 0.245
TX 4 7.1 1.775 0.0425
UT 1 1.3 1.3 #DIV/0!
VA 2 2.4 1.2 0.08
WA 1 1.2 1.2 #DIV/0!
WI 1 1.1 1.1 #DIV/0!
ANOVA
Source of Variation SS df MS F P-value F critical
Between Groups 16.39775641 30 0.54659188 6.140934188 0.00003058 2.0102483
Within Groups 1.869166667 21 0.089007937
Total 18.26692308 51
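The F statistic follows from the sums of squares and degrees of freedom in the ANOVA output. The sketch below recomputes it from the Test 1 values. The helper name `f_ratio` is illustrative, and the group and case counts (31 states, 52 metropolitan areas) are read off the summary table.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (F, MS between, MS within) for a single-factor ANOVA."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, ms_between, ms_within

n_groups, n_cases = 31, 52        # states and metropolitan areas in Test 1
df_between = n_groups - 1         # 30
df_within = n_cases - n_groups    # 21

# Sums of squares copied from the Test 1 ANOVA output.
f_stat, ms_between, ms_within = f_ratio(16.39775641, df_between,
                                        1.869166667, df_within)

F_CRITICAL = 2.0102483            # critical value reported in the ANOVA output
reject_null = f_stat > F_CRITICAL

print(round(f_stat, 3), reject_null)
```

The same function applies to Test 2 by substituting that test's sums of squares and its 28 and 7 degrees of freedom.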
Test 2: Differences in Walking for Physical Activity by State
1. Ho: There is no difference in the average percentage of people walking among
the cities of different states.
H1: At least one state's average percentage of people walking differs
from that of the other states.
2. F, with 28 and 7 df.
3. .05
4. F = 0.927; pr (F) = .597
5. Since pr(F) is > .05, we fail to reject the null hypothesis.
* There is no evidence to show that there is a difference in the average
percentage of people walking among different states.
Table 3. Anova: Single Factor (Test 2)
SUMMARY
Groups Count Sum Average Variance
AZ 2 70.7 35.35 10.125
CA 1 38.5 38.5 #DIV/0!
CO 1 42.1 42.1 #DIV/0!
CT 1 40.1 40.1 #DIV/0!
DC 1 40.2 40.2 #DIV/0!
FL 3 101 33.66666667 40.82333333
GA 1 41 41 #DIV/0!
IL 1 36.3 36.3 #DIV/0!
IN 1 41.9 41.9 #DIV/0!
KY 1 36.3 36.3 #DIV/0!
LA 1 32.9 32.9 #DIV/0!
MA 1 41.8 41.8 #DIV/0!
MD 1 39.3 39.3 #DIV/0!
MI 1 40.6 40.6 #DIV/0!
MN 1 37.7 37.7 #DIV/0!
MO 2 75.9 37.95 0.005
NC 1 41 41 #DIV/0!
NV 1 37.5 37.5 #DIV/0!
NY 1 37.8 37.8 #DIV/0!
OH 1 41.4 41.4 #DIV/0!
OK 1 38.5 38.5 #DIV/0!
OR 1 45.1 45.1 #DIV/0!
PA 2 87.9 43.95 0.125
RI 1 40.1 40.1 #DIV/0!
TN 2 79.9 39.95 25.205
TX 2 78.1 39.05 4.805
UT 1 45.1 45.1 #DIV/0!
WA 1 48.5 48.5 #DIV/0!
WI 1 44.2 44.2 #DIV/0!
ANOVA
Source of Variation SS df MS F P-value F critical
Between Groups 452.0983333 28 16.14636905 0.927102274 0.597200543 3.385786974
Within Groups 121.9116667 7 17.41595238
Total 574.01 35
V. SCATTERPLOT / CORRELATION / REGRESSION - Test of relationship
Scatterplot – Graph
Figure 1. Plot of Walkscore (X axis) vs. Annual Deaths per 100,000 (Y axis)
Since Figure 1 tilts slightly down to the right, it appears that deaths decrease as
walkability increases.
Figure 2. Plot of %Walking (X axis) vs. Annual Deaths per 100,000 (Y axis)
[Figure 1 plot: Walkscore for central city (axis 0 to 100) against annual deaths per 100,000 (axis 0 to 4)]
[Figure 2 plot: % any walking in the past month (axis 0 to 60) against annual deaths per 100,000 (axis 0 to 4)]
Figure 2 tilts downward to the right, implying that there is a tendency for deaths
to increase as walking decreases.
CORRELATION:
Correlations
(1) Avg. annual pedestrian deaths per 100,000 (2000-2009)
(2) Percent of workers walking to work (2005-2009)
(3) PDI
(4) Walkscore for central city
(5) Population Density

       (1)            (2)            (3)            (4)            (5)
(1)    1
(2)    -0.224210565   1
(3)    0.820076533    -0.653196727   1
(4)    -0.167115655   0.774097237    -0.519648515   1
(5)    0.280170762    0.383542089    -0.089631027   0.443313978    1

N=52
50 df
R value required for a two-tailed test with 0.05 significance: 0.273
Table 4. Correlation among variables.
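The threshold of 0.273 can be recovered from the t distribution. A correlation r with df degrees of freedom is significant when |r| exceeds t_crit / sqrt(t_crit^2 + df). The sketch below applies that relationship using the two-tail critical t for 50 df reported in Table 1, then checks the correlations with average annual deaths from Table 4. The function and dictionary names are illustrative.

```python
import math

def critical_r(t_critical, df):
    """Smallest |r| that is significant, given the critical t for the same df."""
    return t_critical / math.sqrt(t_critical ** 2 + df)

T_CRITICAL_50_DF = 2.008559112  # two-tail, alpha = .05, 50 df (from Table 1)
r_crit = critical_r(T_CRITICAL_50_DF, 50)

# Correlations with avg. annual pedestrian deaths per 100,000, from Table 4.
r_with_deaths = {
    "Percent of workers walking to work": -0.224210565,
    "PDI": 0.820076533,
    "Walkscore for central city": -0.167115655,
    "Population Density": 0.280170762,
}
significant = {name: abs(r) > r_crit for name, r in r_with_deaths.items()}

print(round(r_crit, 3))
for name, flag in significant.items():
    print(name, flag)
```

Under this threshold only PDI and Population Density correlate significantly with annual deaths in the full data set. Walkscore and percent walking to work do not.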
Correlations
(1) Total pedestrian deaths (2000-2009)
(2) Avg. annual pedestrian deaths per 100,000 (2000-2009)
(3) Percent of workers walking to work (2005-2009)
(4) PDI
(5) Walkscore for central city
(6) Population Density
(7) % any walking in the past month
(8) % walk at least 5 times per week, 30 min.

       (1)            (2)            (3)            (4)            (5)            (6)            (7)            (8)
(1)    1
(2)    0.33079027     1
(3)    0.433803879    -0.218693502   1
(4)    0.041281599    0.823614244    -0.670029555   1
(5)    0.427692706    -0.197427318   0.768320164    -0.585401773   1
(6)    0.739550019    0.321139316    0.321555001    -0.025273485   0.42224739     1
(7)    -0.300887411   -0.617188715   0.276923962    -0.56131522    0.120225571    -0.244147474   1
(8)    -0.001619811   -0.152417931   0.475813507    -0.374771483   0.439175416    0.034473505    0.458083911    1

N=36
34 df
R value required for a two-tailed test with 0.05 significance: 0.33
Table 5. Correlation among variables in the subset of the data.
REGRESSION:
The Walkscore variable was removed from the final regression model because its
p-value was 0.47, greater than 0.05. Table 6 shows one regression model that endeavors to
account for pedestrian deaths. It takes the form Annual Deaths = 1.566 - 23.394 x (% walk
to work) + 0.000217 x (Population Density). Note that the R^2 value is rather low for this
model (about 0.21), however, implying that it does not account well for the variability.
SUMMARY OUTPUT
Response Variable: Avg. annual pedestrian deaths per 100,000 (2000-2009)
Regression Statistics
Multiple R          0.455491226
R^2                 0.207472257
Standard Error      0.543553
Adjusted R^2        0.175124186
Observations        52
ANOVA
             df    SS             MS             F              Significance of F
Regression   2     3.789879765    1.894939882    6.413744314    0.003356252
Residual     49    14.47704331    0.295449864
Total        51    18.26692308

                                                 Coefficients    Standard Error   t-Statistics    p-Value        Lower 95%       Upper 95%
Intercept                                        1.566014419     0.23349281       6.706906403     1.89E-08       1.096793051     2.035235787
Percent of workers walking to work (2005-2009)   -23.39400507    8.284347583      -2.823880196    0.006841321    -40.04202483    -6.745985316
Population Density                               0.000217314     6.97E-05         3.117594818     0.003049027    7.72E-05
Table 6. Regression model using the full dataset.
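The fit statistics in Table 6 are tied together by the ANOVA sums of squares. As a rough check, the sketch below recomputes R^2, adjusted R^2, and F from the regression and residual sums of squares, using the standard definitions for a least-squares model with an intercept. The function name `fit_summary` is illustrative.

```python
def fit_summary(ss_regression, ss_residual, n_obs, n_predictors):
    """Return (R^2, adjusted R^2, F) from regression ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total
    df_residual = n_obs - n_predictors - 1
    adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_residual
    f_stat = (ss_regression / n_predictors) / (ss_residual / df_residual)
    return r2, adj_r2, f_stat

# Sums of squares from Table 6: 52 observations, 2 predictors.
r2, adj_r2, f_stat = fit_summary(3.789879765, 14.47704331, 52, 2)

print(round(r2, 4), round(adj_r2, 4), round(f_stat, 3))
```

An R^2 near 0.21 means the two predictors account for only about a fifth of the variability in annual deaths per 100,000.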
Using the subset of the data with fewer metropolitan areas that also included a percent
walking for leisure variable, we were able to create a model that accounted for about 42%
of the variability in annual deaths per 100,000 (adjusted R^2 of 0.417). This time we kept
the Walkscore, because removing it reduces the R value. See Table 7. The form of the
relationship is Annual deaths = 4.85 – 0.0769 x (% walking) + 0.000170 x (Population
Density) – 0.011 x (Walkscore)
SUMMARY OUTPUT
Response Variable: Avg. annual pedestrian deaths per 100,000 (2000-2009)
Regression Statistics
Multiple R          0.683032844
R^2                 0.466533865
Standard Error      0.467002493
Adjusted R^2        0.416521415
Observations        36
ANOVA
             df    SS             MS             F              Significance of F
Regression   3     6.103299702    2.034433234    9.328354528    0.000140149
Residual     32    6.97892252     0.218091329
Total        35    13.08222222

                                  Coefficients    Standard Error   t-Statistics    p-Value        Lower 95%       Upper 95%
Intercept                         4.854104449     0.873203664      5.558960236     3.91E-06       3.075446789     6.632762108
% any walking in the past month   -0.076929247    0.020782218      -3.701686094    0.000803202    -0.11926124     -0.034597254
Population Density                0.000169983     8.28E-05         2.052129648     0.048413212    1.26E-06        0.000338707
Walkscore for central city        -0.011052871    0.006100955      -1.811662484    0.079433331    -0.023480109    0.001374367
Table 7. Regression model using the smaller data set.
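To see how the Table 7 model behaves, the fitted equation can be evaluated for a single metropolitan area. The sketch below uses the coefficients exactly as printed in Table 7, including the negative Walkscore coefficient. The input values are hypothetical, chosen only for illustration: they are not taken from the data set, and the density units are assumed to match whatever units the census figures used.

```python
# Coefficients copied from Table 7.
INTERCEPT = 4.854104449
B_PCT_WALKING = -0.076929247   # % any walking in the past month (percentage points)
B_DENSITY = 0.000169983        # population density of the urbanized area
B_WALKSCORE = -0.011052871     # Walkscore for central city (0 to 100)

def predicted_annual_deaths(pct_walking, density, walkscore):
    """Predicted avg. annual pedestrian deaths per 100,000 under the Table 7 model."""
    return (INTERCEPT
            + B_PCT_WALKING * pct_walking
            + B_DENSITY * density
            + B_WALKSCORE * walkscore)

# Hypothetical metro area: 40% walking, density 3000, Walkscore 50.
baseline = predicted_annual_deaths(40.0, 3000.0, 50.0)

# Same area with 5 percentage points more people walking.
more_walking = predicted_annual_deaths(45.0, 3000.0, 50.0)

print(round(baseline, 3), round(more_walking, 3))
```

Holding the other inputs fixed, each additional percentage point of residents walking lowers the predicted rate by about 0.077 deaths per 100,000.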
VI. CONCLUSIONS
The following discusses the conclusions of each test:
1. T – Test
Results of the T-test show no evidence of a difference in the
annual deaths between the more walkable and the less walkable cities.
2. F – Test
Results of the first F-test show that annual pedestrian deaths differ among
states, and inspection of the averages shows that Florida has a disproportionately
high number. The second F-test shows that the percentage of people walking does
not appear to significantly change from state to state.
3. Scatterplot / Correlation / Regression
Scatterplot, Correlation, and Regression tests show that Walkscore,
Population Density, and the proportion of walking (commuters or
residents), all relate to annual pedestrian deaths. The second regression
model seems to have the better fit, exchanging % walking to work for %
walking in general, and utilizing population density and Walkscore. It is
surprising that Population Density has a positive coefficient, however. Our
expectation, especially since density contributes to a higher Walkscore,
was that with higher densities would come fewer pedestrian fatalities. The
reality is that this impact is very slightly positive.