l1 spatial data - uantwerpen

Post on 22-Mar-2022


Spatial issues in data analysis and model building: distance, scale and complexity.

Isabelle THOMAS Francqui Chair

March 11th 2015

Spatial analysis

• Visualization: showing interesting patterns (maps)

• Exploratory Spatial Data Analysis (ESDA): finding interesting patterns

• Spatial modelling (regression, …): explaining interesting patterns

Spatial is special

INTRODUCTION Distance Scale Complexity Accidents Conclusions

BAD NEWS

GOOD NEWS

[Diagram, by increasing LEVEL OF DIFFICULTY:
• ESDA, DESCRIPTION: statistical MAPS
• Spatial STATISTICS: spatial statistical analysis and hypothesis testing
• Modeling: (spatial) modeling and prediction]


DISTANCE

DISTANCE: adjacency, interaction, and neighborhoods
SCALE: MAUP, spatial autocorrelation, ecological fallacy, edge/border effect

Why is distance so important? (1)

[Figure: price of land (P1, P2, P3) against quantity of land (Q1, Q2, Q3) and distance to the CBD, from high densities towards downtown to low densities towards the periphery]

Distance is the core of (transport) geography: it enters most models and many indices.

LOCATION
• Absolute: latitude, longitude; an address
• Relative: distance, directions to other places

Distance, adjacency, neighbourhood, interaction

Why is distance so important? (2)


[Figure: six zones, A to F]

Adjacency, distance, interaction, neighbourhood

Adjacency matrix (or adjacency list)

i and j are adjacent
- if they share a common boundary (share = ?)
- if they are within a specified distance (buffer, neighbourhood)
Binary or distance-based weights.

Order of adjacency.


[Figure: rook and queen contiguity, 1st order and 2nd order. Source: Briggs, Henan University, 2012]
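The rook and queen rules and the order of adjacency can be sketched on a regular grid. This is a minimal illustration, assuming square cells indexed by (row, column); the function names `neighbours` and `second_order` are hypothetical and do not come from the lecture.

```python
def neighbours(row, col, n_rows, n_cols, rule="rook"):
    """First-order neighbours of cell (row, col) on a regular grid.

    rook  : cells sharing an edge
    queen : cells sharing an edge or a corner
    """
    rook_steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    queen_steps = rook_steps + [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    steps = rook_steps if rule == "rook" else queen_steps
    result = []
    for d_row, d_col in steps:
        r, c = row + d_row, col + d_col
        if 0 <= r < n_rows and 0 <= c < n_cols:  # stay inside the grid
            result.append((r, c))
    return sorted(result)


def second_order(row, col, n_rows, n_cols, rule="rook"):
    """Cells reached in exactly two steps: neighbours of neighbours,
    minus the cell itself and its first-order neighbours."""
    first = set(neighbours(row, col, n_rows, n_cols, rule))
    reached = set()
    for r, c in first:
        reached.update(neighbours(r, c, n_rows, n_cols, rule))
    return sorted(reached - first - {(row, col)})


# Centre cell of a 5 x 5 grid
print(len(neighbours(2, 2, 5, 5, "rook")), len(neighbours(2, 2, 5, 5, "queen")))
print(len(second_order(2, 2, 5, 5, "rook")), len(second_order(2, 2, 5, 5, "queen")))
```

Cells on the border of the grid get fewer neighbours, which is one concrete form of the edge effect discussed later in the lecture.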

[Figure: zones A to F with distances between them (values shown: 66, 24, 41, 68, 68)]

Adjacency, distance, interaction, neighbourhood

– dij measures the separation between i and j
– (mathematical) definition:
• dij > 0 if i ≠ j (distinction/separation)
• dij = 0 if i = j (co-location/equivalence): diagonal of the adjacency matrix
• dij + djk ≥ dik (triangle inequality)
• dij = dji (symmetry: is the graph symmetric?)

Measuring distance is not simple …

In spatial analysis:
• Objects may not be truly point-like/distinct
• Triangle inequality may not hold
• Symmetry condition may not hold
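The failure of the metric conditions can be checked directly on a distance matrix. The sketch below is illustrative only: `check_metric` is a hypothetical helper, and the travel times are invented values in which a one-way street breaks symmetry and a slow direct link breaks the triangle inequality.

```python
import math


def check_metric(d, tol=1e-9):
    """Report which metric conditions a square distance matrix d satisfies."""
    idx = range(len(d))
    return {
        "zero_diagonal": all(abs(d[i][i]) <= tol for i in idx),
        "positive": all(d[i][j] > 0 for i in idx for j in idx if i != j),
        "symmetric": all(abs(d[i][j] - d[j][i]) <= tol for i in idx for j in idx),
        "triangle": all(d[i][j] + d[j][k] + tol >= d[i][k]
                        for i in idx for j in idx for k in idx),
    }


# Straight-line distances between three points: a true metric
points = [(0, 0), (3, 4), (6, 8)]
euclidean = [[math.dist(p, q) for q in points] for p in points]

# Hypothetical travel times (minutes) between places A, B, C
travel_time = [
    [0, 10, 30],   # A -> B takes 10, A -> C takes 30
    [5, 0, 8],     # B -> A takes only 5 (one-way street effect)
    [30, 8, 0],
]

print(check_metric(euclidean))
print(check_metric(travel_time))
```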

Source: www.spatialanalysisonline.com

Terrain distances – cross section view

Measuring distance is not simple …


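NB. Spherical coordinates call for spherical/ellipsoidal computations, e.g. the great-circle distance on a sphere of radius R (φ: latitude, λ: longitude):

$$d_{ij} = 2R\,\sin^{-1}\!\left(\sqrt{\sin^{2}A + \cos\phi_i \cos\phi_j \sin^{2}B}\right), \quad \text{where } A = \frac{\phi_i - \phi_j}{2},\; B = \frac{\lambda_i - \lambda_j}{2}$$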
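The great-circle (haversine) distance on a sphere, d = 2R·asin(sqrt(sin²A + cos φi cos φj sin²B)) with A and B half the latitude and longitude differences, can be sketched as follows. The mean Earth radius of 6371 km is an assumption of this example, as is the function name.

```python
import math


def great_circle_distance(lat_i, lon_i, lat_j, lon_j, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees.

    radius_km defaults to an assumed mean Earth radius; the sphere is only an
    approximation of the ellipsoid.
    """
    phi_i, phi_j = math.radians(lat_i), math.radians(lat_j)
    a = (phi_i - phi_j) / 2                 # half latitude difference
    b = math.radians(lon_i - lon_j) / 2     # half longitude difference
    h = math.sin(a) ** 2 + math.cos(phi_i) * math.cos(phi_j) * math.sin(b) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


# A quarter of the equator: should equal pi * R / 2
quarter_equator = great_circle_distance(0.0, 0.0, 0.0, 90.0)
print(round(quarter_equator, 1))
```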

Measuring distance
• Metrics: lp metrics (p = 1: Manhattan; p = 2: Euclidean; ...)
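As a small illustration of the lp family, the sketch below computes the distance between two planar points for several values of p. The function name `lp_distance` and the coordinates are assumptions made for the example.

```python
def lp_distance(a, b, p):
    """lp (Minkowski) distance between two points with the same dimension."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


origin, target = (0, 0), (3, 4)
manhattan = lp_distance(origin, target, 1)   # sum of absolute differences
euclidean = lp_distance(origin, target, 2)   # straight-line distance
print(manhattan, euclidean)
```

Intermediate values of p (between 1 and 2) are sometimes used to approximate travel on a road network, which is longer than the straight line but shorter than a strict grid path.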

Distance, adjacency, interaction, neighbourhood

Distance decay models
– Simple inverse power models:

$$z = f\!\left(\left\{\frac{z_{ij}}{d_{ij}^{\beta}}\right\}\right), \quad \beta \ge 0$$

– Trip distribution models:

$$T_{ij} = A_i B_j O_i D_j f(d_{ij})$$

– Statistical modelling
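A doubly constrained trip distribution model of the form T_ij = A_i B_j O_i D_j f(d_ij) can be sketched with iterative balancing of the factors A_i and B_j. This is only an illustration under stated assumptions: the deterrence function f(d) = exp(-βd), the value of β, and all origin totals, destination totals and distances are hypothetical.

```python
import math


def doubly_constrained(origins, destinations, dist, beta, n_iter=200):
    """Balance T_ij = A_i * B_j * O_i * D_j * f(d_ij), with f(d) = exp(-beta * d).

    The balancing factors are updated in turn until row sums match the origin
    totals and column sums match the destination totals.
    """
    n, m = len(origins), len(destinations)
    f = [[math.exp(-beta * dist[i][j]) for j in range(m)] for i in range(n)]
    a = [1.0] * n
    b = [1.0] * m
    for _ in range(n_iter):
        a = [1.0 / sum(b[j] * destinations[j] * f[i][j] for j in range(m))
             for i in range(n)]
        b = [1.0 / sum(a[i] * origins[i] * f[i][j] for i in range(n))
             for j in range(m)]
    return [[a[i] * b[j] * origins[i] * destinations[j] * f[i][j]
             for j in range(m)] for i in range(n)]


# Hypothetical data: trips produced (O), trips attracted (D), distances in km
O = [100.0, 200.0, 300.0]
D = [250.0, 150.0, 200.0]          # same grand total as O
d = [[2.0, 8.0, 12.0],
     [8.0, 3.0, 6.0],
     [12.0, 6.0, 2.0]]

T = doubly_constrained(O, D, d, beta=0.1)
for row in T:
    print([round(t, 1) for t in row])
```

The balancing only converges to both sets of totals when the origin and destination totals sum to the same number of trips.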

Adjacency, distance, interaction, neighbourhood

[Figure. Source: Ovtracht, 2014]

[Figures. Source: http://www.colorado.edu/geography/]

[Figure: locations 1 to 5, i and j]

Errors
A: d(2,j) < d(i,j) < d(5,j)
B: d(1,i) = 0
C: i can be allocated to j while closer to j′

Aggregation decreases
– data collection costs
– modeling costs
– computing costs
– confidentiality concerns
– data statistical uncertainty (smaller sample deviations for larger samples)

Aggregation increases
– modeling errors/biases

Distance: aggregation & scale


SCALE

LOCATION and SCALE: don’t forget the essence of your problem
• SITE, MICRO (local): land, transportation, amenities, …
• SITUATION, MESO (regional): labor, materials, energy, …
• SOCIOECONOMIC ENVIRONMENT, MACRO (national): capital, subsidies, regulations, …

SCALE: cartographically

Large cartographic scale Small cartographic scale

Source: Topomapviewer, NGI/IGN

Statistical sectors Communes, provinces, …

SCALE: 2 aspects
• Extent: spatial dimension of an object (or process) observed/analyzed
• Grain (BSU): level of spatial resolution at which an object (or process) is measured/observed

[Figure panels: extent constant, different grain; increasing extent, grain constant]

[Map: land rent (by sq m), 2013. Source: « INS ». Authors: Larielle and Thomas, 2014]

SCALE: Extent

Results obtained at one scale do not necessarily apply at other scales. A pattern may be clustered at one scale but dispersed at another scale:
• Population clustered into cities
• City populations are dispersed
(Source: Briggs, Henan University, 2012)

Scale is always important in spatial analysis!

Why be concerned about scale?
1. Patterns are dependent upon the scale of observation.
2. The importance of explanatory variables changes with scale.
3. Statistical relationships may change with scale.
4. Patterns are generated by processes acting over various spatial (and temporal) scales.

No unique solution: nested models, power laws, fractals, networks, …

Power laws
• Summarize how relationships change with changes in scale
• Often expressed on a log-log plot
• Y = constant · X^n
• Similar slopes are thought to have similar structuring processes (n = slope)
• Example: species-area relationships

! However: power laws often lack an explanatory process
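Because a power law Y = constant · X^n becomes a straight line on a log-log plot, the exponent n can be estimated as the slope of an ordinary least-squares line through (log X, log Y). The sketch below uses synthetic species-area style data generated from an assumed law (constant 3, exponent 0.25); the numbers are illustrative, not empirical.

```python
import math


def fit_power_law(xs, ys):
    """Estimate (constant, n) in y = constant * x**n by least squares on logs."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    count = len(lx)
    mean_x, mean_y = sum(lx) / count, sum(ly) / count
    slope = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
             / sum((u - mean_x) ** 2 for u in lx))
    constant = math.exp(mean_y - slope * mean_x)
    return constant, slope


# Synthetic example: areas and the species counts implied by S = 3 * A**0.25
areas = [1, 10, 100, 1000]
species = [3 * a ** 0.25 for a in areas]

constant, exponent = fit_power_law(areas, species)
print(round(constant, 3), round(exponent, 3))
```

A good fit of the line says nothing about the process that generated it, which is the caveat raised on the slide.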

Fractals
• The same pattern appears across all scales: it is scale invariant.
• The relationship between the size of a box and the pattern in it is constant.
• Fractals follow their own power law, relating the number of boxes needed to cover a shape to the size of those boxes.
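The box-counting power law can be illustrated on a textbook fractal. The sketch below, which is an assumption-laden example rather than anything from the lecture, builds a Sierpinski carpet on a 81 × 81 grid, counts the occupied boxes at several box sizes, and estimates the fractal dimension as the slope of log(number of boxes) against log(1 / box size).

```python
import math


def in_carpet(x, y):
    """True if grid cell (x, y) belongs to the Sierpinski carpet:
    no base-3 digit position where both coordinates have digit 1."""
    while x > 0 or y > 0:
        if x % 3 == 1 and y % 3 == 1:
            return False
        x, y = x // 3, y // 3
    return True


def box_count(cells, size):
    """Number of size x size boxes containing at least one cell."""
    return len({(x // size, y // size) for x, y in cells})


LEVEL = 4
side = 3 ** LEVEL
cells = [(x, y) for x in range(side) for y in range(side) if in_carpet(x, y)]

sizes = [3 ** m for m in range(LEVEL)]            # 1, 3, 9, 27
counts = [box_count(cells, s) for s in sizes]

# Least-squares slope of log N(s) against log(1 / s)
lx = [math.log(1 / s) for s in sizes]
ly = [math.log(c) for c in counts]
mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
dimension = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
             / sum((u - mx) ** 2 for u in lx))
print(counts, round(dimension, 4))
```

For real spatial patterns (e.g. built-up areas) the log-log relation is only approximately linear, and only over a limited range of box sizes.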

Networks
• Can represent relationships at a variety of scales at once.
• Structural properties of networks provide a means of understanding how they work.
– Nodes and links
– Degree centrality and betweenness
– Weak versus strong links
– Directional versus non-directional graphs

1. Modifiable Areal Unit Problem (MAUP)
2. Ecological fallacy
3. Edge/border effect
4. Spatial autocorrelation
(…)

Fallacies of scale

1. Modifiable Areal Unit Problem (MAUP)

2. Ecological fallacy
• Ecological fallacy: making claims about local-scale phenomena based on broad-scale observations
• Individualistic fallacy: making claims about broad-scale phenomena based on observations conducted at small, local scales

Do not generalise conclusions to other scales.
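A toy example shows why conclusions should not be carried across scales. In the synthetic data below (invented for illustration), x and y are negatively related among individuals within every zone, yet the zone averages are perfectly positively correlated, so the sign of the relationship depends on the level of aggregation.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Synthetic individuals: (zone, x, y); within a zone, y falls as x rises
individuals = [(zone, zone + d, zone - d)
               for zone in (0, 1, 2)
               for d in (-2, -1, 0, 1, 2)]

r_individual = pearson([x for _, x, _ in individuals],
                       [y for _, _, y in individuals])

# Aggregate to zone means
zones = sorted({zone for zone, _, _ in individuals})
mean_x = [sum(x for z, x, _ in individuals if z == zone) / 5 for zone in zones]
mean_y = [sum(y for z, _, y in individuals if z == zone) / 5 for zone in zones]
r_zone = pearson(mean_x, mean_y)

print(round(r_individual, 2), round(r_zone, 2))
```

Regrouping the same individuals into different zones would give yet another coefficient, which is the zoning side of the MAUP.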

3. Edge/border effects
• Points close to the border are closer to locations outside the study area.
• Arises when an artificial boundary is imposed on a study, often just to keep it manageable.
• Biases > nearest-neighbor distances > (model results)
• Solution: ? How to consider “the rest of the world”.

4. Spatial autocorrelation (1)
1) Biased parameter estimates
2) Data redundancy (affecting the calculation of confidence intervals)
3) Moran and Geary


Coefficient
– Coordinate (x, y, z)
– Spatial weights matrix (binary or other), W = {wij}
– Coefficient formulation: desirable properties
• Reflects co-variation patterns
• Reflects adjacency patterns via weights matrix
• Normalised for absolute cell values
• Normalised for data variation
• Adjusts for number of included cells in totals

4. Spatial autocorrelation (2)


• Moran’s I

$$I = \frac{1}{p}\,\frac{\sum_i \sum_j w_{ij}\,(z_i - \bar{z})(z_j - \bar{z})}{\sum_i (z_i - \bar{z})^2}, \quad \text{where } p = \sum_i \sum_j w_{ij} / n$$

• Modification for point data
– Replace weights matrix with distance bands, width h
– Pre-normalise z values by subtracting means
– Count number of other points in each band, N(h)

$$I(h) = N(h)\,\frac{\sum_i \sum_j z_i z_j}{\sum_i z_i^2}$$
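Moran's I, the sum of weighted cross-products of deviations from the mean, divided by the sum of squared deviations and by p = (sum of weights) / n, can be computed in a few lines. The sketch below assumes a binary rook-type weights matrix for four cells in a row; the data values and helper names are made up for illustration.

```python
def morans_i(z, w):
    """Global Moran's I for values z and spatial weights matrix w (list of lists)."""
    n = len(z)
    mean = sum(z) / n
    dev = [value - mean for value in z]
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    squares = sum(d * d for d in dev)
    p = sum(sum(row) for row in w) / n
    return cross / squares / p


def line_weights(n):
    """Binary weights for n cells in a row: neighbours share a boundary."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = line_weights(4)
i_trend = morans_i([1, 2, 3, 4], w)         # similar values are neighbours
i_alternating = morans_i([1, 4, 1, 4], w)   # dissimilar values are neighbours
print(round(i_trend, 3), round(i_alternating, 3))
```

Positive values indicate that neighbouring cells tend to be alike; negative values indicate a checkerboard-like pattern. Significance testing (normal or randomisation models) is not covered by this sketch.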

4. Spatial autocorrelation (3)

Extending SA concepts
– Distance formula weights vs bands
– Lattice models with more complex neighbourhoods and lag models (GeoDa)
– Disaggregation of SA index computations (row-wise) with/without row standardisation (LISA)
– Significance testing
• Normal model
• Randomisation models
• Bonferroni/other corrections

4. Spatial autocorrelation (4)


Moran I correlogram

[Figure panels: source data points; lag distance bands, h; correlogram]

4. Spatial autocorrelation (5)

• Underlying socio-economic process has led to a clustered distribution of variable values
– Grouping, spatial interaction
– Diffusion, dispersal
– Spatial hierarchies
• Mismatch between process and spatial units
– Counties vs retail trade zones
– Census block groups vs neighborhood networks

4. Spatial autocorrelation (6): causes of spatial dependence / interpretation

Source: D. Griffith (1992), What is spatial autocorrelation, L’Esp. Géo.

[Workflow diagram: hypotheses > explore the data (EDA, ESDA: global and local autocorrelation) > fit an OLS model > perform diagnosis > run adapted model (e.g. GWR) > compare models (global model, local model) > RESULTS, DECISION]

Start with OLS and look for
– Positive spatial autocorrelation > dependence between samples exists
– Datasets are often non-Normal >> transformations may be required (log, Box-Cox, logistic)
– Samples are often clustered >> spatial declustering may be required
– Heteroskedasticity is common (errors are then not i.i.d.)
– Spatial coordinates (x, y) may form part of the modelling process


Type of spatial effect > remedies
– Spatial heterogeneity (Koenker-Bassett test)
• Include a covariate which accounts for heterogeneity?
• Split region?
– Spatial autocorrelation (Lagrange Multiplier tests)
• Identify missing variables?
• Explore effects of spatially lagged independent variables?
• Use an appropriate spatial regression model?

Regression models


• Identify the source (LM tests will help)
– Regression residuals (LM-Error)
• Mismatch of process and spatial units => systematic errors, correlated across spatial units
– Dependent variable (LM-Lag)
• Underlying socio-economic process has led to a clustered distribution of variable values => influence of neighboring values on unit values

Regression models

LARGE number of solutions: spatial autoregressive process (SAR), spatial moving average process (SMA), …

COMPLEXITY or COMPLICATION ?

Complexity is hard to define
• Algorithmic complexity
• Deterministic complexity
• Aggregate complexity

Key generic properties
1. Nonlinear relationships
2. Techniques such as artificial intelligence
3. Emerges from relatively simple interactions
Systems change and evolve.

Source: Manson, 2001; R. Martin and Sunley.

Vocabulary about complexity

Property: attributes
• Has a distributed nature & representation: multiscalar
• Openness: open system
• Non-linear dynamics: path dependence
• Limited functional decomposability
• Emergence and self-organisation: emergence
• Adaptive behaviour and adaptation: self-organization
• Non-deterministic and non-tractable: stochastic

Source: Manson, 2001; R. Martin and Sunley.

SYSTEM ANALYSIS: MIT, Jay Forrester (6’), Bertalanffy (67). General system theory. System’s autonomy.

SELF-ORGANIZATION: Prigogine, Haken (1970-80). Open systems, dissipative structures, unpredictable effects of non-linear micro-interactions on the system’s macro structure and dynamics, path dependence (irreversibility).

COMPLEX SYSTEMS: Santa Fe Institute, ISI, ECSS (1990-2000). Emerging properties.

Models: multi-agent systems

Models: differential equations

Urban systems are complex systems
• Urban systems are produced by social interactions (conveying information), according to their range in space and duration in time
• Non-linear interactions occur at micro, meso or macro levels, and between levels
• Emergence of collective properties within cities:
– Hierarchical organisation (« cities as systems within systems of cities »; Reynaud, 1841, Berry, 1964, Pred, 1977)
– Urban « memory » (dynamic path dependence) as a constraint on urban dynamics at both levels

[Diagram: INTERACTIONS at (x, y, t), over times t-1, t, t+1, between PLACE(S) (environment, road(s)), PEOPLE (road users) and VEHICLE(S)]

From facts … to geography


Multi-level problem

[Workflow diagram: explore the data (EDA, ESDA: global and local autocorrelation) > fit an OLS model > perform diagnosis > run adapted model (e.g. GWR) > compare models (global model, local model)]

Step 1: EDA. Select variable and describe
• Univariate
• Bi- and multivariate
• Visualizations: tables, charts, plots, autocorr, hot spot; maps

Step 2: ESDA
• Test spatial homogeneity
• Spatial weights
• Global & local spatial autocorrelation

• Point pattern analysis
– Describing a point pattern: black spots, black zones
- Density-based point pattern measures
- Distance-based point pattern measures
– Assessing point patterns statistically
• Aggregation
- Segments of road
- Communes (stat sectors)
• Explanation/prediction
- Measuring and modeling numbers/risk

5.1 Point pattern analyses

• Pinpoint location (point): black spot
• Black road segment (line)
• Black « region » (polygon)

Multi-scale, multi-dimensional, multi-disciplinary, multi-causal analysis. Necessity: to isolate, to control for, in order to avoid badly specified models.

Describe / understand / explain / predict + ACT (Engineering, Enforcement, Education, Environment)


Poisson or not?

• Poisson > Binomial
• Aggregation effects
• Length of segments
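One quick way to ask "Poisson or not?" is the index of dispersion: for Poisson counts the variance equals the mean, so the variance-to-mean ratio should be close to 1. The sketch below uses invented accident counts per road segment; the function name is hypothetical, and the approach is a generic diagnostic rather than the method used in the cited study.

```python
def dispersion_index(counts):
    """Variance-to-mean ratio of count data (sample variance, n - 1)."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


# Hypothetical accident counts on 12 road segments of equal length
clustered = [0, 0, 0, 1, 0, 0, 9, 0, 1, 0, 0, 7]
regular = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

index_clustered = dispersion_index(clustered)   # well above 1: over-dispersion
index_regular = dispersion_index(regular)       # below 1: under-dispersion
print(round(index_clustered, 2), round(index_regular, 2))
```

Under the Poisson hypothesis, (n - 1) times the index approximately follows a chi-square distribution with n - 1 degrees of freedom, which gives a formal test. Both the way accidents are aggregated and the length of the segments change the counts and therefore the outcome.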

Source: Thomas, 1996

Road accidents, N29 Charleroi-Jodoigne: Moran for black segments

Source: Flahaut, 2002

[Figure. Source: Eckhart, 2002]

Kernel

[Figures, including Mechelen. Source: Steenberghen, Defays, Thomas, Flahaut, 2010]

5.2 Model for i = hectometres

Logistic regression (infrastructure & environment)
• Yi = 1 if hectometre (hm) i belongs to a « black segment »; Yi = 0 otherwise
• Xi: characteristics of the road
- Usage
- Physical properties
- Environment (land use, …)
(Official data; numerical Digital Terrain Model; IGN maps)

Source: Flahaut, 2004
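For reference, a binary outcome such as Yi (hectometre i in a black segment or not) is usually tied to the road characteristics Xi through the standard logistic link. The notation below is a generic statement of that model, with β an assumed coefficient vector; it is not copied from the cited study.

```latex
P(Y_i = 1 \mid X_i) = \frac{\exp(X_i \beta)}{1 + \exp(X_i \beta)},
\qquad
\log \frac{P(Y_i = 1 \mid X_i)}{1 - P(Y_i = 1 \mid X_i)} = X_i \beta
```

Each coefficient is read as the change in the log-odds of belonging to a black segment for a unit change in the corresponding road characteristic.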

[Maps (scale bar: 0 to 250 m). Source: Flahaut, 2004]

5.3 Model for i = communes

[Figure. Source: Vandenbulcke et al., 2011]

Objective: explain variations in Y, controlling for spatial biases

2 steps

EXPLORATORY: identify potential explanatory factors
Statistical tools:
• Graphics, (basic statistics)
• Cluster analyses, (PCA)
• Correlations (x, y)

STATISTICAL MODELLING: relative importance of variables?
Statistical tools:
• Statistical models
• Corrections for multicollinearity & spatial effects

[Diagram: Factor 1, Factor 2, Factor X? (town, village)]

[Figure: % cycling (0 to 35) against distance (1 to 20 km) for series H1, H2, H3, H5 and H8; 10 km marked]

Exploratory step
• Commuting distances (< 10 km)
• Town size: regional towns > large towns
• Regional differences (culture + …)

Source: Vandenbulcke et al., 2011

Exploratory step: correlations (ρxy, on a scale from –1 to 1)
• Dissatisfaction with cycleways: –0.82
• Slopes: –0.77
• Bad health: –0.58
• Commuting distances: –0.54
• Accident risk: –0.32
• No child, town size: 0.23
• Job density: 0.38
• Active people < 25 years: 0.54

[Scatter plot axes: commuting distances (km); average slopes (d°)]

BICYCLE USE (scale: communes, INS 5) and its potential determinants, grouped into INDIVIDUAL, ENVIRONMENTAL and POLICY-RELATED FACTORS:
• Socio-economic data (NIS): income, education, gender, age, car availability, young children per household
• Health data (NIS): subjective health
• Physical data (UCL): slopes (d°)
• Environmental data (IRCEL-CELINE): air pollution (PM10)
• Accident data (NIS): accident risk, f(number of accidents, travel time)
• Land-use data (UCL): land use (e.g. urban), city size, job and pop. densities
• Trip/local characteristics: satisfaction with cycle paths, traffic volume, commuting distance (km)

Source: Vandenbulcke et al., Transportation Research Part A (2011)

5.3 Model for i = communes

OLS (Ordinary Least Squares):

$$y = X\beta + \varepsilon$$

Diagnostics and remedies:
• Multicollinearity (VIF, …) > uncorrelated X
• Heteroskedasticity (BP tests) > « White correction »
• Spatial autocorrelation (LM tests) > spatial autoregressive model (spatial lag)
• Structural instability (Chow tests) > inclusion of spatial regimes (ESDA)

Spatial autoregressive model (spatial lag, queen matrix):

$$y = \rho W y + X\beta + \varepsilon$$

SPATIAL AUTOREGRESSIVE MODEL + REGIMES:

$$y_1 = \rho W_1 y_1 + X_1\beta_1 + \varepsilon_1$$

$$y_2 = \rho W_2 y_2 + X_2\beta_2 + \varepsilon_2$$
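To see what the spatial lag specification y = ρWy + Xβ + ε implies, it helps to solve it for y. The short derivation below is standard algebra; the series expansion in the last line additionally assumes a row-standardised W and |ρ| < 1, which the slides do not state explicitly.

```latex
\begin{aligned}
y &= \rho W y + X\beta + \varepsilon \\
(I - \rho W)\,y &= X\beta + \varepsilon \\
y &= (I - \rho W)^{-1}(X\beta + \varepsilon) \\
(I - \rho W)^{-1} &= I + \rho W + \rho^{2} W^{2} + \dots
\end{aligned}
```

Read this way, a change in X in one commune affects y in that commune, then in its neighbours, then in the neighbours' neighbours, with weights that decay as powers of ρ. This is why the coefficients of a lag model are not directly comparable with OLS coefficients.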


OLS Model (n = 589)
Y = % commuter cyclists in commune i (italics on the slide: ln(x+1))

Estimation                              OLS (y)
Intercept                               6.4124****
Median income                           0.0030
Active men                              0.0472****
Age 2 (45-54 years)                     -0.0460****
Young children                          -0.0567****
Cycleways unsatisfaction                -0.0127****
Commuting distance                      -0.0114***
Air quality                             0.0141****
City size                               -0.0954****
Bad health                              -0.0521****
Accident risk                           -0.1673**
Traffic volume 2 (municipal network)    -0.9216****
Age 3 (> 54 years)                      -0.2054*
Education 3 (university degree)         -0.4988****
Slopes                                  -0.4873****
R-squared (R²)                          0.879
Log Likelihood                          -102.43
Moran's I of residuals                  0.34 (0.00)

Source: Vandenbulcke et al., 2011

OLS and spatial LAG models
Y = % commuter cyclists in commune i

Estimation                              OLS (y)        ML (y), LAG
Intercept                               6.4124****     3.2698****
Median income                           0.0030         0.00852
Active men                              0.0472****     0.01673**
Age 2 (45-54 years)                     -0.0460****    -0.02505***
Young children                          -0.0567****    -0.0218****
Cycleways unsatisfaction                -0.0127****    -0.0049****
Commuting distance                      -0.0114***     -0.00652**
Air quality                             0.0141****     0.00405
City size                               -0.0954****    -0.08747****
Bad health                              -0.0521****    -0.01889****
Accident risk                           -0.1673**      -0.14495***
Traffic volume 2 (municipal network)    -0.9216****    -0.46952****
Age 3 (> 54 years)                      -0.2054*       -0.14503*
Education 3 (university degree)         -0.4988****    -0.23034***
Slopes                                  -0.4873****    -0.17630****
Lag coefficient (ρ)                     -              0.6015****
R-squared (R²)                          0.879          -
Log Likelihood                          -102.43        33.68
Moran's I of residuals                  0.34 (0.00)    0.01 (0.45)

Source: Vandenbulcke et al., 2011

SAR Model (LAG)

[Maps: residuals of the OLS and LAG models]

Simpson’s paradox

Spatial LAG model + Regimes N-S
Y = % commuter cyclists in commune i
North = Flanders; South = Wallonia & Brussels

Estimation                              North           South
Intercept                               2.3084*         4.30951****
Median income                           0.0311*         -0.0027
Active men                              0.0296**        0.0008
Age 2 (45-54 years)                     -0.0417**       -0.0205***
Young children                          -0.0365***      -0.0247***
Cycleways unsatisfaction                -0.0052***      -0.0045***
Commuting distance                      -0.0165***      -0.0047*
Air quality                             0.01384****     -0.0054
City size                               -0.11459****    -0.03615****
Bad health                              -0.0098         -0.0146**
Accident risk                           -0.76319****    -0.14892****
Traffic volume 2 (municipal network)    -0.2357         -0.4521**
Age 3 (> 54 years)                      -0.1074         -0.0680
Education 3 (university degree)         -0.0968         -0.3132***
Slopes                                  -0.1931**       -0.19718****
Lag coefficient (ρ)                     0.5362****
N                                       589 (NNorth = 308; NSouth = 281)
Log Likelihood                          93.923

Source: Vandenbulcke et al., 2011

Main results
– Demographic factors: e.g. gender, children
– Socio-economic: e.g. education
– Environmental & policy-related factors, e.g.:
• Dissatisfaction with cycle facilities
• Town size
• Accident risk
• Traffic volume

Importance of space/location

[Diagram: two locations on a street network with the same bicycle traffic; accident risk at network location 2 > network location 1. Spatial factors?]

5.4 Model for i = addresses

• Binary Yi = 0,1 > logistic specification
• Corrections for
– Multicollinearity
– Heteroskedasticity
– Residual spatial autocorrelation > omitted variables? spatial models
• Spatial models (Bayesian framework)
– ICAR model… but fit not improved
– Hierarchical auto-logistic model


Methodology
• Models based on accident-only data: regression methods (e.g. multinomial logit models). Issues: over-/under-dispersion, underreporting, etc.
• Models based on surveys, road trajectories: regression methods (e.g. logistic models). Main issue: bias in the selection of road trajectories.
• Models based on case-controls? Cases = accidents, controls = generated absences, yi = (0,1). Regression methods (e.g. logistic models). Advantage: estimation of risk, reduced statistical bias. Issues: no vehicle & human factors, selection of controls.

Case-control strategy, drawing on:
• Transportation (gravity-based models)
• Epidemiology (case-control studies)
• Ecology (generation of controls)


Data collection
• Accident risk = time-consuming process
– Accidents (cases) to be geocoded/located
– ‘Absences’ (controls) to be generated … but no rigorous sampling method > tricky and questionable results!
– Road network: exclude ‘unbikeable’ links
– Risk factors to be collected…
• Software requirements: GIS

Data collection: controls and absences
• Controls = locations without any accident (officially), supposed to be safe
• Generation of controls = random sampling of points along the road network, BUT:
– Proportional to bicycle traffic (stratified sampling)
– Exclude ‘black zones’ (hot spots of accidents) from the bikeable network

Stratified random sampling
• Sampling intensity: potential bicycle traffic
1) Negative exponential function
2) 500 impedance functions
3) No edge effect
• Sampling region: black spots (network kernel densities)
• Ncontrols = 4*Naccidents
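The generation of controls can be sketched as a weighted random draw. The example below is a simplification made for illustration: it samples whole network segments rather than points along them, the segment records and traffic values are invented, and `sample_controls` is a hypothetical helper, not the procedure of the cited study.

```python
import random


def sample_controls(segments, n_accidents, ratio=4, seed=0):
    """Draw ratio * n_accidents control locations.

    Segments flagged as black zones are excluded from the sampling region;
    the remaining segments are drawn with probability proportional to their
    (potential) bicycle traffic.
    """
    eligible = [s for s in segments if not s["black_zone"]]
    weights = [s["traffic"] for s in eligible]
    rng = random.Random(seed)          # fixed seed for a reproducible draw
    return rng.choices(eligible, weights=weights, k=ratio * n_accidents)


# Hypothetical bikeable network segments
segments = [
    {"id": "s1", "traffic": 120, "black_zone": False},
    {"id": "s2", "traffic": 40, "black_zone": False},
    {"id": "s3", "traffic": 300, "black_zone": True},   # hot spot: excluded
    {"id": "s4", "traffic": 10, "black_zone": False},
]

controls = sample_controls(segments, n_accidents=25)
print(len(controls), sorted({c["id"] for c in controls}))
```

Sampling with replacement means a busy segment can host several controls, which mirrors the idea that exposure, not network length alone, should drive where absences are generated.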

Data collection: risk factors

Infrastructure factors
• Cycling facilities & contraflow cycling
• Discontinuities
• Parking areas & garages
• Bridge & funnels
• Crossroads & complexity
• Tram railways
• Traffic-calming areas
• Major roads
• Proximity city centre
• Distance to specific points of interest (e.g. schools, bus stops, etc.)

Traffic conditions
• Cars
• Trucks/lorries & buses
• Vans

Environmental factors
• Gradients
• Green blocks (parks, etc.)


Data collection: risk factors
• Advantage of GIS: combination of several datasets
• Accidents/controls
– ‘Attached’ variables
– ‘Crossings’

Results: modelling process

DATASET (GIS)
• DEPENDENT VARIABLE (BINARY): accident data (geocoded); controls/absences
• INDEPENDENT VARIABLES (RISK FACTORS): infrastructure factors; traffic conditions; environment (physical)

MODELLING PROCESS
• Choice of the specification
• Convergence diagnostics
• Corrections for spatial effects

FINAL MODEL > PREDICTIONS (GIS)

Results: robust

Results: predictions for a trajectory
• Schuman’s roundabout
• Tram railways
• High traffic volume
• Exit; high traffic volume
• Succession of crossroads on a major road (Wetstraat/Rue de la Loi) + segregated cycling facility
• End of a separated cycling facility at the crossroad
• Residential ward
• Residential ward + contraflow

Take home message

• Location(s) and distance(s)
• Scale: independence of scales; nested.
• COMPLEXITY of spatial processes
• UNCERTAINTY

SPACE BIASES
• Spatial statistics
• Large data sets
• Spatial autocorrelation
• Scales
• Border/edge effects
• MAUP (scale + zoning)
• Heterogeneity
• …

Readings

Data analysis
• Fotheringham A., Brunsdon C. & Charlton M. (2000) Quantitative Geography: Perspectives on Spatial Data Analysis. London, SAGE.
• Fotheringham A., Brunsdon C. & Charlton M. (2002) Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Chichester.
• Bailey T. & Gatrell A. (1995) Interactive Spatial Data Analysis. Essex, UK: Longman.
• www.spatialanalysisonline.com

Road accidents in Belgium
• Thomas I. (1996) Spatial Data Aggregation. Exploratory Analysis of Road Accidents. AAP, 28:2, 251-264.
• Steenberghen T. et al. (2004) Intra-urban location of road accidents black zones: a Belgian example. IJGIS, 18:2, 169-181.
• Vandenbulcke G., Thomas I., Int Panis L. (2014) Predicting cycling accident risk in Brussels: an innovative spatial case-control approach. AAP, 62, 341-357.
• Vandenbulcke G. et al. (2011) Bicycle commuting in Belgium: Spatial determinants and re-cycling strategies. TR-A, 45, 118-137.

Your exercise (10 pages). Take your own data set (if you do not have one, go to Census11) and « PLAY » with it. Get 3 variables: Y (your choice) + 1 « explanatory » X + a measure of distance.
1. Define/describe them very well; justify the scale (extent and grain) and its limitations.
2. EDA and ESDA + statistical map of the 3 variables. Compute correlations between variables for several extents and/or 2 levels of aggregation and/or 2 subsets.
3. Compute a simple OLS and map the residuals (compute spatial autocorrelation) for both levels of aggregation.
4. If possible, enhance the regression by adopting another method, for instance correcting for spatial autocorrelation.
5. Critical and strong conclusion (incl. potentials, challenges, …)
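Step 3 of the exercise (simple OLS, then spatial autocorrelation of the residuals) can be sketched without any specialised package. Everything in the example is an assumption made for illustration: five zones in a row, invented X and Y values, binary contiguity weights, and hypothetical helper names. A real analysis would use a proper weights matrix and a significance test.

```python
def simple_ols(xs, ys):
    """Intercept and slope of y = a + b * x by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def morans_i(z, w):
    """Global Moran's I for values z and weights matrix w."""
    n = len(z)
    mean = sum(z) / n
    dev = [v - mean for v in z]
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return cross / sum(d * d for d in dev) / (sum(map(sum, w)) / n)


# Hypothetical data for five zones located along a line
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.5, 6.0, 8.5, 10.0]
w = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]

intercept, slope = simple_ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
residual_moran = morans_i(residuals, w)

print(round(intercept, 2), round(slope, 2))
print([round(r, 2) for r in residuals], round(residual_moran, 2))
```

Here the residuals alternate in sign between neighbouring zones, so Moran's I is strongly negative. Repeating the computation at a second level of aggregation shows how both the coefficients and the residual autocorrelation depend on scale.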
