parallel computing for urban cellular automata qingfeng. gene. guan 2004-nov-18 geography department...

Parallel Computing for Urban Cellular Automata

Qingfeng. Gene. Guan

2004-Nov-18

Geography Department Colloquium

Univ. of California, Santa Barbara

Cellular Automata (CA)

• A classical CA is a set of identical elements, called cells, each located in a regular, discrete space. Each cell is associated with a state from a finite set. The model evolves in discrete time steps, changing the states of all its cells according to a transition rule that is applied homogeneously at every step. The new state of a given cell depends on the previous states of a set of cells, called its neighborhood, which may include the cell itself.

Components & Boundary Terms of CA

• Cells

• States

• Neighborhood

• Transition Rules

• Space

• Time
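To make these components concrete, here is a minimal Python sketch (not code from the talk). The space is a 2D list of integer states, the neighborhood is a list of (row, column) offsets, and the transition rule is any function of a cell's own state and its neighbors' states. The names `step`, `neighbor_states`, and `MOORE`, the choice of a Moore neighborhood, and the convention that cells outside the grid count as state 0 are all illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[List[int]]                  # space: a regular 2D lattice of cell states
Offsets = Sequence[Tuple[int, int]]     # neighborhood: (row, col) offsets relative to a cell
Rule = Callable[[int, List[int]], int]  # transition rule: (own state, neighbor states) -> new state

# Moore neighborhood: the 8 surrounding cells (an illustrative choice).
MOORE: Offsets = [(-1, -1), (-1, 0), (-1, 1),
                  (0, -1),           (0, 1),
                  (1, -1),  (1, 0),  (1, 1)]


def neighbor_states(grid: Grid, r: int, c: int, offsets: Offsets) -> List[int]:
    """States of the neighbors of cell (r, c); cells outside the grid count as 0."""
    rows, cols = len(grid), len(grid[0])
    return [grid[r + dr][c + dc] if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
            for dr, dc in offsets]


def step(grid: Grid, rule: Rule, offsets: Offsets) -> Grid:
    """Advance one discrete time step.

    The same rule is applied to every cell, and all new states are computed
    from the old grid, so every cell changes state synchronously.
    """
    return [[rule(grid[r][c], neighbor_states(grid, r, c, offsets))
             for c in range(len(grid[0]))]
            for r in range(len(grid))]
```

Because `step` reads only from the old grid, the single state-1 cell in `[[1, 0, 0]]` spreads by exactly one cell per step under a "become 1 if any neighbor is 1" rule; a sequential, in-place update would let it sweep across the whole row in a single step.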

An Example of 2D CA

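[Figure: the 2D cell space, the initial state, and the neighborhood of a cell i; the graphics are not preserved]

< States >

Each cell is in state 1 or state 0.

< Rule >

if {Sum[State(Neighbor(i))] >= 1 & State(i) = 0}

then

State(i) = 1

That is, a cell in state 0 switches to state 1 as soon as at least one of its neighbors is in state 1; cells in state 1 never change.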

An Example of 2D CA (cont.)

[Figure: the cell space at t = 0, 1, 2, 3, and 4]
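The example can be reproduced with a short simulation. Because the neighborhood diagram and the initial state did not survive, this sketch assumes a von Neumann neighborhood (the four orthogonal neighbors) and a single cell in state 1 at the center of a hypothetical 9 × 9 space; `example_step` and `run` are illustrative names.

```python
from typing import List

Grid = List[List[int]]

# Assumption: the original neighborhood diagram is lost, so this sketch uses the
# von Neumann neighborhood (the 4 orthogonally adjacent cells).
VON_NEUMANN = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def example_step(grid: Grid) -> Grid:
    """One synchronous step of the slide's rule:
    if Sum[State(Neighbor(i))] >= 1 and State(i) == 0, then State(i) = 1.
    Cells outside the grid count as state 0.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # write into a copy so all cells update together
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0:
                total = sum(grid[r + dr][c + dc]
                            for dr, dc in VON_NEUMANN
                            if 0 <= r + dr < rows and 0 <= c + dc < cols)
                if total >= 1:
                    new[r][c] = 1
    return new


def run(grid: Grid, steps: int) -> List[Grid]:
    """Return the states at t = 0, 1, ..., steps."""
    history = [grid]
    for _ in range(steps):
        history.append(example_step(history[-1]))
    return history


if __name__ == "__main__":
    # Hypothetical initial state: one cell in state 1 at the center of a 9 x 9 space.
    size = 9
    start = [[0] * size for _ in range(size)]
    start[size // 2][size // 2] = 1
    for t, g in enumerate(run(start, 4)):
        print(f"t = {t}: {sum(map(sum, g))} cells in state 1")
```

Under these assumptions the program reports 1, 5, 13, 25, and 41 cells in state 1 for t = 0 to 4: the region in state 1 grows as a diamond, gaining one ring of cells per step. With a Moore neighborhood it would grow as a square instead.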

Want to See More? ^_^

Here we go …

http://www.collidoscope.com/modernca/welcome.html

CAs in Geographic Research

• Land Cover/Use Change Simulation
  – Urban Growth

• Wild Fire Simulation

• Flood, Lava & Desert Spread Simulation

• Traffic Flow Simulation

• More and More Coming up…

Anthony Gar-On Yeh, Xia Li. 2003

Example: Online Traffic Flow Simulation

R. Barlovic et al. Online Traffic Simulation with Cellular Automata. 1999

SLEUTH @ UCSB

The urban growth model SLEUTH uses a modified CA to model the spread of urbanization across a landscape (Keith C. Clarke et al., 1996, 1997). Its name comes from the GIS data layers incorporated into the model: Slope, Land use, Exclusion layer (where growth cannot occur, such as the ocean), Urban, Transportation, and Hillshade (Noah C. Goldstein, 2004).

CA in the SLEUTH

• Coefficients
  – Dispersion
  – Breed
  – Spread
  – Slope
  – Road Gravity

• Rules
  – Spontaneous Growth Rule
  – New Spreading Centers Rule
  – Edge Growth Rule
  – Road-Influenced Growth Rule
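To show how coefficients and rules interact, here is a deliberately simplified, hypothetical sketch of two of the rules. It is not SLEUTH's implementation: the coefficients are used directly as percentages, the 0.01 scaling of spontaneous growth and the three-neighbor threshold for edge growth are illustrative assumptions, and breed, slope, road gravity, and self-modification are left out. `toy_growth_cycle` is a made-up name.

```python
import random
from typing import List

Grid = List[List[int]]  # urban layer: 1 = urban, 0 = not; exclusion layer: 1 = excluded

MOORE = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def toy_growth_cycle(urban: Grid, excluded: Grid, dispersion: int, spread: int,
                     rng: random.Random) -> Grid:
    """One growth cycle of a HYPOTHETICAL, simplified SLEUTH-like CA (not SLEUTH's code).

    dispersion and spread are coefficients in [0, 100], treated here as percentages.
    Breed, slope, road gravity and self-modification are deliberately left out.
    """
    rows, cols = len(urban), len(urban[0])
    new = [row[:] for row in urban]

    def can_urbanize(r: int, c: int) -> bool:
        return new[r][c] == 0 and excluded[r][c] == 0

    # Toy spontaneous growth: any eligible cell may urbanize at random; the
    # 0.01 scaling of the dispersion coefficient is an arbitrary assumption.
    for r in range(rows):
        for c in range(cols):
            if can_urbanize(r, c) and rng.random() < dispersion / 100 * 0.01:
                new[r][c] = 1

    # Toy edge growth: an urban cell with at least 3 urban neighbors (in the old
    # grid) urbanizes one random eligible neighbor with probability spread / 100.
    for r in range(rows):
        for c in range(cols):
            if urban[r][c] != 1:
                continue
            nbrs = [(r + dr, c + dc) for dr, dc in MOORE
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            if sum(urban[rr][cc] for rr, cc in nbrs) >= 3 and rng.random() < spread / 100:
                candidates = [(rr, cc) for rr, cc in nbrs if can_urbanize(rr, cc)]
                if candidates:
                    rr, cc = rng.choice(candidates)
                    new[rr][cc] = 1
    return new
```

Both toy rules honor the exclusion layer, and urban cells never revert, so repeated cycles can only add urban cells. Passing a seeded `random.Random` keeps a single Monte Carlo run reproducible.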

For more info. about SLEUTH: http://www.ncgia.ucsb.edu/projects/gig/

Why Parallel?

• Data Intensity
  – GIS users today have access to an unprecedented amount of high-resolution, high-quality data through scanners, remote sensing devices, GPS receivers, government agencies, social organizations, commercial companies, etc.

  – Example: with a cell size of 50 × 50 m and a cell space of 300 × 300 km, the CA must process 6000 × 6000 = 3.6 × 10^7 cells. What about a space of 1000 × 1000 km? That is 20,000 × 20,000 = 4 × 10^8 cells! And what about higher resolution and a bigger space?
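The arithmetic above generalizes to any extent and cell size. This sketch also gives a rough memory estimate under the assumption of one byte per cell per data layer; `cell_count`, `cells_per_side`, and `BYTES_PER_CELL` are illustrative names.

```python
BYTES_PER_CELL = 1  # assumption: one byte per cell per data layer (e.g., a 0-255 class code)


def cells_per_side(extent_km: float, cell_m: float) -> int:
    """Cells along one side of a square cell space."""
    return round(extent_km * 1000 / cell_m)


def cell_count(extent_km: float, cell_m: float) -> int:
    """Total cells in an extent_km x extent_km space with cell_m x cell_m cells."""
    n = cells_per_side(extent_km, cell_m)
    return n * n


if __name__ == "__main__":
    for extent_km in (300, 1000):
        n = cell_count(extent_km, 50)
        mb = n * BYTES_PER_CELL / 1e6
        print(f"{extent_km} x {extent_km} km at 50 m: {n:,} cells, about {mb:,.0f} MB per layer")
```

Run as a script, it reports 36,000,000 cells (about 36 MB per layer) for the 300 km space and 400,000,000 cells (about 400 MB per layer) for the 1000 km space. Halving the cell size to 25 m quadruples both figures, and SLEUTH reads several such layers (Slope, Land use, Exclusion, Urban, Transportation, Hillshade).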

Why Parallel? (cont.)

• Computational Intensity
  – More data → more computation
  – More complicated rules → more computation
    • SLEUTH: 5 coefficients, 4 rules, self-modification
  – Coefficient calibration → more computation
    • SLEUTH: 5 coefficients, each in the range [0, 100] (101 integer values), giving 101^5 ≈ 1.05 × 10^10 coefficient sets
  – Monte Carlo iterations → more computation
    • SLEUTH: 10 to 100 iterations are suggested for each coefficient set
  – Other factors…
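The calibration numbers can be checked directly. This sketch reads the range [0, 100] as 101 integer values per coefficient (an assumption consistent with the 101^5 figure) and also enumerates a hypothetical coarse grid that samples each coefficient at only five values, to show how quickly the search shrinks; `count_runs` and `coarse_sets` are illustrative names.

```python
from itertools import product

N_COEFFICIENTS = 5          # dispersion, breed, spread, slope, road gravity
FULL_RANGE = range(0, 101)  # assumed: integer coefficient values 0, 1, ..., 100


def count_runs(values_per_coefficient: int, mc_iterations: int) -> int:
    """Model runs needed to try every coefficient set, each with mc_iterations Monte Carlo runs."""
    return values_per_coefficient ** N_COEFFICIENTS * mc_iterations


# Hypothetical coarse grid: each coefficient sampled only at 0, 25, 50, 75, 100.
coarse_sets = list(product(range(0, 101, 25), repeat=N_COEFFICIENTS))

if __name__ == "__main__":
    for mc in (10, 100):
        print(f"exhaustive, {mc} MC iterations: {count_runs(len(FULL_RANGE), mc):,} runs")
    print(f"coarse grid, 10 MC iterations: {len(coarse_sets) * 10:,} runs")
```

Exhaustive calibration would need about 1.05 × 10^11 model runs at 10 Monte Carlo iterations per set and about 1.05 × 10^12 at 100, while the five-value coarse grid has 3,125 sets, or 31,250 runs at 10 iterations.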

Why Parallel? (cont.)

• “The model calibration for a medium sized data set and minimal data layers requires about 1200 CPU hours on a typical workstation” (Keith C. Clarke. 2003).

• High-performance computing is required

• Solution: Parallelization

• Goal: To process more data, more coefficient sets, and more Monte Carlo iterations in less time
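What "less time" can mean is easy to estimate: 1200 CPU hours is 50 days on a single processor. The sketch below divides that work among p processors using Amdahl's law, T(p) = T(1) × (s + (1 − s) / p), a standard model swapped in here rather than taken from the talk. The serial fraction s is a hypothetical share of the work that cannot run in parallel; s = 0 gives ideal linear speedup. `wall_clock_hours` is an illustrative name.

```python
def wall_clock_hours(cpu_hours: float, processors: int, serial_fraction: float = 0.0) -> float:
    """Estimated wall-clock hours on `processors` processors, by Amdahl's law.

    serial_fraction is the share of the work that cannot be parallelized
    (a hypothetical input); 0.0 means ideal linear speedup.
    """
    if processors < 1 or not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("need processors >= 1 and 0 <= serial_fraction <= 1")
    return cpu_hours * (serial_fraction + (1.0 - serial_fraction) / processors)


if __name__ == "__main__":
    for p in (1, 16, 64, 256):
        ideal = wall_clock_hours(1200, p)
        with_serial = wall_clock_hours(1200, p, 0.05)
        print(f"{p:>4} processors: {ideal:8.2f} h ideal, {with_serial:8.2f} h with 5% serial work")
```

For example, 64 processors would ideally cut 1200 hours to 18.75 hours, but with a 5% serial fraction the estimate rises to about 78 hours, which is why communication overhead and load balance matter.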

Strategies for Parallelization (1)

• Data-oriented Parallelization
  – Split the whole dataset into sub-datasets and assign them to multiple processors, which process the sub-datasets in parallel
  – CA is naturally suited to this strategy: split the whole cell space into sub cell spaces
  – Addresses the computational intensity that comes from a huge cell space

[Figure: the cell space divided among processors P0, P1, P2, and P3]
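A minimal sketch of the data-oriented split divides the rows of the cell space into contiguous strips, one per processor. Strips are one simple choice; the P0 to P3 layout could equally be 2D blocks. `row_blocks` and `split_cell_space` are illustrative names.

```python
from typing import List, Tuple

Grid = List[List[int]]


def row_blocks(n_rows: int, n_procs: int) -> List[Tuple[int, int]]:
    """Contiguous [start, stop) row ranges, one per processor, sizes differing by at most 1."""
    base, extra = divmod(n_rows, n_procs)
    blocks: List[Tuple[int, int]] = []
    start = 0
    for p in range(n_procs):
        stop = start + base + (1 if p < extra else 0)  # the first `extra` processors get one more row
        blocks.append((start, stop))
        start = stop
    return blocks


def split_cell_space(grid: Grid, n_procs: int) -> List[Grid]:
    """Sub cell spaces for P0..P(n_procs-1); each processor would update its own strip."""
    return [grid[start:stop] for start, stop in row_blocks(len(grid), n_procs)]
```

For the 6000 × 6000 example, `row_blocks(6000, 4)` gives each of P0 to P3 a strip of 1500 rows. Updating each strip in isolation is not enough, though: cells on a strip's edge have neighbors that live on another processor.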

Strategies for Parallelization (2)

• Task-oriented Parallelization
  – Split the whole task into subtasks and assign them to multiple processors, which perform the subtasks in parallel
  – For SLEUTH, coefficient set evaluation and Monte Carlo iterations can be parallelized with this strategy
  – Addresses the computational intensity that comes from complex rules

[Figure: coefficient sets 0 and 1 assigned to processors P0 and P1; alternatively, Monte Carlo runs 0.0, 0.1, 0.2, 1.0, 1.1, and 1.2 (coefficient set.iteration) assigned to processors P0 to P5]
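The task-oriented split needs no spatial decomposition, only a list of independent runs. This sketch, with illustrative names, enumerates (coefficient set, Monte Carlo iteration) pairs and deals them to processors in turn; it assumes every run is independent of the others, so no communication is needed until the results are collected.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def assign_round_robin(tasks: Sequence[T], n_procs: int) -> Dict[int, List[T]]:
    """Deal independent tasks out to processors P0..P(n_procs-1) in turn."""
    plan: Dict[int, List[T]] = {p: [] for p in range(n_procs)}
    for k, task in enumerate(tasks):
        plan[k % n_procs].append(task)
    return plan


def monte_carlo_tasks(n_coef_sets: int, n_iterations: int) -> List[Tuple[int, int]]:
    """All (coefficient set, Monte Carlo iteration) pairs, e.g. (1, 2) for run "1.2"."""
    return list(product(range(n_coef_sets), range(n_iterations)))
```

With 2 coefficient sets, 3 iterations each, and 6 processors, P0 to P5 each receive one run, in the order 0.0, 0.1, 0.2, 1.0, 1.1, 1.2. Passing a list of coefficient-set indices instead of pairs gives the coarser one-set-per-processor assignment.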

Strategies for Parallelization (3)

• Combination of the previous two strategies
  – Each processor evaluates a given coefficient set or Monte Carlo iteration on a given sub cell space

[Figure: coefficient sets 0 and 1, each divided into sub cell spaces distributed among processors P0 to P3]
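The combined strategy is the Cartesian product of the two splits. In this sketch (illustrative names; strips of rows assumed as the sub cell spaces) each processor receives one (coefficient set, row block) pair, so `n_coef_sets × blocks_per_set` processors are used in total.

```python
from itertools import product
from typing import Dict, List, Tuple

RowBlock = Tuple[int, int]  # [start, stop) range of rows


def combined_plan(n_coef_sets: int, n_rows: int, blocks_per_set: int) -> Dict[int, Tuple[int, RowBlock]]:
    """Map processor index -> (coefficient set, row block)."""
    base, extra = divmod(n_rows, blocks_per_set)
    blocks: List[RowBlock] = []
    start = 0
    for b in range(blocks_per_set):
        stop = start + base + (1 if b < extra else 0)  # near-equal contiguous strips
        blocks.append((start, stop))
        start = stop
    # Every coefficient set is paired with every strip; processors are numbered in that order.
    return dict(enumerate(product(range(n_coef_sets), blocks)))
```

`combined_plan(2, 6000, 2)` assigns coefficient set 0 to P0 and P1 and coefficient set 1 to P2 and P3, each pair splitting the 6000 rows in half. Processors that share a coefficient set still have to exchange boundary cells at every step.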

Challenge (1)

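• Communication Overhead
  – Definition: information flow among processors
  – Challenge: How can it be minimized? How can message passing be made more efficient?
  – Possible solution: ghost cells (each processor keeps copies of the cells just outside its sub cell space, refreshed once per time step, so its edge cells can be updated locally)

[Figure: the cell space divided among processors P0 to P3, highlighting a cell i]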
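The ghost-cell idea can be checked on a single machine. In this sketch each simulated processor owns a strip of rows, receives a copy of one extra row from each neighboring strip, updates its strip locally, and discards the ghost rows. The result matches a global update, because with a radius-1 neighborhood one ghost row per side supplies every neighbor an edge cell needs. In a distributed-memory code that copy would be one message per neighbor per step (for example, an MPI send/receive pair) instead of one request per edge cell. The rule is the 2D CA example rule, with a von Neumann neighborhood assumed; the names are illustrative.

```python
from typing import List

Grid = List[List[int]]
VON_NEUMANN = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed radius-1 neighborhood


def update(grid: Grid) -> Grid:
    """Example rule: a cell in state 0 with at least one neighbor in state 1 becomes 1.
    Cells outside the grid count as state 0."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 and any(grid[r + dr][c + dc] == 1
                                       for dr, dc in VON_NEUMANN
                                       if 0 <= r + dr < rows and 0 <= c + dc < cols):
                new[r][c] = 1
    return new


def step_with_ghost_rows(grid: Grid, n_procs: int) -> Grid:
    """One data-parallel step, simulated serially.

    Processor p owns a contiguous strip of rows. It receives one ghost row from
    the strip above and one from the strip below (none at the grid edge),
    updates its local copy, and keeps only the rows it owns.
    """
    rows = len(grid)
    base, extra = divmod(rows, n_procs)
    result: Grid = []
    start = 0
    for p in range(n_procs):
        stop = start + base + (1 if p < extra else 0)    # owned rows: [start, stop)
        lo, hi = max(start - 1, 0), min(stop + 1, rows)  # owned rows plus ghost rows
        local = [row[:] for row in grid[lo:hi]]          # stands in for the received message
        updated = update(local)
        result.extend(updated[start - lo:stop - lo])     # drop the ghost rows
        start = stop
    return result
```

A wider neighborhood needs a correspondingly deeper ghost layer: it must be at least as deep as the neighborhood radius.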

Challenge (2)

• Load Balance
  – Definition: balance of data and task load among processors
  – Challenge: How to deal with a sparse grid, where the cells that need work are concentrated in a few sub cell spaces?
  – Possible solution: irregular partitioning
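One simple form of irregular partitioning, sketched under assumptions: instead of equal numbers of rows, cut the rows into contiguous blocks of roughly equal estimated work. Here `row_weights[r]` is a hypothetical per-row workload, for instance the number of cells in row r that are neither excluded nor already urban; the cutting heuristic and the name `weighted_row_blocks` are illustrative, not a method from the talk.

```python
from typing import List, Sequence, Tuple


def weighted_row_blocks(row_weights: Sequence[float], n_procs: int) -> List[Tuple[int, int]]:
    """Split rows into n_procs contiguous [start, stop) blocks of roughly equal total weight.

    Each cut is placed at the row boundary whose running total is closest to the
    ideal share (total * p / n_procs). Blocks can be empty if one row dominates.
    """
    n_rows = len(row_weights)
    total = sum(row_weights)
    blocks: List[Tuple[int, int]] = []
    start, running = 0, 0.0
    for p in range(1, n_procs):
        target = total * p / n_procs
        stop = start
        # Take the next row while that moves the running total closer to (or keeps it at) the target.
        while stop < n_rows and abs(running + row_weights[stop] - target) <= abs(running - target):
            running += row_weights[stop]
            stop += 1
        blocks.append((start, stop))
        start = stop
    blocks.append((start, n_rows))  # the last processor takes the remaining rows
    return blocks
```

With weights [0, 0, 0, 0, 10, 10, 10, 10] and 4 processors, equal strips of two rows would leave P0 and P1 with no work at all, while `weighted_row_blocks` returns (0, 5), (5, 6), (6, 7), (7, 8), each carrying a weight of 10.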

Research Questions

• Which parallelization strategy is best for CA model calibration, and which for forecasting?
  – Depends on the amount of data, the CA model, task dependence, software and hardware, etc.

• Can multiple parallelization approaches be implemented and compared?

• Does the ghost-cell method work for geographic CA (e.g., urban CA)?

• How sparse can a grid be and still benefit from data partitioning?

Future Work

• Dependence Analysis of SLEUTH

• Data & Task Partitioning Methods for SLEUTH

• Communication Optimization Methods

• Load-Balancing Optimization Methods

• Implementation & Comparison

Further Future

• Supercomputer / Cluster?

• Peer-to-Peer Computing?
  – SLEUTH@Home

• Grid Computing?
  – GRID SLEUTH

Acknowledgement

• Prof. Keith Clarke, Dept. of Geography, UCSB

• Noah Goldstein, Dept. of Geography, UCSB

• Charles Dietzel, Dept. of Geography, UCSB

• Jeff Hemphill, Dept. of Geography, UCSB

Comments & Questions?
