Parallel Computing for Urban Cellular Automata
Qingfeng. Gene. Guan
2004-Nov-18
Geography Department Colloquium
Univ. of California, Santa Barbara
Cellular Automata (CA)
• A classical CA is a set of identical elements, called cells, each one of which is located in a regular, discrete space. Each cell can be associated with a state from a finite set. The model evolves in discrete time steps, changing the states of all its cells according to a transition rule, homogeneously applied at every step. The new state of a certain cell depends on the previous states of a set of cells, which can include the cell itself, and constitutes its neighborhood.
Components & Boundary Terms of CA
• Cells
• States
• Neighborhood
• Transition Rules
• Space
• Time
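An Example of 2D CA
< Space & Cells >
[Figure: a 2D grid of cells]
< States >
Each cell is either 1 or 0
< Rule >
if {Sum[State(Neighbor(i))] >= 1 & State(i) = 0}
then State(i) = 1
< Initial State >
[Figure: initial configuration of the grid]
< Neighborhood >
[Figure: neighborhood of cell i]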
An Example of 2D CA (cont.)
[Figure: the cell space at t = 0, t = 1, t = 2, t = 3, and t = 4]
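The rule in this example can be sketched in a few lines of Python. This is a minimal sketch, not code from the talk: the 8-cell neighborhood, the non-wrapping boundary, the 5 x 5 cell space, the single seed cell, and the names `neighbor_sum` and `step` are all assumptions, because the slides show the neighborhood and the initial state only as figures.

```python
def neighbor_sum(grid, r, c):
    """Sum the states of the up-to-8 cells around (r, c); cells outside the grid count as 0."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue  # the cell itself is not part of the assumed neighborhood
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                total += grid[rr][cc]
    return total


def step(grid):
    """Apply the rule to every cell at once, reading only the previous states."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 and neighbor_sum(grid, r, c) >= 1:
                new[r][c] = 1
    return new


# Assumed initial state: one cell in state 1 at the center of a 5 x 5 space.
grid = [[0] * 5 for _ in range(5)]
grid[2][2] = 1

history = [grid]
for t in range(2):
    history.append(step(history[-1]))

# Number of state-1 cells at t = 0, 1, 2.
counts = [sum(map(sum, g)) for g in history]
```

Under these assumptions the state-1 region grows from 1 cell to 9 and then to 25, one ring of cells per time step.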
CAs in Geographic Research
• Land Cover/Use Change Simulation
– Urban Growth
• Wild Fire Simulation
• Flood, Lava & Desert Spread Simulation
• Traffic Flow Simulation
• More and More Coming up…
Anthony Gar-On Yeh, Xia Li. 2003
Example: Online Traffic Flow Simulation
R. Barlovic et al. Online Traffic Simulation with Cellular Automata. 1999
SLEUTH @ UCSB
The urban growth model SLEUTH uses a modified CA to model the spread of urbanization across a landscape (Keith C. Clarke et al., 1996, 1997). Its name comes from the GIS data layers that are incorporated into the model: Slope, Landuse, Exclusion layer (where growth cannot occur, like the ocean), Urban, Transportation, and Hillshade (Noah C. Goldstein, 2004).
CA in the SLEUTH
• Coefficients
– Dispersion
– Breed
– Spread
– Slope
– Road Gravity
• Rules
– Spontaneous Growth Rule
– New Spreading Centers Rule
– Edge Growth Rule
– Road-Influenced Growth Rule
For more info. about SLEUTH: http://www.ncgia.ucsb.edu/projects/gig/
Why Parallel?
• Data Intensity
– GIS users today have access to an unprecedented amount of high-resolution, high-quality data through scanners, remote sensing devices, GPS receivers, government agencies, social organizations, commercial companies, etc.
– Example: with a cell size of 50 × 50 m and a cell space of 300 × 300 km, the CA needs to process 6000 × 6000 cells. What about a space of 1000 × 1000 km? 4 × 10^8 cells! And what about higher resolution and an even bigger space?
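The cell counts in this example follow directly from the extent and the cell size. A small sketch of the arithmetic, assuming a square cell space and square cells (`cell_count` is a hypothetical helper, not part of any model):

```python
def cell_count(extent_km, cell_size_m):
    """Cells in a square space of side extent_km, using square cells of side cell_size_m."""
    cells_per_side = (extent_km * 1000) // cell_size_m
    return cells_per_side ** 2


small = cell_count(300, 50)    # 6000 x 6000 cells
large = cell_count(1000, 50)   # 20000 x 20000 cells
```

Halving the cell size quadruples the count, which is why finer resolution and a larger study area quickly compound.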
Why Parallel? (cont.)
• Computational Intensity
– More Data → More Computation
– More Complicated Rules → More Computation
• SLEUTH: 5 coefficients, 4 rules, self-modification
– Coefficient Calibration → More Computation
• SLEUTH: 5 coefficients, each in the range [0, 100], giving 101^5 coefficient sets
– Monte Carlo Iteration → More Computation
• SLEUTH: 10 ~ 100 iterations are suggested for each coefficient set
– Other Factors…
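To get a feel for the calibration workload, the counts on this slide can be multiplied out. The sketch below assumes each coefficient takes integer values from 0 to 100 (101 values), which reads the slide's count as 101^5, and assumes every coefficient set is evaluated exhaustively; the names are illustrative only.

```python
COEFFICIENTS = 5
VALUES_PER_COEFFICIENT = 101  # assumed: integers 0..100 inclusive

coefficient_sets = VALUES_PER_COEFFICIENT ** COEFFICIENTS


def total_runs(monte_carlo_iterations):
    """Model runs needed to evaluate every coefficient set with the given number of iterations."""
    return coefficient_sets * monte_carlo_iterations


runs_low = total_runs(10)
runs_high = total_runs(100)
```

That is roughly 1.05 × 10^10 coefficient sets, so even 10 iterations per set means on the order of 10^11 model runs under the exhaustive-search assumption.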
Why Parallel? (cont.)
• “The model calibration for a medium sized data set and minimal data layers requires about 1200 CPU hours on a typical workstation” (Keith C. Clarke. 2003).
• High-performance computing is required
• Solution: Parallelization
• Goal: To process more data, more coefficient sets, and more Monte Carlo iterations in less time
Strategies for Parallelization (1)
• Data-oriented Parallelization
– Split the whole dataset into sub datasets and assign them to multiple processors; these processors deal with the sub datasets in parallel
– CA lends itself naturally to this strategy: split the whole cell space into sub cell spaces
– Solution for computation intensity from a huge cell space
[Figure: the cell space split among processors P0, P1, P2, and P3]
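One simple way to split a cell space is into contiguous strips of rows, one strip per processor. The sketch below only computes the row ranges; splitting by rows (rather than columns or blocks) and the name `split_rows` are assumptions made for illustration, not the layout shown on the slide.

```python
def split_rows(n_rows, n_procs):
    """Return one (start, stop) row range per processor, as even as possible."""
    base, extra = divmod(n_rows, n_procs)
    ranges = []
    start = 0
    for p in range(n_procs):
        # The first `extra` processors take one additional row each.
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


strips = split_rows(6000, 4)   # the 6000-row space from the earlier example
uneven = split_rows(10, 4)     # row count not divisible by the processor count
```

Each processor would then update only the rows in its own range at every time step.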
Strategies for Parallelization (2)
• Task-oriented Parallelization
– Split the whole task into sub tasks and assign them to multiple processors; these processors perform sub tasks in parallel
– For the SLEUTH, coefficient set evaluation and Monte Carlo iterations can be parallelized with this strategy
– Solution for computation intensity from complex rules
[Figure: Coef. Set 0 and Coef. Set 1 assigned to processors P0 and P1; their Monte Carlo iterations 0.0, 0.1, 0.2 and 1.0, 1.1, 1.2 spread over processors P0 to P5]
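Because each (coefficient set, Monte Carlo iteration) pair can be evaluated independently, the pairs can be treated as tasks and dealt out to processors. The following is a minimal sketch using round-robin assignment; the scheduling policy and the name `assign_tasks` are assumptions, and no actual model run is performed.

```python
from itertools import product


def assign_tasks(n_coef_sets, n_monte_carlo, n_procs):
    """Round-robin assignment of (coefficient set, Monte Carlo iteration) tasks to processors."""
    tasks = list(product(range(n_coef_sets), range(n_monte_carlo)))
    assignment = {p: [] for p in range(n_procs)}
    for i, task in enumerate(tasks):
        assignment[i % n_procs].append(task)
    return assignment


# Two coefficient sets with three Monte Carlo iterations each, on six processors.
plan_six = assign_tasks(n_coef_sets=2, n_monte_carlo=3, n_procs=6)

# The same six tasks on only two processors.
plan_two = assign_tasks(n_coef_sets=2, n_monte_carlo=3, n_procs=2)
```

With six processors each one receives exactly one task; with fewer processors the tasks are shared out in turn.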
Strategies for Parallelization (3)
• Combination of the previous two strategies
– Each processor performs the evaluation of a certain coefficient set, or a Monte Carlo iteration, on a certain sub cell space
[Figure: Coef. Set 0 and Coef. Set 1, each split into sub cell spaces that are assigned to processors P0, P1, P2, P3, ...]
Challenge (1)
• Communication Overhead
– Definition: Information flow among processors
– Challenge: How to minimize it? How to make message passing more efficient?
– Possible Solution: Ghost Cells
[Figure: the cell space split among processors P0, P1, P2, and P3, with a cell i marked]
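The ghost-cell idea can be illustrated without any real message passing: each processor keeps a copy of the rows that border its strip, so cells on the strip edge still see their full neighborhood. The sketch below is a serial simulation under several assumptions: the growth rule and 8-cell neighborhood from the earlier 2D example, a split into two row strips, and list slicing standing in for the exchange between processors. The function names are hypothetical.

```python
def step(grid):
    """One update of the example rule: a 0 cell becomes 1 if any of its 8 neighbors is 1."""
    rows, cols = len(grid), len(grid[0])

    def has_active_neighbor(r, c):
        return any(
            grid[r + dr][c + dc] == 1
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
        )

    return [
        [1 if grid[r][c] == 1 or has_active_neighbor(r, c) else 0 for c in range(cols)]
        for r in range(rows)
    ]


def step_with_ghost_rows(grid, start, stop):
    """Update rows start..stop-1 the way a single processor would.

    The processor holds its own rows plus one ghost row copied from each
    neighboring strip; the ghost rows are read but not returned.
    """
    lo = max(start - 1, 0)             # ghost row above, if there is one
    hi = min(stop + 1, len(grid))      # ghost row below, if there is one
    local = [row[:] for row in grid[lo:hi]]  # slicing stands in for message passing
    updated = step(local)
    offset = start - lo
    return updated[offset:offset + (stop - start)]


grid = [[0] * 6 for _ in range(6)]
grid[2][3] = 1  # a state-1 cell right next to the strip boundary between rows 2 and 3

serial = step(grid)
with_ghosts = step_with_ghost_rows(grid, 0, 3) + step_with_ghost_rows(grid, 3, 6)
without_ghosts = step(grid[0:3]) + step(grid[3:6])
```

With ghost rows the two strips reproduce the serial result; without them, growth cannot cross the boundary and row 3 stays empty. The cost is one row of communication per neighbor per time step.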
Challenge (2)
• Load Balance
– Definition: Balance of data and task load among processors
– Challenge: How to deal with a sparse grid?
– Possible Solution: Irregular Parsing
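One possible reading of irregular parsing is to cut the cell space where the work is, rather than into equal-sized pieces. The sketch below cuts rows into contiguous strips of roughly equal load; the greedy cut rule, the use of a per-row load such as the number of non-empty cells, and the name `balanced_row_cuts` are all assumptions for illustration.

```python
def balanced_row_cuts(row_loads, n_procs):
    """Cut rows into n_procs contiguous strips with roughly equal total load.

    row_loads[r] is the assumed work in row r (for example, its number of
    non-empty cells). Returns a list of (start, stop) row ranges.
    """
    total = sum(row_loads)
    ranges, start, running = [], 0, 0
    for r, load in enumerate(row_loads):
        running += load
        # Close the current strip once the running load reaches its share.
        target = total * (len(ranges) + 1) / n_procs
        if running >= target and len(ranges) < n_procs - 1:
            ranges.append((start, r + 1))
            start = r + 1
    ranges.append((start, len(row_loads)))  # the last strip takes the remaining rows
    return ranges


# A sparse grid: the top half is empty, all the work is in the bottom half.
row_loads = [0, 0, 0, 0, 10, 10, 10, 10]

cuts = balanced_row_cuts(row_loads, 2)
strip_loads = [sum(row_loads[a:b]) for a, b in cuts]

even_loads = [sum(row_loads[0:4]), sum(row_loads[4:8])]  # equal-sized strips for comparison
```

Equal-sized strips would leave one processor idle (loads 0 and 40), while the load-based cut gives each processor half of the work at the price of strips with different sizes.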
Research Questions
• Which parallelization strategy is best for CA model calibration, and which for forecasting?
– Depends on the data amount, CA model, task dependence, software & hardware, etc.
• Can multiple parallelization approaches be implemented and compared?
• Does the Ghost Cells method work for geographic CA (e.g. Urban CA)?
• How sparse can a grid be and still benefit from data parsing?
Future Work
• Dependence Analysis of the SLEUTH
• Data & Task Parsing Methods for the SLEUTH
• Communication Optimizing Methods
• Load Balance Optimizing Methods
• Implementation & Comparison
Further Future
• Supercomputer / Cluster?
• Peer-to-Peer Computing?
– SLEUTH@Home
• Grid Computing?
– GRID SLEUTH
Acknowledgement
• Prof. Keith Clarke, Dept. of Geography, UCSB
• Noah Goldstein, Dept. of Geography, UCSB
• Charles Dietzel, Dept. of Geography, UCSB
• Jeff Hemphill, Dept. of Geography, UCSB
Comments & Questions?