a parallel formulation of the spatial auto-regression model for mining large geo-spatial datasets...

A PARALLEL FORMULATION OF THE SPATIAL AUTO-REGRESSION MODEL

FOR MINING LARGE GEO-SPATIAL DATASETS

HPDM 2004 Workshop at SIAM Data Mining Conference

Barış M. Kazar, Shashi Shekhar, David J. Lilja, Daniel Boley

Army High Performance Computing and Research Center (AHPCRC)Minnesota Supercomputing Institute (MSI)

Digital Technology Center (DTC)University of Minnesota

04.24.2004

A Parallel Formulation of The Spatial Auto-Regression Model for Mining Large Geo-spatial Datasets 2

Overview

• Motivation

• Classical and New Data-Mining Techniques

• Problem Definition

• Our Approach

• Experimental Results

• Conclusions and Future Work


Motivation

• Widespread use of spatial databases Mining spatial patterns The 1855 Asiatic Cholera on London [Griffith]

• Fair Landing [NYT, R. Nader] Correlation of bank locations with loan

activity in poor neighborhoods• Retail Outlets [NYT, Walmart, McDonald etc.]

Determining locations of stores by relating

neighborhood maps with customer

databases• Crime Hot Spot Analysis [NYT, NIJ CML]

Explaining clusters of sexual assaults by

locating addresses of sex-offenders• Ecology [Uygar]

Explaining location of bird nests based on structural environmental variables


Key Concept: Neighborhood Matrix (W)

WEST21 )1,(SOUTH 111 j)1,(iEAST 111 )1,(

NORTH 12 ),1(

),(

qjp, ijiqj, p-iq-jp, i ji

qj p,iji

jineighbors

W allows other neighborhood definitions• distance based• 8 and more neighbors

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 16

Space + 4-neighborhood

0100100000000000101001000000000001010010000000000010000100000000100001001000000001001010010000000010010100100000000100100001000000001000010010000000010010100100000000100101001000000001001000010000000010000100000000000100101000000000001001010000000000010010

6th row

Binary W

021002

1000000000003

1031003

10000000000

03103

10031000000000

002100002

1000000003

1000031003

10000000

041004

1041004

1000000

0041004

1041004

100000

00031003

10000310000

00003100003

10031000

0000041004

1041004

100

00000041004

1041004

10

000000031003

1000031

000000002100002

100

00000000031003

10310

000000000031003

1031

0000000000021002

10

6th row

Row-normalized W

Given:• Spatial framework• Attributes


Classical and New Data-Mining Techniques

Classical Linear Regression Low

Spatial Auto-Regression High

Name ModelClassification Accuracy

εx βy

εxβWyy ρ

framework spatialover matrix odneighborho -by- :

parameter n)correlatio-(auto regression-auto spatial the:

nnW

• Solving Spatial Auto-regression Model = 0, = 0 : Least Squares Problem = 0, = 0 : Eigenvalue Problem General case: Computationally expensive

• Maximum Likelihood Estimation• Need parallel implementation to scale up

β

εε

SSEnn

L 2

)ln(

2

)2ln(ln)ln(

2WI


Related Work & Our Contributions

• Related work: Li, 1996Limitations: Solved 1-D problem

• Our ContributionsParallel solution for 2-D problems Portable software

Fortran 77 An Application of Hybrid Parallelism

» MPI messaging system » Compiler directives of OpenMP


A Serial Solution

B• Golden Section

Search• Calculate ML

Function

ACompute

Eigenvalues

C

Least SquaresEigenvalues of W

of range,,,, nWyx

2ˆ,ˆ,ˆ ̂Wyx ,,,n

• Compute Eigenvalues (Stage A) Produces dense W neighborhood matrix, Forms synthetic data y Makes W symmetric Householder transformation

Convert dense symmetric matrix to tri-diagonal matrix QL Transformation

Compute all eigenvalues of tri-diagonal matrix


Serial Response Times (sec)

• Stage A is the bottleneck & Stage B and C contribute very small to response time

0

1000

2000

3000

4000

5000

6000

7000

SGIOrigin

IBM SP IBMRegatta

SGIOrigin

IBM SP IBMRegatta

SGIOrigin

IBM SP IBMRegatta

2500 6400 10000

Problem Sizes on Different Machines

Tim

e (s

ec)

Stage A Stage B Stage C


Problem Definition

Given:• A Sequential solution procedure: “Serial Dense Matrix Approach” for one-dimensional geo-spaces

Find:• Parallel Formulation of Serial Dense Matrix Approach for multi-dimensional geo-spaces

Constraints:• N(0,2I) IID• Reasonably efficient parallel implementation• Parallel Platform • Size of W (large vs. small and dense vs. sparse)

Objective:• Portable & scalable software


Our Approach – Parallel Spatial Auto-Regression

• Function vs. Data PartitioningFunction partitioning: Each processor works on the

same data with different instructionsData partitioning (applied): Each processor works on

different data with the same instructions

• Implementation Platform: Fortran with MPI & OpenMP API’s

• No machine-specific compiler directivesPortabilityHelp software development and technology transfer

• Other Performance TuningStatic terms computed once


021002

1000000000003

1031003

10000000000

03103

10031000000000

002100002

1000000003

1000031003

10000000

041004

1041004

1000000

0041004

1041004

100000

00031003

10000310000

00003100003

10031000

0000041004

1041004

100

00000041004

1041004

10

000000031003

1000031

000000002100002

100

00000000031003

10310

000000000031003

1031

0000000000021002

10

Contiguous16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

P1 P1 P1 P1P2 P2 P2 P2

P3 P3 P3 P3P4 P4 P4 P4

021002

1000000000003

1031003

10000000000

03103

10031000000000

002100002

1000000003

1000031003

10000000

041004

1041004

1000000

0041004

1041004

100000

00031003

10000310000

00003100003

10031000

0000041004

1041004

100

00000041004

1041004

10

000000031003

1000031

000000002100002

100

00000000031003

10310

000000000031003

1031

0000000000021002

10

P1

P2

P3

P4

P1

P2

P3

P4

P1

P2

P3

P4

P1

P2

P3

P4

Round-robin with chunk size 1 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1

Data Partitioning in a Smaller Scale

• 4 processors are used and chunk size can be determined by the user• W is 16-by-16 and partitioned across processors

P1- (40 vs. 58) P2- (36 vs. 42) P3- (32 vs. 26)

P4- (28 vs. 10)


• A : Contiguous for rectangular loops & round-robin with chunk-size 4

• B : Contiguous• C : Contiguous• The arrows are also synchronization points for parallel solution

A B C• There are synchronization points within the boxes as well

Data Partitioning & Synchronization

B• Golden Section

Search• Calculate ML

Function

ACompute

Eigenvalues

C

Least SquaresEigenvalues of W

of range,,,, nWyx

2ˆ,ˆ,ˆ ̂

Wyx ,,,n


Experimental Design

Factor NameLanguage

Problem Size (n)

Neighborhood Structure

Method

Auto-regression Parameter

Contiguous (B=n/p )

Round-robin w/ B ={1,4,8,16}

Combined (Contiguous+Round-robin)

Dynamic w/ B ={n/p ,1,4,8,16}

Guided w/ B ={1,4,8,16}

MLB Affinity w/ B ={n/p ,1,4,8,16}

Number of Processors

Maximum Likelihood for exact SAM[0,1)

SLB

DLB

Parameter Domain f77 w/ OpenMP & MPI2500,6400 and 10000 observation points2-D w/ 4-neighbors

IBM Regatta w/ 47.5 GB Main Memory; 32 1.3 GHz Power4 architecture processors

Hardware Platform

1,4, and 8

Load-Balancing


Experimental Results – Effect of Load Balancing

Effect of Load-Balancing Techniques on Speedup for Problem Size 10000

0

1

2

3

4

5

6

7

8

1 4 8Number of Processors

Sp

eed

up

mixed1 Static B=8 Dynamic B=8 Affinity B=1 Guided B=16


Experimental Results- Effect of Problem Size

Impact of Problem Size on Speedup Using Affinity Scheduling on 8 Processors

0

1

2

3

4

5

6

7

8

2500 6400 10000Problem Size

Sp

ee

du

p

affinity B=n/p affinity B=1 affinity B=4 affinity B=8 afiinity B=16


Experimental Results- Effect of Chunk Size

Effect of Chunk Size on Speedup Using Static Scheduling on 8 Processors

0

1

2

3

4

5

6

7

8

1 4 8 16 n/pChunk Size

Spee

dup

PS=2500 PS=6400 PS=10000

• Critical value of the chunk size for which the speedup reaches the maximum. • This value is higher for dynamic scheduling to compensate for the scheduling overhead.• The workload is more evenly distributed across processors at the critical chunk size value.

Effect of Chunk Size on Speedup Using Dynamic Scheduling on 8 Processors

0

1

2

3

4

5

6

7

8

1 4 8 16 n/pChunk Size

Sp

eed

up

PS=2500 PS=6400 PS=10000


Experimental Results- Effect of # of Processors

Effect of Number of Processors on Speedup (PS=10000)

0

1

2

3

4

5

6

7

8

mix

ed1

mix

ed2

stat

ic B

=1st

atic

B=4

stat

ic B

=8st

atic

B=1

6st

atic

B=n

/pdy

nam

ic B

=1dy

nam

ic B

=4dy

nam

ic B

=8dy

nam

ic B

=16

dyna

mic

B=n

/paf

finity

B=1

affin

ity B

=4af

finity

B=8

affin

ity B

=16

affin

ity B

=n/p

guid

ed B

=1gu

ided

B=4

guid

ed B

=8gu

ided

B=1

6

Load-Balancing (Scheduling) Techniques

Sp

eed

up

4 8


Summary

• Developed a parallel formulation of spatial auto-regression model

• Estimates maximum likelihood of regular square tessellation 1-D and 2-D planar surface partitionings for location prediction problems

• Used dense eigenvalue computation and hybrid parallel programming


Future Work

1. Understand reasons of inefficiencies– Algebraic cost model for speedup

measurements on different architectures

2. Fine tune implemented parallel formulation

– Consider alternate parallel formulations

3. Parallelize other serial solutions using sparse-matrix techniques

− Chebyshev Polynomial approximation− Markov Chain Monte Carlo Estimator


Acknowledgments & Final Word

• Army High Performance Computing Research Center-AHPCRC• Minnesota Supercomputing Institute - MSI• Digital Technology Center – DTC• Spatial Database Group Members• ARCTiC Labs Group Members• Dr. Sanjay Chawla• Dr. Kelley Pace • Dr. James LeSage

THANK YOU VERY MUCHQuestions?

a parallel formulation of the spatial auto-regression model for mining large geo-spatial datasets...

Documents