ahpcrc spatial data-mining tutorial on scalable parallel formulations of spatial auto-regression...

AHPCRCSPATIAL DATA-MINING TUTORIAL

onScalable Parallel Formulations of

Spatial Auto-Regression (SAR) Models for Mining Regular Grid

Geospatial Data Shashi Shekhar, Barış M. Kazar, David J. Lilja

EECS Department @ University of Minnesota Army High Performance Computing Research Center (AHPCRC)

Minnesota Supercomputing Institute (MSI)

05.14.2003

05.14.2003 AHPCRC Spatial Data-Mining Tutorial 2

Outline

• Motivation for Parallel SAR Models • Background on Spatial Auto-regression

Model• Our Contributions • Problem Definition & Hypothesis• Introduction to the SAR Software• Experimental Design• Related Work• Conclusions and Future Work


Motivation for Parallel SAR Models

• Linear regression models make the assumption of independent identical distribution (a.k.a. iid) about learning data samples.

• Therefore, low prediction accuracy occurs

• SAR model = generalization of linear regression model with an auto-correlation term

• Incorporating the auto-correlation term: – Results in better prediction accuracy– However, computational complexity increases due to the

logarithm-determinant (a.k.a. Jacobian) term of the maximum likelihood estimation procedure

– Parallel processing can reduce execution time


Our Contributions

• This study is the first study that offers the only available parallel SAR formulation and evaluates its scalability

• All of the eigenvalues of any type of dense neighborhood (square) matrix can be computed in parallel

• Scalable parallel formulations of spatial auto-regression (SAR) models for 1-D and 2-D location prediction problems for planar surface partitionings using the eigenvalue computation

• Hand-parallelized EISPACK, pre-parallelized LAPACK-based NAG linear algebra libraries and shared-memory parallel programming, i.e. OpenMP are used


Background on Spatial Autoregression Model• Mixed Regressive Spatial Auto-regressive (SAR) Model

where:y : n-by-1 vector of observations on dependent variable : spatial autoregression parameter (coefficient)W : n-by-n matrix of spatial weights

(i.e. contiguity, neighborhood matrix) X : n-by-k matrix of observations on the explanatory variables : k-by-1 vector of regression coefficientsI : n-by-n Identity matrix : n-by-1 vector of unobservable error term ~ N(0, 2I) : common variance of error term• When = 0, the SAR model collapses to the classical regression

model

εXWyy


Background on SAR Model –Cont’d

If x = 0 and W2 = 0, then first-order spatial autoregressive model is obtained as follows:

y = W1 y +

~ N(0, 2I) If W2= 0, then mixed regressive spatial autoregressive

model (a.k.a. spatial lag model) is obtained as follows:y = W1 y + x +

~ N(0, 2I) If W1= 0, then regression model with spatial autocorrelation

in the disturbances (a.k.a. spatial error model) is obtained as follows:

y = x + u

u = W2 u +

~ N(0, 2I)


Problem DefinitionGiven:

• A Sequential solution procedure: “Serial Dense Matrix Approach”

Same as Prof. Bin Li’s method in Fortran 77

Find:• Faster serial formulation called as “Serial Sparse Matrix Approach”• New Parallel Formulation of Serial Dense Matrix Approach different from Prof. Bin Li’s method in CM-fortran

Constraints: N(0,2I) IID• Rastor Data (resulting in binary neighborhood,contiguity matrix W)• Parallel Platform (HW: Origin 3800, IBM SP, IBM Regatta, Cray X1; SW: Fortran, OpenMP)• Size of W (big vs. small and dense vs. sparse)

Objective:• Maximize speedup (Tserial/Tparallel)• Scalability- Better than (Li,1996)’s formulation


Hypotheses– There are a number of sequential algorithms computing SAR

model, most of which are based on the estimation of maximum likelihood method that solves for the spatial autoregression parameter () and regression coefficients ().

– As the problem size gets bigger, the sequential methods are incapable of solving this problem due to – extensive number of computations and – large memory requirement.

– The new parallel formulation proposed in this study will outperform the previous parallel implementation in terms of:

– Speedup (S), – Scalability, – Problem size (PS) and, – Memory requirement.


Serial SAR Model Solution

2

1exp)2()( 2

12

εεε Tn

N

I & 2|||| WIyε dd

||))(()( yεxyWI ddNyL

])[(])[(2

1exp)2()(

2

22

xyWIxyWIWIT

n

yL

• Starting with normal density function:

• The maximum likelihood (ML) function:

• The explicit form of maximum likelihood (ML) function:


Serial SAR Model Solution - (Cont’d)

])[(])[(2

1

2

)ln(

2

)2ln(ln)ln(

2

2

xyWIxyWIWI TnnL

nTTTT

TT

/)]()([)(ˆ )()(ˆ

12

1

yWIxxxxIWIyyWIxxx

))]()([])([)((2

1

1)1ln(MIN

1

)ln(1||ln 1

)(1||

11

21||

logarithm theTaking

yWIxxxxIxxxxIWIy

WIWI

TTTTTTTn

i i

n

i iρλρ

n

i iρλρ

• The logarithm of the maximum likelihood function is called log-likelihood function

• The ML estimates of the SAR parameters:

• The function to optimize:


System Diagram

B

Golden Section Search to find that

minimizes ML function

ACompute

Eigenvalues

Pre-processingStep

CCompute and given the best estimate using least squares

Calculate the ML function

Eigenvalues of W

n,,,, εWyx

of range

2ˆ,ˆ,ˆ

The Symmetric Eigenvalue-equivalent

Neighborhood Matrix

W~


Introduction to SAR Software

• Following slides will describe:– how we implemented the SAR model solution – execution trace

• Matlab is good for small size problems– Memory is not enough– Execution time is too long

• Compiler language needed for larger problems such as Fortran– Up to 80GB can be used on supercomputers– Execution time decreased considerably due to

parallel processing


1. Pre-processing Step• It consists of four sub-steps

– Forming Epsilon () Vector– Form W, the row-normalized Neighborhood Matrix– Symmetrize W– Form y & x Vectors

Form TrainingData Set y

Form W

Symmetrization of W

To box B

To box A

of range,,,,, nWεyx

of range,,,,, nWεyx


1.0 Form Epsilon () Vector

• Input: p=4; q=4; n=pq=16;• Output: a 16-by-1 column vector (epsilon)

• The term i.e. the 16-by-1 vector of unobservable error term ~ N(0, 2I) in the SAR equation

y = Wy + x + The elements are IID with zero mean and unit

standard deviation normal random numbers• Prof. Isaku Wada’s normal random number

generator written in C is used, which is much better than Matlab’s nrand function

T=[0.9914 -1.2780 -0.4976 1.2271 0.4740 -0.0700 0.0834 -0.8789 0.3378 -1.5901 0.0638 -0.2717 2.3235 -0.2973 -0.1964 0.1416]


1.1 Form W, the Neighborhood Matrix

• Input: p=4; q=4; n=pq=16; neighborhood type (4- Neighbors)

• Output: The binary (non-row-normalized) 16-by-16 C matrix; the row-sum in a 16-by-1 column vector D_onehalf; the row-normalized 16-by-16 neighborhood matrix W

• The neighborhood matrix, W is formed by using the following neighborhood relationship ((i.j) is the current pixel):

• The solution can handle more cases such as:– 1-D 2-neighbors– 2-D 8,16,24 etc. neighbors

WEST21 )1,(SOUTH 111 j)1,(iEAST 111 )1,(

NORTH 12 ),1(

),(

qjp, ijiqj, p-iq-jp, i ji

qj p,iji

jineighbors


1.1 Matrices for 4-by-4 Regular Grid Space

(a) (b) (c)

• (a) The spatial framework which is p-by-q (4-by-4) where p may or may not be equal to q

• (b) the pq-by-pq (16-by-16) non-row-normalized (non-stochastic) neighborhood matrix C with 4 nearest neighbors, and

• (c) the row-normalized version i.e. W which is also pq-by-pq (16-by-16). The product pq is equal to n (16), i.e. the problem size.

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 16

0100100000000000101001000000000001010010000000000010000100000000100001001000000001001010010000000010010100100000000100100001000000001000010010000000010010100100000000100101001000000001001000010000000010000100000000000100101000000000001001010000000000010010

021002

1000000000003

1031003

10000000000

03103

10031000000000

002100002

1000000003

1000031003

10000000

041004

1041004

1000000

0041004

1041004

100000

00031003

10000310000

00003100003

10031000

0000041004

1041004

100

00000041004

1041004

10

000000031003

1000031

000000002100002

100

00000000031003

10310

000000000031003

1031

0000000000021002

10

6th row6th row


1.1 Neighborhood Structures

2-D16 Neighborhood

2-D8 Neighborhood

1-D2 Neighborhood

2-D4 Neighborhood

2-D24 Neighborhood


1.2 Symmetrize W• Input: p,q,n,W,C,D_onehalf

• Output: the 16-by-16 symmetric eigenvalue-eqiuvalent matrix of W

• Matlab programs do not need this step since eig function can find the eigenvalues of a dense non-symmetric matrix

• EISPACK’s subroutines find all eigenvalues of a symmetric dense matrix. Therefore, need to convert W to its eigenvalue-equivalent form

• The following short-cut algorithm achieves this task:

s.eigenvalue of in terms of equivalent symmetric ** **~

) of version )stochastic(or normalized-row iswhich *= )

if 0= and as defined is )

matrix odneighborhobinary The

1/2-2/11/2-2/1

1

WD C DD WDWC CDW

D

C

dc

jidcdb

a)

ijn

1iijii

)(*)(*)()( ~

41, 3

~

// 1, 2 )(matrix_4Neighbors-form_2D 1

//

//

/

jii,ji,jni

nj

2121

2121

21

DDCWfor

CDDWforWWD

W~


1.3 Form y & x Vectors

• Input: p,q,n,,W

• Output: y and x vectors each of which is 16-by-1

• They are the synthetic (training) data to test SAR model

• Fortran programs perform this task separately in another program using NAG SMP library subroutines

• For n=16:• xT= [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15,16] • yT=[ 8.2439 7.9720 11.7226 16.4201 17.9623

19.5719 22.8666 24.9425 29.9876 30.6348 35.1528 37.4712 42.1301 42.7142 45.7171 47.8384]


2. Find All of the Eigenvalues of W

• Matlab programs use eig function which finds all of the eigenvalues of non-symmetric dense matrix

• Fortran programs use the tred2 and tql1 EISPACK subroutines, which is the most efficient to find eigenvalues

• There are two sub-steps:– Convert dense symmetric matrix to tridiagonal

matrix– Find all eigenvalues of the tridiagonal matrix


2.1 Convert symmetric matrix to tridiagonal matrix• Input: n,

• Output: Diagonal elements of the resulting tri-diagonal matrix in 16-by-1 column vector d, the sub-diagonal elements of the resulting tri-diagonal matrix in 16-by-1 column vector e

• This is Householder Transformation which is only used in fortran programs

• This step is the most-time consuming part (%99 of the total execution time)

W~


2.2 Find All Eigenvalues of the Tridiagonal Matrix• Input: Diagonal elements of the resulting tri-diagonal matrix

in 16-by-1 column vector d, the sub-diagonal elements of the resulting tri-diagonal matrix in 16-by-1 column vector e

• Output: All of the eigenvalues of the neighborhood matrix W

• This is QL transformation which is only used in fortran programs

• This is left serial in fortran programs


3. Fit for the SAR Parameter rho ()

• There are two subroutines in this step– Calculating constant statistics terms

• Saves time by calculating the non-changing terms during the golden section search algorithm

– Golden section search• Similar to binary search


3.1 Calculating constant statistics terms• Input: x,y,n,W• Output: KY, KWY column vectors

• Fortran programs use this subroutine to perform K-optimization where constant term K=[I-x((xTx)-1)xT] which is 16-by-16 for n=16– The second term in log-likelihood expression= yT (I-rho W)T [I-x((xTx)-1)xT]T [I-x*((xTx)-1)*xT] (I-rho W) y

= yT (I-rho*W)'* KT K (I-rho W) y

= (K (I-rho W) y)T * (K (I-rho W) y)

= (Ky - rho KWy )T * (Ky - rho KWy ) which saves many matrix-vector multiplications

• Matlab programs directly calculate all (constant and non-constant) terms in the log-likelihood function over and over again so do not need this operation– Those terms are not expensive to calculate in Matlab but

expensive in fortran


3.2 Golden Section Search

• Input: x,y,n,eigenn,W,KY,KWY,ax,bx,cx,tol

• Output: fgld, xmin, niter

• The best estimate for rho (xmin) is found• The optimization function is the log-likelihood

function• Similar to binary search and bisection method• The detailed outputs are given in the execution

traces


4. Calculate Beta () and Sigma ()

• Input: x,y,n,niter,rho_cap,W,KY,KWY • Output: beta_cap and sigmasqr_cap which are both scalars

• Niter (number of iterations), rho_cap are scalars• Calculating best estimate of beta and sigma2

• The formulas are:

• As n (I.e. problem size) incerases, the estimates become more accurate

nTTTT

TT

/)]()([)(ˆ )()(ˆ

12

1

yWIxxxxIWIyyWIxxx


Sample Output from Matlab Programs (n=16)• Comparison of methods:

– First row: Dense straight log-det calculation – Second row: Log-det calculation via exact eigenvalue

calculation – Third row: Log-det calculation via approximate

eigenvalue calculation (Griffith) – Fourth row: Log-det calculation via better approximate

eigenvalue calculation (Griffith) – Fifth row is coming up as Markov Chain Monte Carlo

approximated SAR• Columns are defined as follows: actual rho=0.412 &

beta=1.91 ML value min_log rho_cap niter beta_cap

sigma_sqr 2.6687 -1.0000 0.4014 77.0000 1.9465 0.85092.6687 -1.0000 0.4014 77.0000 1.9465 0.8509 2.6645 -1.0000 0.4018 77.0000 1.9452 0.8508

2.6641 -1.0000 0.4019 77.0000 1.9451 0.8508


Sample Output from Matlab Programs (n=2500)• Comparison of methods:

– First row: Dense straight log-det calculation – Second row: Log-det calculation via exact eigenvalue

calculation – Third row: Log-det calculation via approximate eigenvalue

calculation (Griffith) – Fourth row: Log-det calculation via better approximate

eigenvalue calculation (Griffith) – Fifth row is coming up as Markov Chain Monte Carlo

approximated SAR• Columns are defined as follows: actual rho=0.412 &

beta=1.91 ML value min_log rho_cap niter beta_cap sigma_sqr

7.8685 -1.0000 0.4127 77.0000 1.9079 0.9985 7.8685 -1.0000 0.4127 77.0000 1.9079 0.9985 7.8683 -1.0000 0.4127 77.0000 1.9079 0.9985 7.8681 -1.0000 0.4127 77.0000 1.9079 0.9985


Instructions on How-To-Run Matlab Code• On SGI Origins

– train_test_SAR (interactively)– matlab -nodisplay < train_test_SAR.m > output

• On IBM SP– same

• On IBM Regatta– same

• On Cray X1– Not available yet


Instructions on How-To-Run Fortran Code

• On SGI Origins with 4 threads (processors)– f77 -64 -Ofast -mp sar_exactlogdeteigvals_2DN4_2500.f– setenv OMP_NUM_THREADS 4 – time a.out

• On IBM SP with 4 threads (processors)– xlf_r -O3 -qstrict -q64 -qsmp=omp sar_exactlogdeteigvals_2DN4_2500.f– setenv OMP_NUM_THREADS 4 – time a.out

• On IBM Regatta with 4 threads (processors)– xlf_r -O3 -qstrict -q64 -qsmp=omp sar_exactlogdeteigvals_2DN4_2500.f– setenv OMP_NUM_THREADS 4 – time a.out

• On Cray X1– Coming soon…


Illustration on a 2-by-2 Regular Grid Space

0110100110010110

•Aim is to show how SAR works, i.e. Execution Trace

•Forming training data, y •With known rho (0.1), beta (1.0), x ([1;2;3;4]), epsilon(=0.01*rand(4,1)), and

the neighborhood matrix W for 1-D, compute y (observed variable)(Matlab notation is used here)

y=inv(eye(4,4)-rho.*W)*(beta.*x+epsilon) Testing SAR

•Solving SAR model = Finding eigenvalues & Fitting for SAR parameters•Run SAR model with inputs as y, epsilon, W and with a range of rho [0,1)•Find beta as well•The prediction for rho is very close to 0.1 (with an error of %0.01 !)

05.05.005.0005.05.0005.0

05.05.00

Binary form Row-normalized (Stochastic) form

NeighborhoodMatrix

1 2

3 4

Landscape


Summary of the SAR Software i.e. The Big Picture

sigmasqrbeta estimate_Subroutine

fwn Subroutinerchection Seae Golden S Subroutin

statermsSubroutine

tql1Subroutine tred2Subroutine

formSubroutineation // symmetrizSubroutine

form_2D_Subroutine form_1D_Subroutine

_ 4.1 1) Figurein CBox (i.e. ˆ estimate thegCalculatin 4.

function )likelihood-(log likelihood maximum theof log thei.e. function, optimized The// 3.2.1 onoptimizatilinear -non The // 3.2

function likelihood-log of ermsconstant t calculate // to 3.1 1) Figurein BBox (i.e. ˆparameter SARfor Fitting 3.

matrix al tridiagon theof seigenvalue thefind EISPACK to of // 2.2 formmatrix al tridiagon to edstandardiz-row of equivalent symmetricconvert EISPACK to of // 2.1

1) Figurein A Box (i.e. of sEigenvalue Finding 2. vector variable)observed (i.e.dataset training theform // to_ 1.3

edstandardiz-row of equivalent-eigenvalue symmetric theform to 1.2 case ldimensiona-for two edstandardiz-row of equivalent symmetric form // to 1.1b case ldimensiona-onefor edstandardiz-row of equivalent symmetric form // to 1.1a

step processing-Pre 1. SS REGRESSION-OPATIAL_AUTERIAL

WW

y yW

WWWW


New Parallel SAR Model Solution Formulation

• Prof. Mark Bull’s “Expert Programmer vs Parallelizing Compiler” Scientific Programming 1996 paper

• The loop 240 & loop 280 are the major bottlenecks and parallelized most of the code as will be shown

• The data distribution on both loops should be similar to benefit from value re-use

• Loop 280 cannot benefit from block-wise partitioning, it should use interleaved scheduling for load balance. Thus, both loops use interleaved scheduling

• Parallelizing initialization phase imitates manual data distribution, page-placement & page-migration utilities of SGI Origin machines

• The variable “etemp” enables reduction operation on the variable “e” that is updated by different processors


Experimental Design – Response Variables & Factors– Speedup: S =Tserial / Tparallel ,

– Time taken to solve a problem on a single processor over time to solve the same problem on a parallel computer with p identical processors.

– Scalability of a Parallel System: – The measure of the algorithm’s capacity to increase S in proportion to

p in a particular parallel system.– Scalable systems has the ability to maintain efficiency (i.e. S / p) at a

fixed value by simultaneously increasing p and PS.– Reflects a parallel system’s ability to utilize increasing processing

resources effectively.

Name of Factor Factor’s Parameter Domain•Problem Size (n) {400, 2500, 6400,10000}•Neighborhood Structure {2-D 4-Neighbors}•Range of rho {[0,1)}•Parallelization options {Static scheduling for box B and C,

dynamic with chunk size 4 for box A}•Number of processors {1,…,16}•Algorithm used {householder transformation followed by QL

algorithm followed by golden section search}


Related Work• Eigenvalue software on the web are studied in depth at the following

web sites:– http://www-users.cs.umn.edu/~kazar/sar/sar_presentations/eigenvalue_solvers.html– http://www.netlib.org/utk/people/JackDongarra/la-sw.html

• [Griffith,1995] computed analytically the eigenvalues for 1-D, 2-D with {4,8} neighbor cases. However, this is tedious for other cases– The approximate and better approximate eigenvalues are from

this source– However, there are no closed form expression for many other

cases

• Bin Li [1996] implemented parallel SAR using a constant mean model

– The programs cannot run anymor (CM-fortran)– Our model is more general and hard to solve– Our programs are the only running algorithms in the

literature

http://www-users.cs.umn.edu/~kazar/sar/sar_presentations/eigenvalue_solvers.html

http://www.netlib.org/utk/people/JackDongarra/la-sw.html


Ref# M ethod Name All Data Sub Data All Data Sub Data All Data Sub Data All Data Sub Data

P ol l ack 1992 Double Bounded Estimates (DBLE) Approach

M ar tin 1993 Taylor Series Approximation Approach

Gr iffi th 1995 Eigenvalue Approximation Approach

Li 1996 Regular Square Tessellation 1-D Planar Surface Partitioning

P ace 1997 Applying Direct Sparse M atrix Algorithms

B ar r y 1999 M onte Carlo Estimates M ethod

LeSage 2000 M atrix Exponential Specification Approach

LeSage 2000 Bayesian M atrix Exponential Specification Approach

P ace 2000 Graph Theory Approach

Smi r nov 2001 Characteristic Polynomial Approach

P ace 2001 Semiparametric Estimates

P ace 2002 Chebyshev Approximation M ethod

P ace 2003 SALE M ethod

P ace Soon Double Bounded Likelihood Estimator

P ace Soon Approach of Derivation of Upper and Lower Bounds

M aximum Likelihood Bayesian

Direct EstimateEstimateDirect

SAR Model Solutions are cross-classified


References

• [ICP03] Baris Kazar, Shashi Shekhar, and David J. Lilja, "Parallel Formulation of Spatial Auto-Regression", submitted to ICPP 2003 [under review]

• [IEE02] S. Shekhar, P. Schrater, R. Vatsavai, W. Wu, and S. Chawla, Spatial Contextual Classification and Prediction Models for Mining Geospatial Data , IEEE Transactions on Multimedia (special issue on Multimedia Dataabses) , 2002


Conclusions & Future Work

• Able to solve SAR model for any type of W matrix until the memory limit is reached by finding all of its eigenvalues

• Finding eigenvalues is hard

• Sparsity of W should be exploited if eigenvalue subroutines allow

• Efficient Partitioning of very large sparse matrices via parMETIS

• New methods will be studied:– Markov Chain Monte Carlo Estimates (parSARmcmc) – parSALE = parallel spatial auto-regression local estimation – Characteristic Polynomial Approach

• Contributions to spatial statistics package Version 2.0 from Prof. Kelley Pace will continue


Short Tutorial on OpenMP• Fork-join model of parallel execution

• The parallel regions are denoted by directives in fortran and pragmas in C/C++• Data environment: (first/last/thread) Private, shared variables and their scopes

across parallel and serial regions• Work-sharing constructs: parallel do, sections (Static, dynamic scheduling

with/without chunks)• Synchronization: atomic, critical, barrier, flush, ordered, implicit

synchronization after each parallel for loop • Run-time library functions e.g. to determine which thread is executing at some

time

Serial Region

Master

Thread

ChildThreads

Master

Thread

ChildThreads

Parallel Region

Parallel Region

Serial Region


Short Tutorial on Eigenvalues

• Let A be a linear transformation represented by a matrix. If there is a X different from zero vector such that:

for some scalar λ, then λ is called the eigenvalue of A with corresponding (right) eigenvector X.

• Eigenvalues are also known as characteristic roots, proper values, or latent roots (Marcus and Minc 1988, p. 144).

• (A-I λ)X=0 is the characteristic equation (polynomial) and roots of this polynomial are the eigenvalues.

XAX

n) x (n nbyn is AAn

ii

where)det(1


Hands-On Part

• Please goto:– http://www.cs.umn.edu/~kazar/sar/index.html– Find 05.14.2003 phrase– Type in “shashi” for username– Type in “shashi” for password

• To run the programs we need to login to one of the SGI Origins, IBM SP, IBM Regatta (Cray X1 is not ready yet)

• All programs are run by submitting to a queue• No interactive runs recommended for fortran

programs due to the system load and high number of processors needed for execution

http://www.cs.umn.edu/~kazar/sar/index.html

ahpcrc spatial data-mining tutorial on scalable parallel formulations of spatial auto-regression...

Documents

spatial error model

spatial autoregression

spatial lag model

spatial autocorrelation

n matrix of spatial

sar model contd

sar model collapses

parallel sar models