Simulation Informatics: Analyzing Large Datasets from Scientific Simulations


DESCRIPTION

A talk I gave at the Purdue CS&E Seminar Series.

TRANSCRIPT

Page 1: Simulation Informatics; Analyzing Large Scientific Datasets

Simulation Informatics: Analyzing Large Datasets from Scientific Simulations

DAVID F. GLEICH, PURDUE UNIVERSITY, COMPUTER SCIENCE DEPARTMENT

PAUL G. CONSTANTINE, STANFORD UNIVERSITY

JOE RUTHRUFF & JEREMY TEMPLETON, SANDIA NATIONAL LABS

CS&E Seminar David Gleich · Purdue 1

Page 2: Simulation Informatics; Analyzing Large Scientific Datasets

This talk is a story …

CS&E Seminar David Gleich · Purdue 2

Page 3: Simulation Informatics; Analyzing Large Scientific Datasets

How I learned to stop worrying and love the simulation!

CS&E Seminar David Gleich · Purdue 3

Page 4: Simulation Informatics; Analyzing Large Scientific Datasets

I asked … Can we do UQ on PageRank?

CS&E Seminar David Gleich · Purdue 4

Page 5: Simulation Informatics; Analyzing Large Scientific Datasets

Google's PageRank

[Figure: a small six-node web graph, nodes labeled 1-6.]

The model:
1. Follow edges uniformly at random with probability α, and
2. randomly jump with probability 1 − α; we'll assume everywhere is equally likely.

The places we find the surfer most often are important pages.

CS&E Seminar David Gleich · Purdue 5
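To make the random-surfer model concrete, here is a minimal numpy sketch (not from the talk) that solves the PageRank linear system (I − αP)x = (1 − α)v; the six-node adjacency matrix and α = 0.85 are illustrative assumptions, since the slide's graph is only a figure.

import numpy as np

# A hypothetical six-node web graph; this adjacency matrix is only an illustrative stand-in.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [0, 0, 0, 1, 0, 0],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

P = (A / A.sum(axis=1, keepdims=True)).T   # column-stochastic transition matrix
alpha = 0.85                               # probability of following an edge
v = np.full(6, 1.0 / 6.0)                  # uniform jump: "everywhere is equally likely"

# The PageRank linear system: (I - alpha P) x = (1 - alpha) v.
x = np.linalg.solve(np.eye(6) - alpha * P, (1 - alpha) * v)
print(x)   # the largest entries mark the "important" pages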

Page 6: Simulation Informatics; Analyzing Large Scientific Datasets

Random alpha PageRank (RAPr)

Model PageRank as the random variable x(A) and look at E[x(A)] and Std[x(A)].

Note: "RAPr" as in "wrapper," not "rapper."

Explored in Constantine and Gleich, WAW 2007; and Constantine and Gleich, J. Internet Mathematics, 2011.

Random alpha PageRank, or PageRank meets UQ.

Which sensitivity?

(I − αP)x = (1 − α)v

Sensitivity to the links: examined and understood.
Sensitivity to the jump: examined, understood, and useful.
Sensitivity to α: less well understood.

CS&E Seminar David Gleich · Purdue 6
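As a concrete illustration of the RAPr idea (not the talk's implementation), here is a minimal Monte Carlo sketch: draw α from a Beta distribution, solve one PageRank system per sample, and report the empirical E[x(A)] and Std[x(A)]. The Beta parameters are assumptions, and P and v are the column-stochastic matrix and jump vector from the previous sketch.

import numpy as np

def pagerank(P, alpha, v):
    # Solve (I - alpha P) x = (1 - alpha) v for a column-stochastic P.
    return np.linalg.solve(np.eye(len(v)) - alpha * P, (1 - alpha) * v)

def rapr_monte_carlo(P, v, a=16.0, b=4.0, lo=0.0, hi=1.0, nsamples=1000, seed=0):
    # Monte Carlo estimate of E[x(A)] and Std[x(A)] for A ~ Beta(a, b) mapped to [lo, hi];
    # the distribution parameters here are assumed, not the talk's choices.
    rng = np.random.default_rng(seed)
    alphas = lo + (hi - lo) * rng.beta(a, b, size=nsamples)
    xs = np.array([pagerank(P, al, v) for al in alphas])   # one PageRank solve per sample
    return xs.mean(axis=0), xs.std(axis=0)

# Usage with the P and v from the previous sketch:
# mean_x, std_x = rapr_monte_carlo(P, v)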

Page 7: Simulation Informatics; Analyzing Large Scientific Datasets

Convergence theory

Method                            Conv. rate   Work required                   What is N?
Monte Carlo                       1/√N         N PageRank systems              number of samples from A
Path damping (without Std[x(A)])  r^(N+2)      N + 1 matrix-vector products    terms of the Neumann series
Gaussian quadrature               r^(2N)       N PageRank systems              number of quadrature points

Here a, b, l, and r are the parameters of the distribution A ∼ Beta(a, b, l, r): a Beta density with shape parameters a and b supported on [l, r].

Random alpha PageRank has a rigorous convergence theory.

CS&E Seminar David Gleich · Purdue 7
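To connect with the quadrature row of the table, here is a hedged sketch that uses Gauss-Jacobi nodes, which correspond to a Beta density after an affine map from [-1, 1] to [lo, hi]; each node costs one PageRank solve. The Beta parameters and the reuse of a pagerank() helper are assumptions, not the talk's code.

import numpy as np
from scipy.special import roots_jacobi

def pagerank(P, alpha, v):
    return np.linalg.solve(np.eye(len(v)) - alpha * P, (1 - alpha) * v)

def rapr_gauss_quadrature(P, v, a=16.0, b=4.0, lo=0.0, hi=1.0, npoints=8):
    # Estimate E[x(A)] for A ~ Beta(a, b) on [lo, hi] with N = npoints quadrature nodes.
    # Beta(a, b) on [0, 1] corresponds to the Jacobi weight (1-x)^(b-1) (1+x)^(a-1) on [-1, 1].
    nodes, weights = roots_jacobi(npoints, b - 1.0, a - 1.0)
    alphas = lo + (hi - lo) * (nodes + 1.0) / 2.0
    xs = np.array([pagerank(P, al, v) for al in alphas])   # N PageRank systems, as in the table
    return (weights[:, None] * xs).sum(axis=0) / weights.sum()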

Page 8: Simulation Informatics; Analyzing Large Scientific Datasets

Working with PageRank showed us how to treat UQ more generally …

CS&E Seminar David Gleich · Purdue 8

Page 9: Simulation Informatics; Analyzing Large Scientific Datasets

Constantine, Gleich, and Iaccarino. Spectral Methods for Parameterized Matrix Equations, SIMAX, 2010.

Constantine, Gleich, and Iaccarino. A factorization of the spectral Galerkin system for parameterized matrix equations: derivation and applications, SISC 2011.

We studied parameterized matrices:

A(s)x(s) = b(s)

a discretized PDE with explicit parameters s and a parameterized solution x(s). Sampling the parameter gives the decoupled systems

A(s_1)x(s_1) = b(s_1), …, A(s_N)x(s_N) = b(s_N),

or, with a spectral Galerkin approximation x_N, systems of the form A_N(s_1)x_N(s_1) = b_N(s_1). The papers show how to compute the Galerkin solution in a weakly intrusive manner.

CS&E Seminar David Gleich · Purdue 9
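As a minimal, non-intrusive illustration of working with a parameterized matrix equation (a sampling/collocation sketch, not the spectral Galerkin method of the papers), with a placeholder 1-D operator A(s) and right-hand side b(s):

import numpy as np

def A_of_s(s, n=50):
    # Placeholder parameterized matrix: a 1-D diffusion-like operator whose
    # coefficients depend on the scalar parameter s (illustrative only).
    main = (2.0 + s) * np.ones(n)
    off = -(1.0 + 0.5 * s) * np.ones(n - 1)
    return np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

def b_of_s(s, n=50):
    # Placeholder parameterized right-hand side.
    return (1.0 + 0.1 * s) * np.ones(n)

# Sample the parameter and solve one deterministic system per sample
# (a weakly intrusive use of an existing solver).
samples = np.linspace(-1.0, 1.0, 11)
solutions = np.array([np.linalg.solve(A_of_s(s), b_of_s(s)) for s in samples])
# solutions[j] approximates x(s_j); a surrogate for x(s) can be built from these.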

Page 10: Simulation Informatics; Analyzing Large Scientific Datasets

Simulation: The Third Pillar of Science. 21st-century science in a nutshell!

Experiments are not practical or feasible. Simulate things instead.

But do we trust the simulations? We're trying!

Model fidelity, verification & validation (V&V), uncertainty quantification (UQ).

CS&E Seminar David Gleich · Purdue 10

Page 11: Simulation Informatics; Analyzing Large Scientific Datasets

The message: insight and confidence require multiple runs.

CS&E Seminar David Gleich · Purdue 11

Page 12: Simulation Informatics; Analyzing Large Scientific Datasets

The problem: a simulation run ain't cheap!

CS&E Seminar David Gleich · Purdue 12

Page 13: Simulation Informatics; Analyzing Large Scientific Datasets

Another problem: it's very hard to "modify" current codes.

CS&E Seminar David Gleich · Purdue 13

Page 14: Simulation Informatics; Analyzing Large Scientific Datasets

Large-scale nonlinear, time-dependent heat transfer problem

10^5 nodes, 10^3 time steps, 30 minutes on 16 cores

Questions: What is the probability of failure? Which input values cause failure?

CS&E Seminar David Gleich · Purdue 14

Page 15: Simulation Informatics; Analyzing Large Scientific Datasets

It's time to ask, "What can science learn from Google?" - Wired Magazine (2008)

CS&E Seminar David Gleich · Purdue 15

Page 16: Simulation Informatics; Analyzing Large Scientific Datasets

21st Century Science in a nutshell?

Simulations are too expensive. Let data provide a surrogate.

"We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot." - Wired (again)

CS&E Seminar David Gleich · Purdue 16

Page 17: Simulation Informatics; Analyzing Large Scientific Datasets

Our approach: construct an interpolating reduced order model from a budget-constrained ensemble of runs for uncertainty and optimization studies.

CS&E Seminar David Gleich · Purdue 17

Page 18: Simulation Informatics; Analyzing Large Scientific Datasets

That is, we store the runs. [Figure: supercomputer → data computing cluster → engineer.]

Each multi-day HPC simulation generates gigabytes of data.

A data cluster can hold hundreds or thousands of old simulations …

… enabling engineers to query and analyze months of simulation data for statistical studies and uncertainty quantification.

and build the interpolant from the pre-computed data.

CS&E Seminar David Gleich · Purdue 18

Page 19: Simulation Informatics; Analyzing Large Scientific Datasets

Input "Parameters

Time history"of simulation

s f

The Database

s1 -> f1 s2 -> f2

sk -> fk

f(s) =

2

66666666666664

q(x1, t1, s)...

q(xn

, t1, s)q(x1, t2, s)

...q(x

n

, t2, s)...

q(xn

, t

k

, s)

3

77777777777775

A single simulation at one time step

X =⇥f(s1) f(s2) ... f(sp)

The database as a matrix

The

simula

tion

as a

vec

tor

CS&E Seminar David Gleich · Purdue 19
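A small sketch of how the database-as-a-matrix could be assembled, assuming each stored run provides its snapshots as an array q with q[t, i] holding the value at mesh point x_i and time step t; the function names are hypothetical.

import numpy as np

def simulation_as_vector(q):
    # Stack one run's snapshots into the long vector f(s) on the slide:
    # all mesh points at t_1, then all mesh points at t_2, and so on.
    return q.reshape(-1)   # row-major flatten gives time-major blocks of mesh values

def database_as_matrix(runs):
    # Columns are f(s_1), ..., f(s_p) for the stored runs.
    return np.column_stack([simulation_as_vector(q) for q in runs])

# Example with fake data: 3 runs, 4 time steps, 5 mesh points each.
runs = [np.random.rand(4, 5) for _ in range(3)]
X = database_as_matrix(runs)   # shape (20, 3)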

Page 20: Simulation Informatics; Analyzing Large Scientific Datasets

The interpolant

Motivation: let the data give you the basis, then find the right combination.

X = [ f(s_1)  f(s_2)  …  f(s_p) ]

f(s) ≈ Σ_{j=1}^{r} u_j α_j(s), where the u_j are the left singular vectors of X!

This idea was inspired by the success of other reduced order models like POD, and by Paul's residual-minimizing idea.

CS&E Seminar David Gleich · Purdue 20

Page 21: Simulation Informatics; Analyzing Large Scientific Datasets

Why the SVD? Let's study a simple case.

X = [ g(x_i, s_j) ] (an m × p matrix of samples over mesh points x_i and a general parameter s_j) = U Σ V^T,

g(x_i, s_j) = Σ_{ℓ=1}^{r} U_{i,ℓ} σ_ℓ V_{j,ℓ} = Σ_{ℓ=1}^{r} u_ℓ(x_i) σ_ℓ v_ℓ(s_j)

Split x and s, and treat each right singular vector as samples of an unknown basis function of the parameter:

g(x_i, s) = Σ_{ℓ=1}^{r} u_ℓ(x_i) σ_ℓ v_ℓ(s),   v_ℓ(s) ≈ Σ_{j=1}^{p} v_ℓ(s_j) φ_j^(ℓ)(s)

Interpolate v_ℓ any way you wish.

CS&E Seminar David Gleich · Purdue 21

Page 22: Simulation Informatics; Analyzing Large Scientific Datasets

Method summary

1. Compute the SVD of X.
2. Compute an interpolant of the right singular vectors.
3. Approximate a new value of f(s).

CS&E Seminar David Gleich · Purdue 22
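A minimal end-to-end sketch of these three steps, assuming a one-dimensional parameter and simple linear interpolation of the right singular vectors; the coefficients α_j(s) = σ_j v_j(s) follow the separation-of-variables argument on the previous slide, and the test data here is synthetic.

import numpy as np

def build_surrogate(X, s_train, rank):
    # SVD-based interpolating surrogate: f(s) ~ sum_j u_j * sigma_j * v_j(s).
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    U, sigma, Vt = U[:, :rank], sigma[:rank], Vt[:rank, :]

    def predict(s_new):
        # Interpolate each right singular vector at the new parameter value.
        # np.interp assumes a sorted 1-D parameter grid; any interpolant would do.
        v_new = np.array([np.interp(s_new, s_train, Vt[j, :]) for j in range(rank)])
        return U @ (sigma * v_new)

    return predict

# Usage with synthetic data: 200 outputs per run, 10 training runs on a 1-D parameter grid.
s_train = np.linspace(0.0, 1.0, 10)
X = np.array([np.sin(np.linspace(0.0, 6.0, 200) * (1.0 + s)) for s in s_train]).T
f_hat = build_surrogate(X, s_train, rank=5)
print(f_hat(0.37).shape)   # (200,)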

Page 23: Simulation Informatics; Analyzing Large Scientific Datasets

[Figure: a plotted quantity with two highlighted sections, labeled A and B.]

A quiz: which section would you rather try to interpolate, A or B?

CS&E Seminar David Gleich · Purdue 23

Page 24: Simulation Informatics; Analyzing Large Scientific Datasets

How predictable is a singular vector?

Folk theorem (O'Leary 2011): the singular vectors of a matrix of "smooth" data become more oscillatory as the index increases.

Implication: the gradient of the singular vectors increases as the index increases.

v_1(s), v_2(s), …, v_t(s): predictable.   v_{t+1}(s), …, v_r(s): unpredictable.

CS&E Seminar David Gleich · Purdue 24

Page 25: Simulation Informatics; Analyzing Large Scientific Datasets

A refined method with an error model

Don't even try to interpolate the unpredictable modes; model them as noise instead:

f(s) ≈ Σ_{j=1}^{t(s)} u_j α_j(s) + Σ_{j=t(s)+1}^{r} u_j σ_j η_j,   η_j ∼ N(0, 1)

Variance[f] = diag( Σ_{j=t(s)+1}^{r} σ_j² u_j u_j^T )

But now, how to choose t(s)?

CS&E Seminar David Gleich · Purdue 25
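A hedged sketch of the refined prediction: interpolate only the first t modes and report the variance contributed by the truncated modes, matching the two formulas above. Here t is passed in directly rather than chosen by the criterion on the next slide, and the 1-D parameter with linear interpolation is an assumption.

import numpy as np

def predict_with_error_model(U, sigma, Vt, s_train, s_new, t):
    # f(s) ~ sum_{j<=t} u_j alpha_j(s) plus zero-mean noise from the truncated modes.
    v_new = np.array([np.interp(s_new, s_train, Vt[j, :]) for j in range(t)])
    mean = U[:, :t] @ (sigma[:t] * v_new)
    # Variance of the unpredictable part: diag( sum_{j>t} sigma_j^2 u_j u_j^T ).
    variance = (U[:, t:] ** 2) @ (sigma[t:] ** 2)
    return mean, variance

# Usage with U, sigma, Vt from an untruncated SVD of the database X:
# mean, var = predict_with_error_model(U, sigma, Vt, s_train, 0.37, t=4)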

Page 26: Simulation Informatics; Analyzing Large Scientific Datasets

Our current approach to choosing the predictability

t(s) is the largest τ such that

(1/σ_1) Σ_{i=1}^{τ} σ_i |∂v_i/∂s| < threshold

CS&E Seminar David Gleich · Purdue 26
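A sketch of this criterion under the assumption of a one-dimensional parameter, so that ∂v_i/∂s can be approximated by finite differences of the sampled right singular vectors; the threshold value is an arbitrary placeholder.

import numpy as np

def choose_t(sigma, Vt, s_train, s_new, threshold=0.1):
    # Largest tau with (1/sigma_1) * sum_{i<=tau} sigma_i * |dv_i/ds at s_new| < threshold.
    t, total = 0, 0.0
    for i in range(len(sigma)):
        dv = np.gradient(Vt[i, :], s_train)       # finite-difference dv_i/ds on the sample grid
        dv_at_s = np.interp(s_new, s_train, dv)   # evaluate the derivative near s_new
        total += sigma[i] * abs(dv_at_s) / sigma[0]
        if total < threshold:
            t = i + 1
        else:
            break
    return t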

Page 27: Simulation Informatics; Analyzing Large Scientific Datasets

An experimental test case

A heat equation problem with two parameters that control the material properties.

CS&E Seminar David Gleich · Purdue 27

Page 28: Simulation Informatics; Analyzing Large Scientific Datasets

Experiments

A 20-point Latin hypercube sample.

CS&E Seminar David Gleich · Purdue 28

Page 29: Simulation Informatics; Analyzing Large Scientific Datasets

Our reduced order model vs. the truth

[Figure: the reduced order model compared with the true solution, highlighting where the error is the worst.]

CS&E Seminar David Gleich · Purdue 29

Page 30: Simulation Informatics; Analyzing Large Scientific Datasets

A Large Scale Example

Nonlinear heat transfer model: 80k nodes, 300 time steps, 104 basis runs, SVD of a 24M × 104 data matrix; 500x reduction in wall clock time (100x including the SVD).

CS&E Seminar David Gleich · Purdue 30

Page 31: Simulation Informatics; Analyzing Large Scientific Datasets

Tall-and-skinny QR (and SVD) on MapReduce

PART 2

CS&E Seminar David Gleich · Purdue 31

Page 32: Simulation Informatics; Analyzing Large Scientific Datasets

Quick review of QR

QR factorization: let A be a real m × n matrix with m ≥ n. Then A = QR, where Q is m × n and orthogonal (Q^T Q = I) and R is n × n and upper triangular.

Using QR for regression: the least-squares solution of min ‖Ax − b‖ is given by the solution of Rx = Q^T b.

QR is block normalization: "normalizing" a vector usually generalizes to computing the Q factor in the QR factorization of a block of columns.

[Figure: block picture of A = QR, with a zero block below R.]

CS&E Seminar David Gleich · Purdue 32
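A tiny numpy illustration of the regression use of QR: solve min ‖Ax − b‖ through Rx = Qᵀb; the random A and b are placeholders.

import numpy as np

A = np.random.rand(1000, 5)       # tall-and-skinny placeholder data
b = np.random.rand(1000)

Q, R = np.linalg.qr(A)            # A = QR: Q is 1000 x 5 with orthonormal columns, R is 5 x 5 upper triangular
x = np.linalg.solve(R, Q.T @ b)   # least-squares solution from R x = Q^T b

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # matches the direct least-squares solve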

Page 33: Simulation Informatics; Analyzing Large Scientific Datasets

Intro to MapReduce

Originated at Google for indexing web pages and computing PageRank.

The idea: bring the computations to the data. Express algorithms in data-local operations. Implement one type of communication: shuffle. Shuffle moves all data with the same key to the same reducer.

[Figure: MapReduce data flow from input splits through maps, shuffle, and reduces. Input is stored in triplicate; map output is persisted to disk before the shuffle; reduce input and output are on disk. Data scalable; fault tolerance by design.]

CS&E Seminar David Gleich · Purdue 33

Page 34: Simulation Informatics; Analyzing Large Scientific Datasets

Mesh point variance in MapReduce

[Figure: three simulation runs (Run 1, Run 2, Run 3), each with time steps T=1, T=2, T=3.]

CS&E Seminar David Gleich · Purdue 34

Page 35: Simulation Informatics; Analyzing Large Scientific Datasets

Mesh point variance in MapReduce

1. Each mapper outputs the mesh points with the same key.
2. The shuffle moves all values from the same mesh point to the same reducer.
3. The reducers just compute a numerical variance.

Bring the computations to the data! A sketch of this pattern follows below.

[Figure: the three runs (each with time steps T=1, T=2, T=3) feeding mappers, a shuffle, and reducers.]

CS&E Seminar David Gleich · Purdue 35
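Here is the sketch referred to above, written in the same hadoopy style as the TSQR code later in the talk; the record layout (key = (run id, mesh point id), value = that mesh point's values over time) is an assumption, not the talk's actual format.

import numpy, hadoopy

def mapper(key, value):
    # Assumed record layout: key = (run_id, mesh_id), value = list of values over time.
    run_id, mesh_id = key
    yield mesh_id, value                # re-key by mesh point so all runs meet at one reducer

def reducer(mesh_id, values):
    data = numpy.array(list(values))    # one row per run for this mesh point
    yield mesh_id, data.var(axis=0)     # variance across runs, one value per time step

if __name__ == '__main__':
    hadoopy.run(mapper, reducer)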

Page 36: Simulation Informatics; Analyzing Large Scientific Datasets

Communication-avoiding QR (Demmel et al. 2008): communication-avoiding TSQR.

First, do QR factorizations of each local matrix. Second, compute a QR factorization of the stack of new "R" factors.

Demmel et al., 2008. Communication-avoiding parallel and sequential QR.

CS&E Seminar David Gleich · Purdue 36

Page 37: Simulation Informatics; Analyzing Large Scientific Datasets

Serial QR factorizations (Demmel et al. 2008): fully serial TSQR.

Compute the QR factorization of the first block, read the next block, update the QR factorization, and so on.

Demmel et al., 2008. Communication-avoiding parallel and sequential QR.

CS&E Seminar David Gleich · Purdue 37

Page 38: Simulation Informatics; Analyzing Large Scientific Datasets

Tall-and-skinny matrix storage in MapReduce

The key is an arbitrary row id; the value is the array of entries for that row. Each submatrix A_1, A_2, A_3, A_4 is an input split.

CS&E Seminar David Gleich · Purdue 38

Page 39: Simulation Informatics; Analyzing Large Scientific Datasets

[Figure: MapReduce TSQR dataflow. Mapper 1 runs serial TSQR over blocks A1-A4 and emits R4; Mapper 2 runs serial TSQR over blocks A5-A8 and emits R8; Reducer 1 runs serial TSQR on the collected R factors and emits the final R.]

The algorithm
Data: rows of a matrix.
Map: QR factorization of rows.
Reduce: QR factorization of rows.

CS&E Seminar David Gleich · Purdue 39

Page 40: Simulation Informatics; Analyzing Large Scientific Datasets

Key limitations

It computes only R and not Q. We can get Q via Q = AR⁺ with another MapReduce iteration (we currently use this for computing the SVD), but that has dubious numerical stability; iterative refinement helps. We are working on better ways to compute Q (with Austin Benson, Jim Demmel). A small local sketch of the Q = AR⁺ step follows below.

CS&E Seminar David Gleich · Purdue 40
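A local numpy/scipy sketch of the Q = AR⁺ idea, written as a triangular solve (Q = AR⁻¹, since R is square and invertible for full-rank A); in the MapReduce setting this multiply is the extra iteration mentioned above, and the random matrix is a placeholder.

import numpy as np
from scipy.linalg import solve_triangular

A = np.random.rand(10000, 20)                    # placeholder tall-and-skinny matrix
R = np.linalg.qr(A, mode='r')                    # R alone, as the MapReduce TSQR produces it

Q = solve_triangular(R.T, A.T, lower=True).T     # Q = A R^{-1} via a triangular solve
print(np.allclose(Q.T @ Q, np.eye(20), atol=1e-6))   # check that Q has orthonormal columns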

Page 41: Simulation Informatics; Analyzing Large Scientific Datasets

Full code in hadoopy

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self, blocksize, isreducer):
        self.bsize = blocksize
        self.data = []
        if isreducer:
            self.__call__ = self.reducer
        else:
            self.__call__ = self.mapper

    def compress(self):
        # Compute a QR factorization of the buffered rows and keep only R.
        R = numpy.linalg.qr(numpy.array(self.data), 'r')
        # Reset the buffer and re-initialize it to the rows of R.
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])

    def collect(self, key, value):
        self.data.append(value)
        if len(self.data) > self.bsize * len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0, 2000000000)
            yield key, row

    def mapper(self, key, value):
        self.collect(key, value)

    def reducer(self, key, values):
        for value in values:
            self.mapper(key, value)

if __name__ == '__main__':
    mapper = SerialTSQR(blocksize=3, isreducer=False)
    reducer = SerialTSQR(blocksize=3, isreducer=True)
    hadoopy.run(mapper, reducer)

CS&E Seminar David Gleich · Purdue 41

Page 42: Simulation Informatics; Analyzing Large Scientific Datasets

Lots of data? Add an iteration.

Too many maps? Add an iteration!

[Figure: two-iteration TSQR. In iteration 1, mappers 1-1 through 1-4 each run serial TSQR over their blocks of A and emit local R factors, which are shuffled to reducers 1-1 through 1-3 that run serial TSQR again. In iteration 2, an identity map and a single reducer (2-1) combine the remaining R factors into the final R.]

CS&E Seminar David Gleich · Purdue 42

Page 43: Simulation Informatics; Analyzing Large Scientific Datasets

Summary of parameters (mrtsqr)

Blocksize: how many rows to read before computing a QR factorization, expressed as a multiple of the number of columns (see the paper).
Splitsize: the size of each local matrix.
Reduction tree: the number of reducers and iterations to use.

[Figure: the TSQR dataflow annotated with the iterations of the reduction tree.]

CS&E Seminar David Gleich · Purdue 43

Page 44: Simulation Informatics; Analyzing Large Scientific Datasets

Varying splitsize and the reduction tree (synthetic data)

Cols.  Iters.  Split (MB)  Maps  Secs.
  50     1        64       8000   388
  50     1       256       2000   184
  50     1       512       1000   149
  50     2        64       8000   425
  50     2       256       2000   220
  50     2       512       1000   191
1000     1       512       1000   666
1000     2        64       6000   590
1000     2       256       2000   432
1000     2       512       1000   337

Increasing the split size improves performance (it accounts for Hadoop data movement).

Increasing the number of iterations helps for problems with many columns.

(1000 columns with a 64 MB split size overloaded the single reducer.)

CS&E Seminar David Gleich · Purdue 44

Page 45: Simulation Informatics; Analyzing Large Scientific Datasets

MapReduce TSQR summary

MapReduce is great for TSQR!
Data: a tall-and-skinny (TS) matrix, stored by rows.
Map: QR factorization of local rows.
Reduce: QR factorization of local rows.

Input: a 500,000,000-by-100 matrix. Each record is a 1-by-100 row. HDFS size: 423.3 GB.
Time to compute the norm of each column: 161 sec.
Time to compute R in qr(A): 387 sec.

On a 64-node Hadoop cluster with 4x2 TB disks, one Core i7-920, and 12 GB of RAM per node.

Demmel et al. showed that this construction works to compute a QR factorization with minimal communication.

CS&E Seminar David Gleich · Purdue 45

Page 46: Simulation Informatics; Analyzing Large Scientific Datasets

Our vision: to enable analysts and engineers to hypothesize from data computations instead of expensive HPC computations.

Paul G. Constantine

Sandia: Jeremy Templeton, Joe Ruthruff

… and you? …

CS&E Seminar David Gleich · Purdue 46