Optimization and Parallelization of the FIND Algorithm
Song Li, Eric Darve
Institute for Computational and Mathematical Engineering, Stanford
[email protected]
SIAM CSE09, March 4, 2009
Outline
1 Background
2 Serial FIND (Fast Inverse using Nested Dissection)
3 Simulation Results
4 Parallel Methods
Introduction

Modeling the current through nano-devices by the Non-Equilibrium Green's Function approach
System of Schrödinger-Poisson equations
Best known algorithm (RGF) has running time O(nx^3 ny)
Our method (FIND): O(nx^2 ny)
Other devices: nanotubes and nanowires
The Math Problem

What we want: the diagonal of G^r = A^(-1)
What we have: a sparse matrix A from a discretized 2D mesh
Example: a 4 × 5 mesh (nx = 4, ny = 5) yields a 20 × 20 matrix A
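The local connectivity can be made concrete with a small sketch. The following pure-Python helper (the name `mesh_matrix` and the 5-point-stencil entries are illustrative, not from the talk) assembles a matrix on an nx × ny grid as a stand-in for the discretized device equations; the real matrix entries differ, but the sparsity pattern is the same:

```python
def mesh_matrix(nx, ny):
    """Assemble a 5-point-stencil matrix on an nx-by-ny grid (dense storage,
    for illustration only): each node couples to itself and its four mesh
    neighbors, which is the local-connectivity pattern FIND relies on."""
    n = nx * ny
    A = [[0.0] * n for _ in range(n)]
    for x in range(nx):
        for y in range(ny):
            k = x * ny + y  # row-major node index
            A[k][k] = 4.0
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                xx, yy = x + dx, y + dy
                if 0 <= xx < nx and 0 <= yy < ny:
                    A[k][xx * ny + yy] = -1.0
    return A
```

For nx = 4, ny = 5 this gives the 20 × 20 matrix of the example, symmetric and with at most five nonzeros per row.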
Key Observations

The last entry of A^(-1) can be obtained through LU factorization: (A^(-1))_nn = (U^(-1))_nn = (U_nn)^(-1)
Obtain all the diagonal entries through multiple factorizations
Local connectivity ⇒ problem decomposition: partial factorizations are feasible
Proper ordering makes most of them identical: subproblems overlap ⇒ dynamic programming
The computational cost for all the diagonal entries of the inverse is of the same order as a single LU factorization!
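The first observation can be checked directly on a toy matrix. This minimal sketch (dense storage, no pivoting, illustrative names) factors A = LU and solves A x = e_n; the last component of x is (A^(-1))_nn:

```python
def lu(A):
    """Doolittle LU factorization without pivoting (assumes nonzero pivots)."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U

def last_column_of_inverse(L, U):
    """Solve A x = e_n by forward/back substitution; x is the last column
    of A^(-1), so x[-1] = (A^(-1))_nn."""
    n = len(L)
    y = [0.0] * n
    y[-1] = 1.0
    for i in range(n):  # forward substitution with L
        for j in range(i):
            y[i] -= L[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            x[i] -= U[i][j] * x[j]
        x[i] /= U[i][i]
    return x
```

The very last back-substitution step divides by U_nn, which is exactly where the identity (A^(-1))_nn = (U_nn)^(-1) comes from.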
Overall Structure: Partition Tree

Order the mesh nodes in a way similar to nested dissection
Partition the whole mesh and form a tree structure to exploit the subproblem overlap
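A minimal sketch of such a partition tree over a 1D ordering of the clusters (the 2D nested-dissection version splits along separators instead; the names here are illustrative):

```python
def partition_tree(lo, hi, leaf_size=2):
    """Recursively bisect the index range [lo, hi) into a binary tree of
    clusters, stopping at small leaf clusters."""
    node = {"range": (lo, hi), "children": []}
    if hi - lo > leaf_size:
        mid = (lo + hi) // 2
        node["children"] = [partition_tree(lo, mid, leaf_size),
                            partition_tree(mid, hi, leaf_size)]
    return node

def leaves(node):
    """Leaf clusters of the tree, left to right."""
    if not node["children"]:
        return [node["range"]]
    return [r for child in node["children"] for r in leaves(child)]
```

The leaves partition the whole index set, and every internal node is the union of its two children, which is what lets partial eliminations be shared between overlapping subproblems.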
One Step of Elimination

Gaussian elimination of a cluster's inner nodes (i) updates its boundary nodes (b), leaving the outer nodes (o) untouched:

A*(b, b) := A(b, b) − A(b, i) A(i, i)^(-1) A(i, b)

[ A(i,i)  A(i,b)  0      ]                  [ A(i,i)  A(i,b)   0      ]
[ A(b,i)  A(b,b)  A(b,o) ]  elimination ⇒   [ 0       A*(b,b)  A(b,o) ]
[ 0       A(o,b)  A(o,o) ]                  [ 0       A(o,b)   A(o,o) ]

[Figure legend: eliminated node, inner node, boundary node, outer node]
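The elimination step above is a Schur complement, and can be checked against plain Gaussian elimination on a toy 4 × 4 matrix (pure-Python sketch with illustrative names; the inner set i is taken as the first two indices):

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    """Inverse of a 2-by-2 block (enough for this toy example)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def one_step_elimination(A, ni):
    """A*(b,b) = A(b,b) - A(b,i) A(i,i)^(-1) A(i,b), with the inner nodes i
    taken as the first ni indices and the boundary nodes b as the rest."""
    Aii = [row[:ni] for row in A[:ni]]
    Aib = [row[ni:] for row in A[:ni]]
    Abi = [row[:ni] for row in A[ni:]]
    Abb = [row[ni:] for row in A[ni:]]
    T = matmul(matmul(Abi, inv2(Aii)), Aib)
    return [[Abb[r][c] - T[r][c] for c in range(len(T[0]))]
            for r in range(len(T))]

def gauss_eliminate(A, steps):
    """Forward-eliminate the first `steps` pivots; the trailing block then
    equals the Schur complement computed above."""
    A = [row[:] for row in A]
    for k in range(steps):
        for i in range(k + 1, len(A)):
            m = A[i][k] / A[k][k]
            for j in range(len(A)):
                A[i][j] -= m * A[k][j]
    return A
```

Both routes produce the same trailing block, which is why one step of FIND's elimination is just a block of an ordinary LU factorization.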
Two Full Elimination Processes

Keep partitioning the mesh to get small clusters
Store the results of each partial elimination
The partial results can be reused

[Figure: elimination sequence on the mesh; legend: eliminated node, inner node, boundary node, outer node, target node]
Extensions and Optimizations

G^< = A^(-1) Σ A^(-†) has a similar sparsity pattern, so our method is applicable as well
Also applicable to computing off-diagonal entries
Extra sparsity: in the one-step elimination A*(b, b) := A(b, b) − A(b, i) A(i, i)^(-1) A(i, b), these blocks are themselves sparse. Exploit this to optimize!
The elimination preserves symmetry, and this further reduces the cost

[Figure: sparsity pattern of the blocks involved in the elimination]
Simulation Device
Running Time Comparison (Log-Log Scale with Reference Lines)

[Figure: running time (seconds) vs. n (= Nx = Ny) for n = 64 to 1024, comparing FIND and RGF against O(n^3) and O(n^4) reference lines]

FIND: O(n^3)
RGF: O(n^4)
Memory Cost Comparison

FIND: O(N log N)
RGF: O(N^(3/2))
How to Parallelize?

Straightforward for leaf clusters
Top-level clusters dominate the running time but offer less parallelism
Use the idle processors for redundant computations
More floating-point operations but shorter wall-clock time
Works for 1D, 2D, and 3D domains
Problem and Processor Settings

[Figure: a 1D array of 16 clusters, one per processor P0 through P15]

16 processors, 16 clusters in 1D
One target cluster per processor
Keep merging all the other clusters until they are all merged as the complement of the target cluster
Eliminate the merged complement clusters and compute the inverse
Detailed Merging Process

[Figure: the subdomain-doubling merge schedule across processors P0 through P15]

Each processor keeps the complement of its target cluster with respect to the current subdomain
Start with subdomains of size 2
Expand to subdomains of size 4
Some processors are idle; use them to prepare for the next subdomain expansion
Continue until the subdomain has been expanded to the whole domain
Additional speedup of a factor of 2
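The doubling schedule can be simulated abstractly. This sketch (illustrative name, assumes a power-of-two number of clusters in 1D) tracks only which clusters each processor has merged into its complement; in the real algorithm each merge is a partial elimination, not a set union:

```python
def merge_schedule(P):
    """Simulate the subdomain-doubling schedule: after each round, every
    processor p holds the complement of its target cluster p within its
    current subdomain, whose size doubles per round."""
    comp = {p: set() for p in range(P)}
    size = 1
    while size < P:
        for p in range(P):
            base = p // (2 * size) * (2 * size)  # start of the doubled subdomain
            lower = set(range(base, base + size))
            upper = set(range(base + size, base + 2 * size))
            # merge the sibling half-subdomain into p's complement
            comp[p] |= upper if p in lower else lower
        size *= 2
    return comp
```

After log2(P) rounds every processor holds the complement of exactly its own target cluster, which is what lets all P diagonal blocks be computed concurrently.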
Communication Pattern
Summary

Direct method for a fast inverse
Two extensions, two optimizations
An optimal parallel scheme
Collaboration with other groups for more applications