guest lecture on n-body problems george biros the ... · the nearest neighbor problem •...

Post on 01-Aug-2020

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Scientific Computing II

Guest Lecture on N-body problems George Biros

The University of Texas at Austin

10 important algorithms in 20th century (SIAM)

1946 – The Monte Carlo method 1947 – The simplex method 1950 – Krylov iterative method 1951 – Matrix factorizations 1957 – FORTRAN 1959 – QR and eigenvalue computations 1962 – QuickSort algorithm 1965 – The fast Fourier transform 1977 – Integer relation detection 1987 – The fast multipole method

N-body problem

The nearest neighbor problem

• ⇢-Nearest neighbor:Given q 2 Rd

, find all ri such that d(ri, q) < ⇢

• k-Nearest neighbor:Given q 2 Rd

, find its k closest ri

Input: {ri}ni=1 2 Rd

Let d(ri, rj) := kri � rjk2q

q r

Millennium simulation – Max Planck Garching

The numbers

Consider a sequential machine (your laptop) with infinite memory and 1GHz clock ■ 1 billion flops/sec (1 GFlop/s) ■ Assume two-body interactions are 1 flop ■ Time = number_of_particles2 / 1 billion

Particles 1K 1M 1B 100B

Time 0.001s 1000s 32 321

This is time per single force interaction. Typically you need several interactions.

App – gravitational interactions

App – protein electrostatics

Chandra Bajaj, UT Austin

App-Complex fluids

Jaguar—MoBo—Weak Scaling (260M RBCs, 90B unknowns)

0.7 PFlop/s

App - Electromagnetic scattering (time-harmonic Maxwell)

App – Heat transfer / potential flow

App - Computer graphics/approximation theory

Beatson et al, SIGGRAPH’01

App – Classification / Image segmentation

Goya, B. et al, 2011

App - Supervised learning & kernel machines

March, Yu, Xiao, Tharakan , & B, ‘15

Binary classification Image segmentation

App - Bayesian priors for inverse problems

L2 H1

CH KDE+TV Xiao & B. ‘14

App - Variable-coefficient PDEs

Formulation (Lippmann-Schwinger)

u(x)��u(x) + ⌘(x)u(x) = f(x)

becomes

u(x) +

Z

yG(x� y)⌘(y)u(y) = f(x)

��u(x)� div (⌘(x)ru(x)) = f(x)

becomes

u(x)�Z

yrG(x� y) · ⌘(y)ru(y) =

Z

yG(x� y)f(y)

GMRES (PETSc) with volume FMM for the matrix-vector multiplication

App - Porous medium flow, 18B dof, jump=1E9

Malhotra, Gholami, B., SC’14

p: Stampede nodes node: 1.42TFLOP Xeon: 16 core Xeon: Phi ~ 22% peak

Algorithm

N-body problem

Idea I: Approximate Far-field interactions

sources targets

separation

x xi

xw

Idea II: Near-/Far-field decomposition

Idea III: hierarchical / multilevel recursion

Matrix partitioning

Rd Matrix partitioning

Regular decompositions

Low dimensions High dimensions

Well separated boxes (Near- / Far-field decomp)

Source (S) and Targets (T) The effect of S on any T not in the gray area can be approximated

S T

t

T

T

Single level

Single level – averaging (preprocessing)

Single level – evaluation at single target point x

x

Single level – evaluation, direct neighbors

x

Single level – evaluation, direct self

x

Single level – evaluation, far field

x

Questions

•  How do we create more sophisticated far-field representations?

•  How to obtain an optimal complexity algorithm? •  N2/block size à N log N à N

•  How do we control accuracy? •  How do we lower the constants? •  How to we parallelize? •  How to we use accelerators?

Hierarchical / multilevel

Upward pass

Upward pass

Upward pass

Upward pass

Evaluate

For each leaf box Top down traverse tree If tree node is far

enough, use approximation

“Far enough”:

■  Target box is not a descendant of a neighbor of tree node being visited

Evaluate

Evaluate

Direct interactions

Trees

Grouping points

Grouping depends on distribution of points x=randn(1024,2).^3; plot(x(:,1),x(:,2),'.') x=randn(32000,2).^3; plot(x(:,1),x(:,2),'.')

Adaptive tree

Treecodes - evaluation

Target Point

Tree construction (quadtree in 2D)

Average quadtree

Well separated boxes

Source (S) and Targets (T) The effect of S on any T not in the gray area can be approximated

S T

t

T

T

Evaluate

Complexity

Building the tree: O(N log N) Averaging: O(N) Evaluation: O(N logN) tree depth : logN for every point log(N) calculation

References

Further reading / references: •  http://www.nature.com/nature/journal/v324/n6096/abs/

324446a0.html -- Barnes-Hut algorithm •  http://math.nyu.edu/faculty/greengar/shortcourse_fmm.pdf •  doi:10.1016/j.jcp.2003.11.021 – kernel independent far field •  doi:10.1137/140989546 – general dimension N-body •  doi:10.1145/2160718.2160740 – overview on parallelism •  arXiv:1409.2802 – review on far field approximation and

hierarchical matrices

Extra material

Accurate source averaging – far field approximations

Instead of using just average density and center of mass, use several “equivalent” sources.

The original sources are the ‘+’ and ‘-’ in the figure

The equivalent sources placed at the red locations, equispaced on an enclosing circle. The densities are computed by

Least squares problem

Node averaging

The same idea can be used for averaging the equivalent densities of the children of a node.

Single-node

Malhotra, Gholami, B., SC’14 Best Student Paper Finalist

Algorithmic redesign; data structure redesign, blocking and loop reordering, OpenMP + AVX + SSE; prefetching; task parallelism, asynchronous offload

Hardware + algorithms

Malhotra, Gholami, B., SC’14 Best Student Paper Finalist

Fast N-body solver milestones (up to 2000)

Vladimir Rokhlin, Yale Leslie Greengard, Courant Institute

Sequential Appel, 1985 - O(N logN) Barnes & Hut, 1986 - O(N logN) tree code Rokhlin, 1985 - Multipole translations Greengard & Rokhlin, 1987 - FMM O(N) Carrier, Greengard, & Rokhlin - Adaptive FMM O(N) Anderson, 1992 - Equivalent densities Cheng, Greengard & Rokhlin, 1999 - Efficient 3D FMM Hackbush,et al,2003 - Hierachical matrices

Parallel Greengard & Gropp, 1991 - Uniform distributions Warren & Salmon, 1992 - Distributed treecode SH Teng, 1998 - Theory Different Operators Greengard & Strain, 1991 – Gauss transform Rohklin, 1992 – High frequency Helmholtz Michielssen, 1998 – Time-domain Wave, Maxwell Kapur & Long, 1997 – Algebraic view

N-body methods: Scalability requirements

Computational physics/chemistry/biology •  large scale – 1E13 unknowns (3D/4D) Graphics/signal analysis •  real time – 1E6 unknowns (1D/4D) Data analysis/Uncertainty quantification •  large scale – 1E15 unknowns (1D - 1000D) •  real time / streaming / online

A few things left out

O(N) Fast multipole methods Parallel implementation Analytic expansions for far-field approximation

(Taylor, Laurent, Spherical Harmonics) Rigorous complexity analysis Oscillatory kernels Periodic and other boundary conditions Ewald sums Gaussian kernels and more general kernels

top related