Scientific Computing II
Guest Lecture on N-body problems George Biros
The University of Texas at Austin
10 important algorithms in 20th century (SIAM)
1946 – The Monte Carlo method
1947 – The simplex method
1950 – Krylov iterative method
1951 – Matrix factorizations
1957 – FORTRAN
1959 – QR and eigenvalue computations
1962 – QuickSort algorithm
1965 – The fast Fourier transform
1977 – Integer relation detection
1987 – The fast multipole method
N-body problem
The nearest neighbor problem
Input: $\{r_i\}_{i=1}^{n} \in \mathbb{R}^d$
Let $d(r_i, r_j) := \|r_i - r_j\|_2$
• $\rho$-nearest neighbor: given $q \in \mathbb{R}^d$, find all $r_i$ such that $d(r_i, q) < \rho$
• k-nearest neighbor: given $q \in \mathbb{R}^d$, find its k closest $r_i$
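The two queries can be sketched with a brute-force scan over all points, costing O(n) distance evaluations per query. This is only an illustration of the definitions; the function names and the sample data are made up for the example.

```python
import math

def dist(a, b):
    """Euclidean (2-norm) distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rho_nearest(points, q, rho):
    """Indices i of all points r_i with d(r_i, q) < rho."""
    return [i for i, r in enumerate(points) if dist(r, q) < rho]

def k_nearest(points, q, k):
    """Indices of the k points closest to q, nearest first."""
    order = sorted(range(len(points)), key=lambda i: dist(points[i], q))
    return order[:k]

# Hypothetical sample data in R^2.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
q = (0.1, 0.0)
in_ball = rho_nearest(points, q, rho=1.0)
closest_two = k_nearest(points, q, k=2)
print(in_ball, closest_two)
```

Answering such a query for every point by brute force costs O(n^2) work in total, the same cost as the direct N-body sum.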
Millennium simulation – Max Planck Garching
The numbers
Consider a sequential machine (your laptop) with infinite memory and a 1 GHz clock
■ 1 billion flops/sec (1 GFlop/s)
■ Assume each two-body interaction costs 1 flop
■ Time = (number of particles)^2 / 1 billion
Particles   1K        1M       1B               100B
Time        0.001 s   1000 s   about 32 years   about 320,000 years
This is the time for a single evaluation of all pairwise forces. Typically you need several such evaluations.
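The arithmetic behind these numbers can be checked with a few lines. The sketch assumes the slide's model (1 flop per pair, 10^9 flops per second); the 365-day year is an assumption made here for the unit conversion.

```python
FLOPS_PER_SECOND = 1e9               # 1 GFlop/s machine from the slide
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumption: 365-day year

def direct_sum_seconds(n):
    """Time for one all-pairs evaluation with n particles at 1 flop per pair."""
    return n * n / FLOPS_PER_SECOND

for label, n in [("1K", 1e3), ("1M", 1e6), ("1B", 1e9), ("100B", 1e11)]:
    t = direct_sum_seconds(n)
    print(f"{label:>5}: {t:.3g} s = {t / SECONDS_PER_YEAR:.3g} years")
```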
App – gravitational interactions
App – protein electrostatics
Chandra Bajaj, UT Austin
App – Complex fluids
Jaguar—MoBo—Weak Scaling (260M RBCs, 90B unknowns)
0.7 PFlop/s
App - Electromagnetic scattering (time-harmonic Maxwell)
App – Heat transfer / potential flow
App - Computer graphics/approximation theory
Beatson et al, SIGGRAPH’01
App – Classification / Image segmentation
Goya, B. et al, 2011
App - Supervised learning & kernel machines
March, Yu, Xiao, Tharakan, & B., ’15
Binary classification Image segmentation
App - Bayesian priors for inverse problems
Panels: L2, H1, CH, KDE+TV (Xiao & B. ’14)
App - Variable-coefficient PDEs
Formulation (Lippmann-Schwinger)
$u(x) - \Delta u(x) + \eta(x)\,u(x) = f(x)$
becomes
$u(x) + \int_y G(x-y)\,\eta(y)\,u(y) = f(x)$

$-\Delta u(x) - \operatorname{div}\big(\eta(x)\nabla u(x)\big) = f(x)$
becomes
$u(x) - \int_y \nabla G(x-y)\cdot\eta(y)\nabla u(y) = \int_y G(x-y)\,f(y)$
GMRES (PETSc) with volume FMM for the matrix-vector multiplication
App - Porous medium flow, 18B dof, jump=1E9
Malhotra, Gholami, B., SC’14
p: number of Stampede nodes; node: 1.42 TFLOP (16-core Xeon + Xeon Phi); ~22% of peak
Algorithm
N-body problem
Idea I: Approximate Far-field interactions
(Figure: a cluster of sources and a cluster of targets, with the separation between them marked.)
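A minimal sketch of the far-field idea: replace a cluster of sources by a single source at its center of mass and watch the error drop as the target moves away. The 1/r kernel, the positive masses, and the unit-box cluster are assumptions of this example, not taken from the slides.

```python
import math
import random

def potential(x, sources, masses):
    """Direct sum of m_j / |x - y_j| over all sources (kernel is an assumption)."""
    return sum(m / math.hypot(x[0] - y[0], x[1] - y[1])
               for y, m in zip(sources, masses))

def center_of_mass(sources, masses):
    """Total mass and mass-weighted mean position of the cluster."""
    total = sum(masses)
    cx = sum(m * y[0] for y, m in zip(sources, masses)) / total
    cy = sum(m * y[1] for y, m in zip(sources, masses)) / total
    return (cx, cy), total

random.seed(0)
sources = [(random.random(), random.random()) for _ in range(200)]  # unit box
masses = [random.random() + 0.5 for _ in sources]                   # positive
com, total = center_of_mass(sources, masses)

def relative_error(x):
    """Error of the one-source approximation at target x."""
    exact = potential(x, sources, masses)
    approx = total / math.hypot(x[0] - com[0], x[1] - com[1])
    return abs(approx - exact) / exact

# Targets at increasing separation from the cluster.
errors = {sep: relative_error((0.5 + sep, 0.5)) for sep in (2.0, 4.0, 8.0, 16.0)}
for sep, err in errors.items():
    print(f"separation {sep:5.1f}: relative error {err:.2e}")
```

One averaged interaction replaces 200 direct ones, and the error shrinks as the separation grows relative to the cluster size.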
Idea II: Near-/Far-field decomposition
Idea III: hierarchical / multilevel recursion
Matrix partitioning
Panels: $\mathbb{R}^d$ | Matrix partitioning
Regular decompositions
Low dimensions High dimensions
Well separated boxes (Near- / Far-field decomp)
Source (S) and Targets (T).
The effect of S on any T not in the gray area can be approximated.
(Figure: source box S, surrounded by a gray neighborhood, with target boxes T.)
Single level
Single level – averaging (preprocessing)
Single level – evaluation at single target point x
Single level – evaluation, direct neighbors
Single level – evaluation, direct self
Single level – evaluation, far field
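The single-level steps above can be sketched as follows: average once per box, then evaluate at a target by summing directly over its own box and the adjacent boxes and using the box averages everywhere else. The uniform m-by-m grid on the unit square, the 1/r kernel, the unit masses, and all names are assumptions of this sketch.

```python
import math
import random
from collections import defaultdict

def kernel(x, y):
    """1/r interaction; the kernel choice is an assumption of this sketch."""
    return 1.0 / math.hypot(x[0] - y[0], x[1] - y[1])

def average(sources, masses, m):
    """Preprocessing: bin sources into an m-by-m grid on [0, 1)^2 and store,
    per non-empty box, its member indices, total mass, and center of mass."""
    members = defaultdict(list)
    for j, y in enumerate(sources):
        members[(int(y[0] * m), int(y[1] * m))].append(j)
    summary = {}
    for box, idx in members.items():
        total = sum(masses[j] for j in idx)
        cx = sum(masses[j] * sources[j][0] for j in idx) / total
        cy = sum(masses[j] * sources[j][1] for j in idx) / total
        summary[box] = (total, (cx, cy))
    return members, summary

def evaluate(x, sources, masses, members, summary, m):
    """Potential at a target x that does not coincide with any source."""
    bx, by = int(x[0] * m), int(x[1] * m)
    u = 0.0
    for box, (total, com) in summary.items():
        if abs(box[0] - bx) <= 1 and abs(box[1] - by) <= 1:
            # Own box ("direct self") and adjacent boxes ("direct neighbors").
            u += sum(masses[j] * kernel(x, sources[j]) for j in members[box])
        else:
            # Far field: one interaction with the box average.
            u += total * kernel(x, com)
    return u

random.seed(1)
n, m = 2000, 8
sources = [(random.random(), random.random()) for _ in range(n)]
masses = [1.0] * n
members, summary = average(sources, masses, m)

x = (0.31, 0.56)
approx = evaluate(x, sources, masses, members, summary, m)
exact = sum(mj * kernel(x, y) for y, mj in zip(sources, masses))
rel_err = abs(approx - exact) / exact
print(f"relative error: {rel_err:.2e}")
```

With roughly N/m^2 points per box, one target costs about 9N/m^2 direct interactions plus at most m^2 averaged ones, which is where the "N^2 / block size" count comes from.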
Questions
• How do we create more sophisticated far-field representations?
• How do we obtain an optimal complexity algorithm?
  • N^2 / block size → N log N → N
• How do we control accuracy?
• How do we lower the constants?
• How do we parallelize?
• How do we use accelerators?
Hierarchical / multilevel
Upward pass
Evaluate
For each leaf box:
■ Traverse the tree top down
■ If a tree node is far enough, use its approximation
“Far enough”:
■ The target box is not a descendant of a neighbor of the tree node being visited
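The upward pass and the top-down evaluation can be sketched as a small treecode. Note two deliberate simplifications, both assumptions of this example: the traversal is done per target point rather than per leaf box, and "far enough" is decided with a Barnes-Hut style opening test (box width smaller than theta times the distance) instead of the neighbor-based rule stated above. The 1/r kernel, unit masses, and all names are also illustrative.

```python
import math
import random

class Node:
    def __init__(self, cx, cy, half, idx):
        self.cx, self.cy, self.half = cx, cy, half  # box center and half-width
        self.idx = idx          # indices of the points inside the box
        self.children = []      # empty for a leaf
        self.mass, self.com = 0.0, (cx, cy)

def build(node, pts, leaf_size=8):
    """Split a box into quadrants until it holds at most leaf_size points.
    Assumes distinct points (duplicates would recurse without end)."""
    if len(node.idx) <= leaf_size:
        return
    h = node.half / 2
    quadrants = {}
    for j in node.idx:
        key = (pts[j][0] >= node.cx, pts[j][1] >= node.cy)
        quadrants.setdefault(key, []).append(j)
    for (right, top), idx in quadrants.items():
        child = Node(node.cx + (h if right else -h),
                     node.cy + (h if top else -h), h, idx)
        build(child, pts, leaf_size)
        node.children.append(child)

def upward(node, pts, m):
    """Upward pass: leaves average their points, parents average children."""
    if node.children:
        for c in node.children:
            upward(c, pts, m)
        parts = [(c.mass, c.com) for c in node.children]
    else:
        parts = [(m[j], pts[j]) for j in node.idx]
    node.mass = sum(w for w, _ in parts)
    node.com = (sum(w * p[0] for w, p in parts) / node.mass,
                sum(w * p[1] for w, p in parts) / node.mass)

def evaluate(node, i, pts, m, theta=0.5):
    """Potential at target pts[i] by a top-down traversal."""
    x = pts[i]
    if not node.children:   # leaf: direct interactions, skipping the target
        return sum(m[j] / math.hypot(x[0] - pts[j][0], x[1] - pts[j][1])
                   for j in node.idx if j != i)
    d = math.hypot(x[0] - node.com[0], x[1] - node.com[1])
    if 2 * node.half < theta * d:   # far enough: use the averaged source
        return node.mass / d
    return sum(evaluate(c, i, pts, m, theta) for c in node.children)

random.seed(3)
n = 1000
pts = [(random.random(), random.random()) for _ in range(n)]
m = [1.0] * n
root = Node(0.5, 0.5, 0.5, list(range(n)))
build(root, pts)
upward(root, pts, m)

def direct(i):
    """Reference O(n) direct sum for target i."""
    x = pts[i]
    return sum(m[j] / math.hypot(x[0] - pts[j][0], x[1] - pts[j][1])
               for j in range(n) if j != i)

max_rel_err = max(abs(evaluate(root, i, pts, m) - direct(i)) / direct(i)
                  for i in range(0, n, 100))
print(f"max relative error over sampled targets: {max_rel_err:.2e}")
```

A smaller theta opens more boxes, trading speed for accuracy; theta = 0.5 is just a common illustrative choice.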
Direct interactions
Trees
Grouping points
Grouping depends on the distribution of points:
x=randn(1024,2).^3; plot(x(:,1),x(:,2),'.')
x=randn(32000,2).^3; plot(x(:,1),x(:,2),'.')
Adaptive tree
Treecodes - evaluation
Target Point
Tree construction (quadtree in 2D)
Average quadtree
Well separated boxes
Source (S) and Targets (T).
The effect of S on any T not in the gray area can be approximated.
(Figure: source box S, surrounded by a gray neighborhood, with target boxes T.)
Evaluate
Complexity
Building the tree: O(N log N)
Averaging: O(N)
Evaluation: O(N log N)
The tree depth is log N, so every point requires O(log N) calculations.
References
Further reading / references:
• http://www.nature.com/nature/journal/v324/n6096/abs/324446a0.html – Barnes-Hut algorithm
• http://math.nyu.edu/faculty/greengar/shortcourse_fmm.pdf
• doi:10.1016/j.jcp.2003.11.021 – kernel independent far field
• doi:10.1137/140989546 – general dimension N-body
• doi:10.1145/2160718.2160740 – overview on parallelism
• arXiv:1409.2802 – review on far field approximation and hierarchical matrices
Extra material
Accurate source averaging – far field approximations
Instead of using just the average density and center of mass, use several “equivalent” sources.
The original sources are the ‘+’ and ‘-’ in the figure.
The equivalent sources are placed at the red locations, equispaced on an enclosing circle.
The densities are computed by solving a least squares problem.
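A minimal sketch of the equivalent-source construction, under several assumptions that are not in the slides: the 2D Laplace kernel -log(r)/(2π), 16 equivalent sources on a circle of radius 1, 32 "check" points on a circle of radius 3 where the two fields are matched, and a least squares solve through the normal equations with a small hand-written Gaussian elimination (a QR or SVD solver would be the more robust choice in practice).

```python
import math
import random

def G(x, y):
    """2D Laplace kernel (an assumption of this sketch)."""
    return -math.log(math.hypot(x[0] - y[0], x[1] - y[1])) / (2 * math.pi)

def circle(radius, n):
    """n equispaced points on a circle centered at the origin."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def solve(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x

random.seed(2)
# Original sources: '+' and '-' charges inside the box [-0.5, 0.5]^2.
sources = [(random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5))
           for _ in range(20)]
charges = [random.choice((-1.0, 1.0)) for _ in sources]

equiv = circle(1.0, 16)   # equivalent source locations (enclosing circle)
check = circle(3.0, 32)   # points where the two fields are matched

# Least squares problem: minimize || A w - b || over equivalent densities w.
A = [[G(c, e) for e in equiv] for c in check]
b = [sum(q * G(c, y) for y, q in zip(sources, charges)) for c in check]
p = len(equiv)
AtA = [[sum(row[i] * row[j] for row in A) for j in range(p)] for i in range(p)]
Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(p)]
w = solve(AtA, Atb)

# Compare the two representations at a far target.
target = (6.0, 2.0)
exact = sum(q * G(target, y) for y, q in zip(sources, charges))
approx = sum(wi * G(target, e) for e, wi in zip(equiv, w))
error = abs(exact - approx)
scale = sum(abs(q) * abs(G(target, y)) for y, q in zip(sources, charges))
print(f"error relative to the sum of |contributions|: {error / scale:.2e}")
```

The 16 equivalent densities stand in for the 20 original charges at any target outside the check circle; more equivalent points give a more accurate far field at a higher cost.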
Node averaging
The same idea can be used for averaging the equivalent densities of the children of a node.
Single-node
Malhotra, Gholami, B., SC’14 Best Student Paper Finalist
Algorithmic redesign; data structure redesign, blocking and loop reordering, OpenMP + AVX + SSE; prefetching; task parallelism, asynchronous offload
Hardware + algorithms
Malhotra, Gholami, B., SC’14 Best Student Paper Finalist
Fast N-body solver milestones (up to 2000)
Vladimir Rokhlin, Yale; Leslie Greengard, Courant Institute
Sequential
• Appel, 1985 – O(N log N)
• Barnes & Hut, 1986 – O(N log N) tree code
• Rokhlin, 1985 – Multipole translations
• Greengard & Rokhlin, 1987 – FMM O(N)
• Carrier, Greengard, & Rokhlin – Adaptive FMM O(N)
• Anderson, 1992 – Equivalent densities
• Cheng, Greengard & Rokhlin, 1999 – Efficient 3D FMM
• Hackbusch et al., 2003 – Hierarchical matrices
Parallel
• Greengard & Gropp, 1991 – Uniform distributions
• Warren & Salmon, 1992 – Distributed treecode
• SH Teng, 1998 – Theory
Different Operators
• Greengard & Strain, 1991 – Gauss transform
• Rokhlin, 1992 – High frequency Helmholtz
• Michielssen, 1998 – Time-domain Wave, Maxwell
• Kapur & Long, 1997 – Algebraic view
N-body methods: Scalability requirements
Computational physics/chemistry/biology
• large scale – 1E13 unknowns (3D/4D)
Graphics/signal analysis
• real time – 1E6 unknowns (1D/4D)
Data analysis/Uncertainty quantification
• large scale – 1E15 unknowns (1D - 1000D)
• real time / streaming / online
A few things left out
• O(N) fast multipole methods
• Parallel implementation
• Analytic expansions for far-field approximation (Taylor, Laurent, spherical harmonics)
• Rigorous complexity analysis
• Oscillatory kernels
• Periodic and other boundary conditions
• Ewald sums
• Gaussian kernels and more general kernels