Load Balancing in Distributed N-Body Simulations
Posted on 31-Dec-2015
N-Body Simulations
A simulation of a dynamic system of particles under the interaction of a distance mediated force
For instance, a simulation of the stars within a cluster or galaxy under the force of gravity
http://upload.wikimedia.org/wikipedia/commons/2/25/Galaxy_collision.ogv
Barnes-Hut Algorithm
N-Body simulations can be computed by direct integration
For each particle, calculate the interaction with every other particle
Running time is O(n²)
There are many efficient algorithms for N-Body Simulations
Barnes-Hut algorithm is based on treating groups of distant particles as a single entity
Running time is O(n log n)
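The direct O(n²) method above can be sketched as a double loop over all particle pairs. This is an illustrative sketch, not code from the slides; the names (`direct_accelerations`, `G`, `EPS`) and the softening term are assumptions.

```python
# Sketch of the direct O(n^2) method: every particle interacts with every
# other particle. 2-D for brevity; G and EPS are made-up simulation units.
import math

G = 1.0      # gravitational constant in simulation units (assumed)
EPS = 1e-3   # softening length to avoid singularities at tiny distances

def direct_accelerations(positions, masses):
    """Return the acceleration on each particle from all others (2-D)."""
    n = len(positions)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi = positions[i]
        for j in range(n):          # inner loop over all other particles
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            r2 = dx * dx + dy * dy + EPS * EPS
            inv_r3 = 1.0 / (math.sqrt(r2) * r2)
            acc[i][0] += G * masses[j] * dx * inv_r3
            acc[i][1] += G * masses[j] * dy * inv_r3
    return acc
```

The nested loop is exactly where the O(n²) cost comes from, and it is this loop that Barnes-Hut replaces with a tree walk.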
Barnes-Hut (cont.)
An octree is constructed to contain the particles
Each node corresponds to a segment of simulation space
The root contains the entire simulation space
The children of each node subdivide the space of the node into 8 equally sized cubic segments
The nodes of any layer are non-overlapping
Each leaf holds 1 particle
Each non-leaf stores the mass and center of gravity of all the particles stored by its children
Forces are calculated between a particle and a tree node
Let L be the side length of the node and D be the distance between the node's center of gravity and the particle.
If L/D < θ (the opening-angle threshold; θ = 1 here), the node is far enough away: calculate the force the node exerts on the particle as a single body
Otherwise, compute the interaction between the particle and each of the node's 8 children
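The opening criterion above can be sketched as a recursive tree walk. This sketch uses a 2-D quadtree (4 children) instead of the slides' 3-D octree (8 children) to stay short; the logic is identical. `Node`, `force_on`, `THETA`, `G`, and `EPS` are all illustrative names, not from the slides.

```python
# Barnes-Hut walk sketch: if a node is far away (L/D < THETA) treat it as a
# single body at its center of mass; otherwise recurse into its children.
import math

THETA = 1.0          # opening-angle threshold from the slides (L/D < 1)
G, EPS = 1.0, 1e-3   # assumed simulation units / softening

class Node:
    def __init__(self, cx, cy, size):
        self.cx, self.cy, self.size = cx, cy, size  # square cell: center, side L
        self.mass = 0.0
        self.comx = self.comy = 0.0                 # center of gravity
        self.children = []                          # 0 (leaf) or 4 children

def force_on(node, px, py):
    """Accumulate the acceleration on a particle at (px, py) from `node`."""
    if node.mass == 0.0:
        return (0.0, 0.0)
    dx, dy = node.comx - px, node.comy - py
    d = math.sqrt(dx * dx + dy * dy) + EPS
    if not node.children or node.size / d < THETA:
        # Leaf, or distant node: treat the whole subtree as one body.
        inv_r3 = 1.0 / (d * d * d)
        return (G * node.mass * dx * inv_r3, G * node.mass * dy * inv_r3)
    # Close node: open it and recurse into the children.
    ax = ay = 0.0
    for child in node.children:
        cax, cay = force_on(child, px, py)
        ax += cax
        ay += cay
    return (ax, ay)
```

Note how the cost per particle depends on how many nodes must be opened, which is the root of the load-imbalance problem discussed next.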
Distributed Barnes-Hut
Naïve N-Body Simulations do not require load balancing
Equal computation required for every particle
But in Barnes-Hut
Particles in high density areas require more computations than particles in low density areas.
Nearby particles are treated as individuals, but far away particles are calculated as groups
Particles move during the simulation so a good partitioning at the start may become a poor partitioning by the end of the simulation
Data-Shipping vs. Function Shipping
Each process constructs a local tree for the particles it controls
Interactions must be computed for particles which reside on other processes
Two approaches
Data shipping: each process requests enough of every other process's tree to compute the interactions between its local particles and the remote particles
Function shipping: each process sends a list of its particles to every other process, which computes the interactions on the sender's behalf
Hybrid: Processes share some information about their trees, and use function shipping otherwise
Static Partitioning, Static Assignment
The simulation space is broken into k × N equally sized segments (N = number of processes)
Each process is statically assigned k pieces and is responsible for all particles within its segments
Particles may transition between processes as they move
Relies on distributed segments to overcome load imbalance
Gives up some locality to achieve balance
Example assignment (4 processes, 4×4 grid of segments; numbers are process IDs):
0 1 0 1
2 3 2 3
0 1 0 1
2 3 2 3
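The interleaved grid above can be generated with a simple modular formula. This is a sketch of one plausible tiling rule; `static_owner` and its parameters are illustrative, not from the slides.

```python
# Static interleaved assignment sketch: with a 2x2 block of processes tiled
# across a 4x4 segment grid, each process owns k = 4 scattered segments.
def static_owner(row, col, procs_per_row=2, procs_per_col=2):
    """Process ID that owns segment (row, col) under a 2x2 interleaved tiling."""
    return (row % procs_per_col) * procs_per_row + (col % procs_per_row)
```

Scattering each process's segments across space is what trades locality for balance: dense and sparse regions are shared by all processes.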
Static Partitioning, Dynamic Assignment
The simulation space is broken into k × N equally sized segments (N = number of processes)
A load is calculated for each segment based on the number of calculations done in the last step
Each process is dynamically assigned contiguous pieces and is responsible for all particles within its segments
Uses a Morton ordering (Z-order curve) so that segments adjacent in the ordering tend to be physically contiguous
Improves locality and load balancing over static assignment, but at increased cost
Example load-balanced assignment (numbers are process IDs):
0 0 1 2
0 1 2 2
1 1 2 2
1 1 3 3
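The two pieces of this scheme can be sketched as follows: a Morton (Z-order) index computed by bit interleaving, and a greedy pass that cuts the Z-ordered segment list into contiguous runs of roughly equal load. Function names and the greedy cutting rule are illustrative assumptions.

```python
# Morton ordering + greedy contiguous assignment sketch.
def morton_index(x, y, bits=2):
    """Interleave the bits of (x, y) to get the segment's Z-curve position."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return z

def assign_by_load(loads_in_z_order, nprocs):
    """Greedily cut the Z-ordered segments into nprocs contiguous runs,
    each holding roughly an equal share of last-step load."""
    share = sum(loads_in_z_order) / nprocs
    owner, acc, proc = [], 0.0, 0
    for load in loads_in_z_order:
        if acc >= share and proc < nprocs - 1:
            proc += 1      # this process has its share; start the next one
            acc = 0.0
        owner.append(proc)
        acc += load
    return owner
```

Because consecutive Z-curve indices usually sit in the same quadrant, each process's contiguous run of segments stays mostly compact in space.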
Dynamic Partitioning, Dynamic Assignment
Processes coordinate to construct a combined tree
Each node contains the load experienced during the last step
Each process does a walk through the tree claiming nodes up to its share of the total load
When a process has claimed its share of the load, it signals the next process where it left off and the next process begins its walk from that point
[Slide figure: a 4×4 grid of process IDs showing the assignment produced by the tree walk; garbled in the transcript]
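The hand-off walk described above can be sketched over the depth-first sequence of tree leaves. This is a simplified serial model of the coordinated walk (one plausible "costzones"-style rule); `claim_zones` and the share boundaries are illustrative assumptions.

```python
# Tree-walk claiming sketch: leaves are visited in depth-first order, and
# each process claims leaves until it reaches its share of last-step load,
# then hands the walk position to the next process.
def claim_zones(leaf_loads, nprocs):
    """Split the depth-first leaf sequence into per-process zones by load."""
    total = sum(leaf_loads)
    owners, proc, claimed = [], 0, 0.0
    for load in leaf_loads:
        # Hand off once this process has claimed its cumulative share.
        if claimed >= total * (proc + 1) / nprocs and proc < nprocs - 1:
            proc += 1
        owners.append(proc)
        claimed += load
    return owners
```

Because depth-first leaf order keeps spatially close particles together, each claimed zone is both load-balanced and reasonably local.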
K-Means Clustering
Simulation space is divided into N clusters (one per process) using a K-Means clustering algorithm
Ensures that close particles are assigned to the same process regardless of where they are in simulation space
Centroids and cluster assignments are recomputed each step
The distance function for each centroid is scaled based on the load of the cluster during the last step
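One iteration of this load-weighted k-means can be sketched as below. The scaling rule (multiply each centroid's distance by its cluster's load share, so overloaded clusters shrink) is one plausible choice, not necessarily the exact function from the slides; all names are illustrative.

```python
# Load-weighted k-means sketch: assignment step with load-scaled distances,
# then the standard centroid update.
import math

def assign_clusters(points, centroids, loads):
    """Assign each point to the centroid with the smallest scaled distance."""
    total = sum(loads)
    # scale == 1.0 for an average-load cluster; >1 inflates distances to
    # overloaded centroids, shrinking their clusters next step.
    scale = [l / total * len(loads) for l in loads]
    assignment = []
    for (px, py) in points:
        best, best_d = 0, float("inf")
        for i, (cx, cy) in enumerate(centroids):
            d = math.hypot(px - cx, py - cy) * scale[i]
            if d < best_d:
                best, best_d = i, d
        assignment.append(best)
    return assignment

def update_centroids(points, assignment, k):
    """Recompute each centroid as the mean of its assigned points."""
    sums = [[0.0, 0.0, 0] for _ in range(k)]
    for (px, py), a in zip(points, assignment):
        sums[a][0] += px
        sums[a][1] += py
        sums[a][2] += 1
    return [(sx / n, sy / n) if n else (0.0, 0.0) for sx, sy, n in sums]
```

A point equidistant from two centroids is pushed toward the one whose cluster was lighter last step, which is exactly the rebalancing effect the slide describes.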