1 namd - scalable molecular dynamics gengbin zheng 9/1/01

1

NAMD - Scalable Molecular Dynamics

Gengbin Zheng

9/1/01

2

Molecular dynamics and NAMD

• MD to understand the structure and function of biomolecules– proteins, DNA, membranes

• NAMD is a production quality MD program– Active use by biophysicists (science publications)– 50,000+ lines of C++ code– 1000+ registered users– Features and “accessories” such as

• VMD: visualization and analysis• BioCoRE: collaboratory• Steered and Interactive Molecular Dynamics

3

Molecular Dynamics

4

Molecular Dynamics• Collection of [charged] atoms, with bonds• Like N-Body problem, but much complicated.• At each time-step

– Calculate forces on each atom • non-bonded: electrostatic and van der Waal’s

• Bonds(2), angle(3) and dihedral(4)

– Integration: calculate velocities and advance positions

• 1 femtosecond time-step, millions needed!• Thousands of atoms (1,000 - 100,000)

5

Cut-off radius• Use of cut-off radius to reduce work

– 8 - 14 Å– Far away charges ignored!

• 80-95 % work is non-bonded force computations• Some simulations need far away contributions

– Periodic systems: Ewald, Particle-Mesh Ewald– Aperiodic systems: FMA

• Even so, cut-off based computations are important:– near-atom calculations are part of the above– Cycles: multiple time-stepping is used: k cut-off steps,

1 PME/FMA

6

Spatial Decomposition

But the load balancing problems are still severe:

Patch

7

Patch

Compute

Proxy

8

FD + SD

• Now, we have many more objects to load balance:– Each diamond can be assigned to any processor– Number of diamonds (3D):

• 14·Number of Patches

9

Load Balancing

• Is a major challenge for this application– especially for a large number of processors

• Unpredictable workloads– Each diamond (force object) and patch encapsulate

variable amount of work

– Static estimates are inaccurate

• Measurement based Load Balancing Framework– Robert Brunner’s recent Ph.D. thesis

– Very slow variations across timesteps

10

Load Balancing

• Based on migratable objects• Collect timing data for several cycles• Run heuristic load balancer

– Several alternative ones:• Alg7 - Greedy• Refinement

• Re-map and migrate objects accordingly– Registration mechanisms facilitate migration

11

Load balancing strategyGreedy variant (simplified):

Sort compute objects (diamonds)

Repeat (until all assigned)

S = set of all processors that:

-- are not overloaded

-- generate least new commun.

P = least loaded {S}

Assign heaviest compute to P

Refinement:

Repeat

- Pick a compute from

the most overloaded PE

- Assign it to a suitable

underloaded PE

Until (No movement)

Cell CellCompute

12

0

500000

1000000

1500000

2000000

2500000

3000000

3500000

4000000

4500000

5000000

Processors

Tim

e migratable work

non-migratable work

0

500000

1000000

1500000

2000000

2500000

3000000

3500000

4000000

4500000

0 2 4 6 8 10 12 14

Avera

ge

Processors

Tim

e migratable work

non-migratable work

13

Results on Linux ClusterSpeedup on Linux Cluster

0

10

20

30

40

50

60

70

80

0 20 40 60 80 100 120

Processors

Sp

eed

up

14

Performance of Apo-A1 on Asci Red

0

200

400

600

800

1000

1200

0 500 1000 1500 2000 2500

Processors

Sp

eed

up

15

Performance of Apo-A1 on O2k and T3E

0

50

100

150

200

250

0 50 100 150 200 250 300

Processors

Sp

eed

up

16

Future and Planned work

• Increased speedups on 2k-10k processors– Smaller grainsizes

– New algorithms for reducing communication impact

– New load balancing strategies

• Further performance improvements for PME/FMA– With multiple timestepping

– Needs multi-phase load balancing

17

Steered MD: example picture

Image and Simulation by the theoretical biophysics group, Beckman Institute, UIUC

1 namd - scalable molecular dynamics gengbin zheng 9/1/01

Documents

patch slide

pmefma slide

migration slide

t3e slide

timesteps slide

compute proxy slide

asci red slide

linux cluster slide