Protein Explorer: A Petaflops Special Purpose Computer System for Molecular Dynamics Simulations
David Gobaud
Computational Drug Discovery
Stanford University
7 March 2006
Outline
  Overview
  Background
  Delft Molecular Dynamics Processor
  GRAPE
  Protein Explorer Summary
  MDGRAPE-3 Chip
    Force Calculation Pipeline
    J-Particle Memory and Control Units
  System Architecture
  Software
  Cost
  Questions
Overview
Protein Explorer
  Petaflops special-purpose computer system for molecular dynamics simulations
    High-precision screening for drug design
    Large-scale simulations of huge proteins/complexes
  PC cluster with special-purpose engines that perform the most time-consuming calculations
  Dedicated LSI, the MDGRAPE-3 chip, performs force calculations at 165 Gflops or higher
  ETA 2006
Background
PCs are universal machines
  Various applications
  Hardware can be designed independent of applications
Obstacles to high performance
  Memory bandwidth bottleneck
  Heat dissipation problem
  Can be overcome by developing specialized architectures
Delft Molecular Dynamics Processor (DMDP)
  Pioneered high-performance special-purpose systems
  Not able to achieve effective cost-performance
    Demanded too much time and money in the development stage
    Speed of development is a crucial factor affecting cost-performance because electronic device technology continues to develop rapidly
    Almost all calculations were performed by the DMDP, making the hardware very complex
GRAPE (GRAvity PipE)
  One of the most successful attempts to develop high-performance special-purpose systems
  Specialized for simulations of classical particles
    Most time is spent on the calculation of long-range forces (gravitational, Coulomb, and van der Waals)
    Thus the special hardware performs only these calculations
    Hardware is very simple and cost-effective
GRAPE (GRAvity PipE)
  In 1995, became the first machine to break the teraflops barrier in nominal peak performance
  Since 2001 the leader in performance has been the Molecular Dynamics Machine at RIKEN, at 78 Tflops
  In 2002 a 64-Tflops GRAPE-6 was completed at the University of Tokyo
  Protein Explorer was launched based on the 2002 University of Tokyo success
Protein Explorer Summary
  Host PC cluster with special-purpose boards attached
    Boards calculate only non-bonded forces
  Very simple hardware and software
    No detailed knowledge of the hardware needed to write programs
  Communication time between host and boards is proportional to the number of particles, N
  Calculation time is proportional to
    N^2 for direct summation of long-range forces
    N*Nc for short-range forces, where Nc is the average number of particles within the cutoff radius
  Host-board communication: 0.25 byte per 1000 operations
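The scaling argument on this slide can be sketched numerically. The snippet below is a rough illustration only: the particle count and the value of Nc are made-up inputs, and the function names are hypothetical. It shows why offloading pays off: the work done on the boards grows as N^2 or N*Nc, while the data sent to them grows only as N.

```python
def direct_pair_evaluations(n: int) -> int:
    """Pair evaluations for direct summation, using the slide's N^2 estimate."""
    return n * n


def cutoff_pair_evaluations(n: int, n_c: int) -> int:
    """Pair evaluations with a cutoff: each particle sees about n_c neighbours."""
    return n * n_c


def work_per_particle_sent(pair_evaluations: int, n: int) -> float:
    """Pair evaluations performed per particle transferred to the boards."""
    return pair_evaluations / n


# Illustrative inputs only (assumptions, not figures from the talk).
n_particles = 100_000
n_cutoff = 200

direct = direct_pair_evaluations(n_particles)
cutoff = cutoff_pair_evaluations(n_particles, n_cutoff)

print("direct summation:", direct, "pair evaluations")
print("cutoff method:   ", cutoff, "pair evaluations")
print("work per particle sent (direct):", work_per_particle_sent(direct, n_particles))
print("work per particle sent (cutoff):", work_per_particle_sent(cutoff, n_particles))
```

For direct summation the work per transferred particle grows with N, so larger systems make the host-board link less of a bottleneck.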
MDGRAPE-3 Chip - Force Calculation Pipeline
  3 subtractor units
  6 adder units
  8 multiplier units
  1 function-evaluation unit
  Can perform ~33 equivalent operations per cycle when it calculates the Coulomb force
MDGRAPE-3 Chip - Force Calculation Pipeline
  Most operations are done in 32-bit single-precision floating-point format
  Force accumulation is in 80-bit fixed-point format
    Can be converted to 64-bit double-precision floating point
  Coordinates are stored in 40-bit fixed-point format
    Makes implementation of the periodic boundary condition easy
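Why fixed-point coordinates make periodic boundaries easy can be sketched as follows. This assumes the box length is mapped onto the full 40-bit integer range, so that integer wraparound coincides with the periodic wrap. That mapping is an assumption for illustration, not a detail stated in the talk, and the helper names are hypothetical.

```python
BITS = 40
MOD = 1 << BITS          # number of representable positions across the box
HALF = 1 << (BITS - 1)   # half the box, in fixed-point units


def to_fixed(x: float, box: float) -> int:
    """Map a coordinate to a 40-bit unsigned integer (assumed mapping: box -> 2**40)."""
    return int(round((x % box) / box * MOD)) % MOD


def min_image_delta(xi: int, xj: int) -> int:
    """Displacement xj - xi with wraparound, read as a signed 40-bit value.

    The modular subtraction alone yields the nearest periodic image;
    no explicit comparison against the box length is needed.
    """
    d = (xj - xi) % MOD
    return d - MOD if d >= HALF else d


def to_length(d: int, box: float) -> float:
    """Convert a fixed-point displacement back to length units."""
    return d / MOD * box


box = 10.0
xi = to_fixed(9.5, box)   # near the right edge of the box
xj = to_fixed(0.5, box)   # near the left edge of the box
print(to_length(min_image_delta(xi, xj), box))  # close to +1.0, not -9.0
```

In hardware the same effect comes from letting the subtractor overflow, which is why no extra logic is needed for the minimum-image convention under this mapping.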
MDGRAPE-3 Chip - Force Calculation Pipeline
Function Evaluator
  Most important part of the pipeline
  Allows calculation of an arbitrary smooth function
  Has a memory unit, which contains a table of polynomial coefficients and exponents, and a hardwired pipeline for fourth-order polynomial evaluation
  Interpolates an arbitrary smooth function g(x) using segmented fourth-order polynomials evaluated by Horner's method
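The function evaluator's approach can be sketched in software. The example below is a minimal illustration under several assumptions: it uses g(x) = x^(-3/2) (the Coulomb force kernel as a function of x = r^2), equal-width segments, and Taylor coefficients about each segment midpoint. The real chip's segmentation and coefficient fitting are not described in the talk and may differ; only the per-segment quartic evaluated by Horner's method follows the slide.

```python
P = -1.5  # g(x) = x**P; illustrative choice of smooth function


def build_table(x_min: float, x_max: float, n_segments: int):
    """Return ([(midpoint, [c0..c4]), ...], segment_width) for g(x) = x**P.

    c_k is the k-th Taylor coefficient of g about the segment midpoint.
    """
    width = (x_max - x_min) / n_segments
    table = []
    for s in range(n_segments):
        mid = x_min + (s + 0.5) * width
        coeffs = []
        binom = 1.0  # generalized binomial coefficient C(P, k)
        for k in range(5):
            coeffs.append(binom * mid ** (P - k))
            binom *= (P - k) / (k + 1)
        table.append((mid, coeffs))
    return table, width


def evaluate(table, width: float, x_min: float, x: float) -> float:
    """Look up the segment containing x and evaluate its quartic by Horner's method."""
    s = min(int((x - x_min) / width), len(table) - 1)
    mid, c = table[s]
    dx = x - mid
    # Horner's method: (((c4*dx + c3)*dx + c2)*dx + c1)*dx + c0
    result = c[4]
    for k in (3, 2, 1, 0):
        result = result * dx + c[k]
    return result


table, width = build_table(1.0, 4.0, 64)
x = 2.0
print(evaluate(table, width, 1.0, x), x ** P)
```

Horner's method needs only four multiply-add steps per evaluation, which suits a hardwired pipeline; changing the table contents changes the function without changing the hardware.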
MDGRAPE-3 Chip - J-Particle Memory and Control Units
  20 Force Calculation Pipelines
  j-Particle Memory Unit
    Holds 32,768 bodies
    “Main Memory” of the chip
    6.6 Mbits, constructed from static RAM
  Cell-Index Controller
    Controls the j-particle memory: generates addresses
  Force Summation Unit
  Master Controller
    Manages timings and inputs/outputs of the chip
MDGRAPE-3 Chip
  2 virtual pipelines per physical pipeline
    Physical bandwidth of the j-particle unit is 2.5 Gbytes/sec, but the virtual bandwidth will reach 100 Gbytes/sec
  340 arithmetic units and 20 function-evaluator units, which work simultaneously
  165 Gflops at 250 MHz
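The chip-level figures can be cross-checked against the per-pipeline numbers given earlier. The arithmetic below assumes the ~33 equivalent operations are counted per clock cycle per pipeline, and reads the 100 Gbytes/sec virtual bandwidth as the physical stream being reused by every virtual pipeline. Both readings are assumptions that happen to reproduce the slide's totals.

```python
PIPELINES = 20                # force calculation pipelines per chip
VIRTUAL_PER_PHYSICAL = 2      # virtual pipelines per physical pipeline
OPS_PER_CYCLE = 33            # equivalent operations per pipeline (Coulomb force)
CLOCK_HZ = 250_000_000        # 250 MHz
UNITS_PER_PIPELINE = 3 + 6 + 8  # subtractors + adders + multipliers
PHYSICAL_BW_GBYTES = 2.5      # j-particle memory bandwidth, Gbytes/sec

peak_gflops = PIPELINES * OPS_PER_CYCLE * CLOCK_HZ / 1e9
arithmetic_units = PIPELINES * UNITS_PER_PIPELINE
virtual_bw_gbytes = PHYSICAL_BW_GBYTES * PIPELINES * VIRTUAL_PER_PHYSICAL

print(peak_gflops)        # 165.0
print(arithmetic_units)   # 340
print(virtual_bw_gbytes)  # 100.0
```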
MDGRAPE-3 Chip
  Chip made by Hitachi
  6M gates
  10M bits of memory
  Chip size is ~220 mm^2
  Dissipates 20 watts at a core voltage of +1.2 V
  0.12 W/Gflops, much better than a 3 GHz P4 at 14 W/Gflops
System Architecture
  Host PC cluster will use Itanium or Opteron CPUs
    256 nodes with 512 CPUs in total
    Performance of each node is 3.96 Tflops
    Total reaches a petaflops
  Requires a 10 Gbit/sec network
    InfiniBand, 10G Ethernet, or future Myrinet
    Network topology will be a 2D hyper-crossbar
  Each node has 24 MDGRAPE-3 chips
    MDGRAPE-3 chips are connected via 2 PCI-X buses at 133 MHz
  A 19” rack can house 6 nodes
    43 racks total
    Power dissipation ~150 kW
    Occupies 100 m^2
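The system-level totals follow from the per-chip and per-node figures. The short check below uses only numbers stated on the slides; the variable names are illustrative.

```python
import math

CHIP_GFLOPS = 165       # MDGRAPE-3 peak per chip
CHIPS_PER_NODE = 24
NODES = 256
NODES_PER_RACK = 6

node_gflops = CHIPS_PER_NODE * CHIP_GFLOPS     # per-node peak, Gflops
system_gflops = NODES * node_gflops            # whole-system peak, Gflops
total_chips = NODES * CHIPS_PER_NODE
racks = math.ceil(NODES / NODES_PER_RACK)      # last rack is partly filled

print(node_gflops / 1000, "Tflops per node")   # 3.96
print(system_gflops / 1e6, "Pflops total")     # just over 1 Pflops
print(total_chips, "chips in", racks, "racks")
```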
Protein Explorer Board
Software
  Very easy to create programs for
  All computational abilities are provided in a library
  No special knowledge of the device needed
Cost
  $20 million including labor
  Less than $10/Gflops
  At least ten times better than general-purpose computers, even when compared with the relatively cheap BlueGene/L ($140/Gflops)
Questions
  What is Myrinet?
  What is a two-dimensional hyper-crossbar network topology?
  How does this compare to massive distributed computing such as Folding@Home?
    Advantages?
    Disadvantages?