B5: Exascale Hardware
Capability Requirements
• Several different requirements
  – Exaflops/Exascale single application
  – Ensembles of Petaflop apps requiring Exaflops-years
  – Streaming/Realtime
  – I/O intensive (e.g., analysis, data mining)
• Not considering capacity
Exaflops are Possible
• Extrapolation of the Top500 trend suggests 1 EF around 2019
• DOE (through ASCI and LCF) has contributed to staying on this trajectory
  – May require investment to stay on this trajectory
  – History shows Federal investment accelerated top systems
  – May not get usable FLOPS (non-LINPACK) without investment
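To see how a trend line through the Top500 yields a date like 2019, here is a minimal log-linear extrapolation sketch. The anchor points are round, illustrative values (roughly 1 TF in 1997 and 1 PF in 2008, approximately the ASCI Red and Roadrunner LINPACK milestones), not the data behind the slide, and `extrapolate_year` is a hypothetical helper.

```python
import math


def extrapolate_year(year0: float, perf0: float,
                     year1: float, perf1: float,
                     target: float) -> float:
    """Fit perf = perf0 * 10**(rate * (year - year0)) through two points
    and return the year at which perf reaches `target`."""
    # Growth rate in decades (factors of 10) per year between the anchors.
    rate = math.log10(perf1 / perf0) / (year1 - year0)
    # Years needed to grow from perf1 to target at that same constant rate.
    return year1 + math.log10(target / perf1) / rate


# Illustrative anchors only: ~1 TF (1997) and ~1 PF (2008) for the #1 system.
year = extrapolate_year(1997, 1e12, 2008, 1e15, 1e18)
print(round(year))  # 2019
```

The straight line in log space is the whole assumption: 1000x took 11 years between the anchors, so the next 1000x is projected to take another 11. As the bullets note, staying on that line is not automatic.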
Components of an Exascale System
• It's not just FLOPS. Need:
  – Processors
  – Interconnect
  – Memory
  – I/O (persistent storage)
  – Connection to the outside world
  – Balance of these
• Constraints include:
  – Power
  – Cooling
  – Reliability
  – Adoption by applications, particularly legacy ones, including a familiar development environment
  – Cost :)
Notes on Commodity Design
• Based on Jeff Vetter's extrapolation of current technology
  – Details in ORNL presentation
• Does not preserve the performance ratios (e.g., bytes/flop interconnect bandwidth) commonly expected
  – This is not new; e.g., PC memory/disk size ratios have changed significantly
• Most (all?) Exascale system designs will mandate some changes in those ratios
  – R&D can either reduce the change in the ratio or reduce the impact of the change (e.g., new algorithms)
  – E.g., more specialized systems may provide better cost/performance for specific application classes
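One way to make the bytes/flop concern concrete is the roofline model, a standard performance bound that is swapped in here (the slides do not name it). In this sketch the system and kernel parameters are hypothetical. The "future" system gains 1000x in flops but only 100x in memory bandwidth, so its bytes/flop ratio drops 10x, and a memory-bound kernel's attainable fraction of peak drops by the same factor.

```python
def bytes_per_flop(bandwidth_bytes_s: float, peak_flops: float) -> float:
    """System balance: bytes of bandwidth available per peak flop."""
    return bandwidth_bytes_s / peak_flops


def attainable_flops(peak_flops: float, bandwidth_bytes_s: float,
                     intensity_flops_per_byte: float) -> float:
    """Roofline bound: a kernel runs no faster than peak compute, and no
    faster than bandwidth times its arithmetic intensity (flops/byte)."""
    return min(peak_flops, bandwidth_bytes_s * intensity_flops_per_byte)


# Hypothetical systems (peak flop/s, memory bytes/s), not from the slides.
systems = {
    "today": (1e15, 5e14),
    "future": (1e18, 5e16),
}
kernel_intensity = 0.5  # hypothetical memory-bound kernel, flops per byte

for name, (peak, bw) in systems.items():
    ratio = bytes_per_flop(bw, peak)
    perf = attainable_flops(peak, bw, kernel_intensity)
    print(f"{name}: {ratio:.3f} B/flop, {perf / peak:.1%} of peak")
```

Raising the kernel's arithmetic intensity (e.g., an algorithm that reuses data from cache) moves the bound back toward peak. That is the "reduce the impact of the change" route, as opposed to buying back bandwidth in hardware.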
Issues (concerns)
• There are possible hazards:
  – Interconnect performance
    • Latency, bandwidth
  – I/O
    • Density, bandwidth, fault management
  – Memory
    • Cost, power (and latency and bandwidth)
  – Power
    • 4M PS3s deliver 1 EF (single precision) but use 1 GW
  – Latency/bandwidth/faults/concurrency
  – Software and algorithms
    • Work around/with latency/bandwidth/faults/concurrency
• Non-issue: getting the peak FLOPS
• All of these can (must) benefit from research and development investment
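The PS3 figure implies per-unit numbers worth checking. This back-of-the-envelope sketch derives them from the slide's 4M units, 1 EF, and 1 GW. The 20 MW power budget in the last step is a hypothetical target, not from the slides, and `required_gflops_per_watt` is a hypothetical helper.

```python
# Slide figures: about 4 million PS3s for 1 EF (single precision) at about 1 GW.
UNITS = 4_000_000
TOTAL_FLOPS = 1e18
TOTAL_WATTS = 1e9

flops_per_unit = TOTAL_FLOPS / UNITS    # implied peak per console
watts_per_unit = TOTAL_WATTS / UNITS    # implied power per console
flops_per_watt = TOTAL_FLOPS / TOTAL_WATTS


def required_gflops_per_watt(target_flops: float, budget_watts: float) -> float:
    """Energy efficiency (GFLOPS/W) needed to reach target_flops within budget_watts."""
    return target_flops / budget_watts / 1e9


print(f"{flops_per_unit / 1e9:.0f} GFLOPS per unit")  # 250 GFLOPS per unit
print(f"{watts_per_unit:.0f} W per unit")              # 250 W per unit
print(f"{flops_per_watt / 1e9:.1f} GFLOPS/W")          # 1.0 GFLOPS/W

# Hypothetical 20 MW budget: how much more efficient than the PS3 design?
needed = required_gflops_per_watt(TOTAL_FLOPS, 20e6)
print(f"{needed:.0f} GFLOPS/W needed, {needed / (flops_per_watt / 1e9):.0f}x the PS3 design")
```

The point is that reaching peak FLOPS is the easy part. The implied 1 GFLOPS/W is far from what any plausible facility power budget allows.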
Alternate Directions
• Commodity
  – GPGPU and STI Cell offer very high compute density relative to commodity CPUs
  – Ex. 4M PS3s = 1 EF (single precision)
  – But
    • Not all algorithms can effectively use these systems
    • Programming complexity (currently) much greater
  – Embedded processors (better FLOPS/Watt)
• New Architectures
  – PIM, FPGA-centric, …
• Not in this time frame
  – Quantum, molecular, DNA, …
Promising Tech
• Tech that can improve the balance (ratios) of the system, cost, reliability, etc.
• Optimizing the use of die space for CPU (manycore, multicore, stream, vector, heterogeneous, variable precision arithmetic, etc.)
• Optical network (faster signaling, cheaper/denser connectors)
• Optical into/out of the processor
• 3-D chips, integrated memory/processor
• Faster development of customized processors
• Hardware-accelerated system verification (e.g., RAMP)
• NAND Flash, MRAM, and other non-volatile memory (disk replacements)
• Myriad approaches to power efficiency
Cross Cutting Issues
• Better characterization of algorithm requirements with respect to system ratios
• New algorithms to match system ratios
  – Disk I/O/main memory
  – Interconnect bandwidth/flops
  – Etc.
• New algorithms/software to detect and handle faults
• New approaches to algorithms/software for specialized/disruptive processor architectures
  – E.g., good ways to move apps to GPGPUs, PIMs, or FPGAs
• Need to accelerate applications and algorithms (esp. new ones) to PF now to prepare for EF
• Programming languages and environments
  – PGAS, domain-specific, auto-tuners, hierarchical programming models (built on current models)
  – Interaction with hardware (e.g., user-managed caches, remote atomic updates, etc.)
  – Performance modeling and debugging
  – Productivity, etc.
  – System software, OS (e.g., memory management)
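For the fault-handling item, a common first-order tool is Young's approximation for the optimal checkpoint interval, sqrt(2 · checkpoint time · MTBF). It is combined here with the assumption that independent node failures make system MTBF shrink as 1/N. The technique is swapped in for illustration (the slides do not name it), and the node MTBF, checkpoint time, and node counts below are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365


def system_mtbf_hours(node_mtbf_hours: float, nodes: int) -> float:
    """Assuming independent, exponentially distributed node failures,
    system MTBF shrinks in proportion to the node count."""
    return node_mtbf_hours / nodes


def young_interval_hours(checkpoint_hours: float, mtbf_hours: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval.
    Only meaningful when checkpoint time is much smaller than MTBF."""
    return math.sqrt(2.0 * checkpoint_hours * mtbf_hours)


# Hypothetical parameters, not from the slides.
node_mtbf = 5 * HOURS_PER_YEAR   # 5-year MTBF per node
checkpoint = 10 / 60             # 10 minutes to write one checkpoint

for nodes in (10_000, 100_000, 1_000_000):
    mtbf = system_mtbf_hours(node_mtbf, nodes)
    tau = young_interval_hours(checkpoint, mtbf)
    print(f"{nodes:>9} nodes: MTBF {mtbf * 60:6.1f} min, "
          f"checkpoint every {tau * 60:6.1f} min")
```

At a million nodes the hypothetical system MTBF (about 2.6 minutes) is shorter than the 10-minute checkpoint itself, so the approximation no longer applies. That breakdown is one way to see why plain checkpoint/restart calls for new fault detection and handling approaches.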
Sample Plan Components
• Point studies for the future
  – Like the Petaflops point designs, but with more application/algorithm designer involvement, and including the OS
  – Evaluate time/cost to get apps running on the system
  – Ongoing process; contrast with baseline
• Early simulation and modeling of systems, algorithms, and applications (see open source below), including hardware (e.g., RAMP), particularly with respect to promising technologies
• Evaluate special purpose architectures and non-MPI programming models for application/algorithm classes (cheaper, faster, better)
• Partnerships for disruptive technologies
  – Need to understand timeline and costs
  – Goal is to accelerate; not required for Exaflops
• Directed vendor partnerships
  – QCDOC is a good example
• Support application involvement from the beginning
  – With respect to point designs, with performance understanding
  – Must encourage new apps to increase community size
• Some principles
  – Open source
  – Support multiple prototypes (at suitable scale)
  – Establish a framework to move from point studies to full systems through multiple stages