Resources - Colorado School of Mines
geco.mines.edu/workshop/frcrc13/OtherResources12.pdf

Resources: Current and Future Systems

Timothy H. Kaiser, Ph.D. [email protected]



Most likely talk to be out of date

• History of Top 500

• Issues with building bigger machines

• Current and near future academic machines


Top 500 list

• Ranks computers based on performance on a dense linear solve (the High-Performance Linpack benchmark)

• http://www.top500.org/
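To make the "linear solve" concrete, here is a minimal pure-Python sketch of the idea: solve a dense system, time it, and divide a nominal operation count by the elapsed time. It is not the benchmark code itself. The function names are made up for illustration, and the operation count uses only the leading-order term, (2/3) n^3, which is an assumption.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies so the caller's data is untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: use the largest entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def measured_flops(n, seed=0):
    """Time one solve of a random n x n system; return (flop/s, max residual)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i])
                   for i in range(n))
    nominal_ops = (2.0 / 3.0) * n ** 3  # leading-order count only (assumption)
    return nominal_ops / elapsed, residual


if __name__ == "__main__":
    rate, res = measured_flops(100)
    print(f"{rate / 1e6:.1f} Mflop/s, residual {res:.2e}")
```

The residual check matters as much as the timing: a fast answer only counts if it actually solves the system.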


Top500 Benchmarks Spring 13


Trends


BlueM

Mines Supercomputer: 154 Tflops, 17.4 Tbytes, 10,496 Cores, 85 kW

Five Racks (not full)

• dual architecture

• Two Distinct Compute Units

iDataPlex

Blue Gene Q

• Best of both worlds

• Shared 480 Tbyte File System

• Compact

• Low Power Consumption


BlueM’s Compute Units - AuN

AuN (Golden)

iDataPlex

Intel Sandy Bridge, 8x2 cores per node

144 Nodes

2,304 Cores

9,216 Gbytes

50 Tflops

Feature

Latest Generation Intel Processors

Large Memory / Node

Common architecture

Similar user environment to RA and Mio

Quickly get researchers up and running


BlueM’s Compute Units - MC2

Feature

New Architecture

Designed for large core count jobs

Highly scalable

Multilevel parallelism - Direction of HPC

Room to Grow

Future looking machine

MC2 (Energy)

Blue Gene Q

PowerPC A2 17 Core

512 Nodes

8,192 Cores

8,192 Gbytes

104 Tflops
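The headline BlueM figures can be checked against the two compute units. The sketch below only does arithmetic on numbers quoted on these slides. It assumes 1 Tbyte = 1000 Gbytes (which reproduces the 17.4 Tbytes figure) and that the 85 kW figure covers the whole machine. The class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeUnit:
    name: str
    nodes: int
    cores: int
    memory_gbytes: int
    peak_tflops: float


# Figures copied from the slides.
AUN = ComputeUnit("AuN (iDataPlex)", 144, 2304, 9216, 50.0)
MC2 = ComputeUnit("MC2 (Blue Gene Q)", 512, 8192, 8192, 104.0)
BLUEM_POWER_KW = 85.0


def totals(units):
    """Sum cores, memory (Tbytes, taking 1 Tbyte = 1000 Gbytes) and peak Tflops."""
    cores = sum(u.cores for u in units)
    memory_tbytes = sum(u.memory_gbytes for u in units) / 1000.0
    peak_tflops = sum(u.peak_tflops for u in units)
    return cores, memory_tbytes, peak_tflops


def per_node(unit):
    """Return (cores per node, Gbytes per node, Gbytes per core)."""
    return (unit.cores // unit.nodes,
            unit.memory_gbytes / unit.nodes,
            unit.memory_gbytes / unit.cores)


if __name__ == "__main__":
    cores, mem, peak = totals([AUN, MC2])
    print(f"BlueM: {cores:,} cores, {mem:.1f} Tbytes, {peak:.0f} Tflops")
    # Peak (not measured) Gflops per watt for the whole machine.
    print(f"Peak Gflops/W: {peak * 1000.0 / (BLUEM_POWER_KW * 1000.0):.2f}")
    for unit in (AUN, MC2):
        print(unit.name, per_node(unit))
```

The per-node numbers show the trade-off between the two units: AuN has 4 Gbytes per core, MC2 has 1 Gbyte per core but far more cores.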


Colorado State

• Cray model XT6m
• Operational January 2011
• Peak performance 12 teraflops
• Dimensions: 7.5 ft. (h) x 2.0 ft. (w) x 4.5 ft. (d)
• Computer partition
  • 52 compute nodes
  • 2 processors / node
  • 104 AMD Magny Cours 64-bit 1.9 GHz total processors
  • 12 cores / processor; 1,248 total cores
  • 32 GB DDR3 ECC SDRAM / node; 1.664 TB total RAM

Update: 2,016 cores, 20 Tflops


NCAR's Computational and Information Systems Laboratory (CISL) invites NSF-supported university researchers in the atmospheric, oceanic, and closely related sciences to submit large allocation requests by September 17, 2012.

https://www2.cisl.ucar.edu/docs/allocations#University

University researchers supported by an NSF award can request up to 30,000 GAUs as a Small Allocation request. Up to 10,000 GAUs are available to graduate students and post-docs; no NSF award is required.


NCAR & CISL systems


• Yellowstone — A 1.5-petaflops high-performance computing system with 72,288 processor cores and 144 terabytes of memory. Production computing operations will begin in the summer of 2012.

• Bluefire — NCAR's 77-teraflops IBM Power6 system used by the Climate Simulation Lab (CSL) and Community Computing Facilities.

• Janus — The Janus system is a Dell Linux cluster that is housed on the CU-Boulder campus and has a high-speed networking connection to NCAR's computing and data storage systems.

• Lynx — A Cray XT5m system deployed as a testing platform and available to NCAR users.

• Mirage and Storm —CISL operates two data analysis and visualization clusters, with software packages including NCL, Vapor, Matlab and IDL, for its user community.

• GLADE —The central GLADE file system significantly expands the disk space available to CISL users and allows users to access their data from both HPC and DAV systems.

• HPSS — CISL has migrated its archival storage resource to the High-Performance Storage System (HPSS) environment, which currently stores more than 12 PB of data in support of CISL computing facilities and NCAR research activities.


NCAR Resources at the NCAR Wyoming Supercomputing Center (NWSC)

• Centralized Filesystems and Data Storage (GLADE)
  – >90 GB/sec aggregate I/O bandwidth, GPFS filesystems
  – 10.9 PetaBytes initially -> 16.4 PetaBytes in 1Q2014
• High Performance Computing (Yellowstone)
  – IBM iDataPlex Cluster with Intel Xeon E5-2670† processors with Advanced Vector Extensions (AVX)
  – 1.50 PetaFLOPs – 28.9 bluefire-equivalents – 4,518 nodes – 72,288 cores
  – 145 TeraBytes total memory
  – Mellanox FDR InfiniBand full fat-tree interconnect
• Data Analysis and Visualization (Geyser & Caldera)
  – Large Memory System with Intel Westmere EX processors
    • 16 nodes, 640 Westmere-EX cores, 16 TeraBytes memory, 16 nVIDIA Quadro 6000 GPUs
  – GPU-Computation/Vis System with Intel Sandy Bridge EP processors with AVX
    • 16 nodes, 256 E5-2670 cores, 1 TeraByte memory, 32 nVIDIA M2070Q GPUs
  – Knights Corner System with Intel Sandy Bridge EP processors with AVX
    • 16 Knights Corner nodes, 256 E5-2670 cores, >1600 KC cores, 1 TB memory – Early 2013 delivery
• NCAR HPSS Data Archive
  – 2 SL8500 Tape libraries (20k cartridge slots) @ NWSC
  – >100 PetaByte capacity (with 5 TeraByte cartridges, uncompressed)
  – 2 SL8500 Tape libraries (15k slots) @ Mesa Lab (current 16 PetaByte archive)

† Codenamed “Sandy Bridge EP”


Yellowstone Compute


• 72,288 processor cores
  ◦ 2.6-GHz Intel Sandy Bridge EP with Advanced Vector Extensions (AVX)
  ◦ 8 flops per clock
• 4,518 nodes
  ◦ IBM dx360 M4, dual socket, 8 cores per socket
• 144.58 TB total system memory
  ◦ 2 GB/core, 32 GB/node, DDR3-1600
• FDR Mellanox InfiniBand interconnect
  ◦ Full fat tree, single plane
  ◦ Bandwidth 13.6 GBps bidirectional per node; latency 2.5 µs
  ◦ Peak bidirectional bisection bandwidth: 31.7 TBps
• 1.504 petaflops peak
  ◦ 1.20 petaflops estimated HPL
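The peak figure on this slide follows directly from the other numbers: cores times clock rate times flops per clock. A short sketch of that arithmetic, using only values quoted above and assuming 1 TB = 1000 GB (which reproduces the 144.58 TB figure). The function names are made up for illustration.

```python
def peak_petaflops(cores, clock_ghz, flops_per_clock):
    """Theoretical peak = cores x clock rate x flops per clock cycle."""
    return cores * clock_ghz * 1e9 * flops_per_clock / 1e15


def total_memory_tb(nodes, gb_per_node):
    """Total memory in TB, taking 1 TB = 1000 GB (assumption)."""
    return nodes * gb_per_node / 1000.0


# Figures from the slide.
NODES = 4518
CORES_PER_NODE = 2 * 8        # dual socket, 8 cores per socket
CLOCK_GHZ = 2.6
FLOPS_PER_CLOCK = 8
ESTIMATED_HPL_PFLOPS = 1.20

if __name__ == "__main__":
    cores = NODES * CORES_PER_NODE
    peak = peak_petaflops(cores, CLOCK_GHZ, FLOPS_PER_CLOCK)
    print(f"cores:  {cores:,}")
    print(f"peak:   {peak:.3f} petaflops")
    print(f"memory: {total_memory_tb(NODES, 32):.2f} TB")
    # Fraction of theoretical peak the estimated HPL run would reach.
    print(f"HPL / peak: {ESTIMATED_HPL_PFLOPS / peak:.1%}")
```

The last line is the useful one when comparing machines: the estimated HPL number is roughly four fifths of the theoretical peak.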


XSEDE

• Extreme Science and Engineering Discovery Environment

• https://www.xsede.org/

• Mostly the same people as TeraGrid

• Mostly the same machines


XSEDE Machines User Guides:

https://www.xsede.org/user-guides


Future Directions in HPC

• Four important concepts that will affect math software - Jack Dongarra

• Effective use of many-core

• Exploiting mixed precision in our numerical computations

• Self adapting / auto tuning of software

• Fault tolerant algorithms

• Barriers to progress are increasingly on the software side.

• Hardware has a half-life measured in years, while software has a half-life measured in decades.

• High performance ecosystem out of balance: HW, SW, OS, Compilers, Algorithms, Applications.
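One common way to exploit mixed precision is iterative refinement: do the expensive solve in single precision, then correct it with residuals computed in double precision. The sketch below is an illustration under stated assumptions, not code from the talk. Single precision is emulated in pure Python by rounding through `struct`, all function names are made up, and for brevity each correction repeats the whole elimination, where a real implementation would reuse the factorization.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def lu_solve_f32(a, b):
    """Gaussian elimination with partial pivoting, every result rounded to single."""
    n = len(b)
    a = [[f32(v) for v in row] for row in a]
    b = [f32(v) for v in b]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = f32(a[i][k] / a[k][k])
            for j in range(k, n):
                a[i][j] = f32(a[i][j] - f32(m * a[k][j]))
            b[i] = f32(b[i] - f32(m * b[k]))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = 0.0
        for j in range(i + 1, n):
            s = f32(s + f32(a[i][j] * x[j]))
        x[i] = f32(f32(b[i] - s) / a[i][i])
    return x


def residual(a, x, b):
    """r = b - a x, computed in double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(a, b)]


def refine(a, b, steps=5):
    """Single-precision solves corrected by double-precision residuals."""
    x = lu_solve_f32(a, b)
    for _ in range(steps):
        r = residual(a, x, b)                   # accurate residual
        d = lu_solve_f32(a, r)                  # cheap low-precision correction
        x = [xi + di for xi, di in zip(x, d)]   # accumulate in double
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
    x_true = [0.1, 0.2, 0.3]
    b = [sum(aij * xj for aij, xj in zip(row, x_true)) for row in a]
    for label, x in (("single only", lu_solve_f32(a, b)), ("refined", refine(a, b))):
        err = max(abs(u - v) for u, v in zip(x, x_true))
        print(f"{label:12s} max error {err:.2e}")
```

For a well-conditioned system like the one in the example, the refined answer reaches double-precision accuracy even though every solve ran in single precision.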


Top500 Benchmarks Spring 13


Trends

• Hardware

• Large number of cores

• Less memory per core

• More Flops/Watt

• Better Interconnect

• Software

• Hybrid programming

• Directives based


GPU

• GPU computing is the use of a GPU (graphics processing unit) together with a CPU to accelerate general-purpose scientific and engineering applications. GPUs do “real” computation

• Vendors have taken GPU systems and repackaged them to do computation

• Vendors

• IBM

• AMD

• Nvidia

• Intel


NVidia Tesla M2090 GPU


Not a completely new concept

• Think coprocessor

• Main processor passes off some work to coprocessor

• Remember the 8087?

• Same issues

• Programs must be written to take advantage

• Getting data to/from coprocessor


Programming (Bottom Level)

• Program is written in two parts

• CPU

• GPU

• Computation starts on CPU

• Data is “prepared” on CPU

• Data and Program (subroutine) are sent to GPU

• Subroutine run on GPU as a thread

• Data is sent back to CPU


Issues

• Complexity

• Separate code for GPU

• Easy to write, tough to get to run well

• Bottleneck between CPU and GPU

• Mixed precision

• Efficiency on the GPU

• Small amount of fast memory

• Massive number of threads must be managed
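The CPU-GPU bottleneck can be put into a back-of-the-envelope model: offloading only pays off when the time saved on the computation exceeds the time spent moving data. The sketch below is a toy model, not a measurement. The function names, the 10x speedup, and the 8e9 bytes/s link rate are all made-up illustrative values.

```python
def offload_time(cpu_seconds, gpu_speedup, bytes_moved, link_bytes_per_second):
    """Estimated time with the kernel on the GPU, including data transfers."""
    transfer = bytes_moved / link_bytes_per_second
    return transfer + cpu_seconds / gpu_speedup


def offload_pays_off(cpu_seconds, gpu_speedup, bytes_moved, link_bytes_per_second):
    """True when moving the data and running on the GPU beats staying on the CPU."""
    return offload_time(cpu_seconds, gpu_speedup, bytes_moved,
                        link_bytes_per_second) < cpu_seconds


if __name__ == "__main__":
    link = 8e9      # bytes/s, a made-up round number for the CPU-GPU link
    speedup = 10.0  # hypothetical kernel speedup on the GPU
    moved = 1e9     # bytes sent to and from the GPU, hypothetical
    for cpu_seconds in (0.01, 0.1, 1.0, 10.0):
        t = offload_time(cpu_seconds, speedup, moved, link)
        ok = offload_pays_off(cpu_seconds, speedup, moved, link)
        print(f"CPU {cpu_seconds:6.2f} s -> offloaded {t:6.3f} s, worth it: {ok}")
```

In this model, short kernels lose to the transfer cost no matter how fast the GPU is, which is why keeping data resident on the device matters.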


GPUs

Many more cores

Does not support normal process

Expected to run multiple threads per core

Very small fast memory

MUCH less memory per core


Issues

• Complexity

• Directives based programming similar to OpenMP

• Libraries

• Bottleneck between CPU and GPU

• Getting Better

• Mixed precision

• (Some) newer GPUs have better ratio of performance

• Efficiency on the GPU

• More memory and flatter hierarchy

• Better thread management

CSM’s old GPU node (2009)

Number of Tesla GPUs: 4
Number of Streaming Processor Cores: 960 (240 per processor)
Frequency of processor cores: 1.296 to 1.44 GHz
Single Precision floating point performance (peak): 3.73 to 4.14 TFlops
Double Precision floating point performance (peak): 311 to 345 GFlops
Floating Point Precision: IEEE 754 single & double
Total Dedicated Memory: 16 GB
Memory Interface: 512-bit
Memory Bandwidth: 408 GB/sec
Max Power Consumption: 800 W
System Interface: PCIe x16 or x8
Software Development Tools: C-based CUDA Toolkit
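The spec sheet above shows why mixed precision was an issue on this node: peak single precision is about twelve times peak double precision. The sketch below reproduces that ratio from the quoted figures. It also shows that the single-precision peak matches cores times clock times 3 flops per core per clock. The factor of 3 is an assumption inferred from the numbers, not something stated on the slide, and the function names are made up for illustration.

```python
def peak_tflops(cores, clock_ghz, flops_per_core_per_clock):
    """Peak rate in Tflops = cores x clock (GHz) x flops per core per clock / 1000."""
    return cores * clock_ghz * flops_per_core_per_clock / 1000.0


def single_to_double_ratio(single_tflops, double_gflops):
    """How many times faster peak single precision is than peak double precision."""
    return single_tflops * 1000.0 / double_gflops


CORES = 960               # from the spec sheet
SP_FLOPS_PER_CLOCK = 3    # assumption: chosen because it reproduces the quoted peaks

if __name__ == "__main__":
    for clock_ghz, sp_tflops, dp_gflops in ((1.296, 3.73, 311.0), (1.44, 4.14, 345.0)):
        computed = peak_tflops(CORES, clock_ghz, SP_FLOPS_PER_CLOCK)
        ratio = single_to_double_ratio(sp_tflops, dp_gflops)
        print(f"{clock_ghz} GHz: computed {computed:.2f} Tflops single, "
              f"quoted {sp_tflops}, single/double ratio {ratio:.1f}")
```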


Today’s NVIDIA offerings


Intel Many Integrated Core (MIC)


http://openlab.web.cern.ch/publications/presentations?page=1

• What?

• Many (>50) cores on a chip

• Each core is x86 type processor

• Why?

• Massive parallelism

• Same (more or less) instruction set as other x86

• When?

• Knights Corner Prerelease product PCI card

• Available very soon as Xeon Phi, also PCI card


Intel MIC differences

• X86 instruction set

• Can, in theory, run a full OS on the card

• Should most likely run threads (OpenMP)

• Uses the same compilers as normal Intel processors

• Codes optimized for current generation processor will run well on MIC

• Threading

• Vectorization


Summary


• Core count is going up

• Memory / core is going down

• Threading will become more important

• Hybrid will be critical


Next few slides taken from

Dr. Jay Boisseau Director of TACC


MIC Architecture
• Many cores on the die
• L1 and L2 cache
• Bidirectional ring network
• Memory and PCIe connection

Knights Ferry SDP
• Up to 32 cores
• 1-2 GB of GDDR5 RAM
• 512-bit wide SIMD registers
• L1/L2 caches
• Multiple threads (up to 4) per core
• Slow operation in double precision

Knights Corner (first product)
• 50+ cores
• Increased amount of RAM
• Details are under NDA
• Double precision half the speed of single precision (canonical ratio)
• 22 nm technology

Figure: MIC (KNF) architecture block diagram


What we at TACC like about MIC (and we think that you will like this, too)

• Intel’s® MIC is based on x86 technology
  – x86 cores w/ caches and cache coherency
  – SIMD instruction set
• Programming for MIC is similar to programming for CPUs
  – Familiar languages: C/C++ and Fortran
  – Familiar parallel programming models: OpenMP & MPI
  – MPI on host and on the coprocessor
  – Any code can run on MIC, not just kernels
• Optimizing for MIC is similar to optimizing for CPUs
  – Make use of existing knowledge!

Key elements of this talk highlighted!


Coprocessor vs. Accelerator

• Differences
  – Architecture: x86 vs. streaming processors; coherent caches vs. shared memory and caches
  – HPC Programming model: extension to C++/C/Fortran vs. CUDA/OpenCL; OpenCL support
  – Threading/MPI: OpenMP and Multithreading vs. threads in hardware; MPI on host and/or MIC vs. MPI on host only
  – Programming details: offloaded regions vs. kernels
  – Support for any code (serial, scripting, etc.): Yes vs. No
• Native mode: Any code may be “offloaded” as a whole to the coprocessor


Programming Models

Ready to use on day one!

• TBB will be available to C++ programmers
• MKL will be available
  – Automatic offloading by compiler for some MKL features
• Cilk Plus
  – Useful for task-parallel programming (add-on to OpenMP)
  – May become available for Fortran users as well
• OpenMP
  – TACC expects that OpenMP will be the most interesting programming model for our HPC users


IBM Blue Gene Q

• New machine from IBM

• Evolution from BGL and BGP

• Many cores / node with less memory / core but more than “L” or “P”

• Very energy efficient

• 4 of the top 8 on top 500 list


BGQ Rack

• 208 Tflop

• 62.5 kW

• 1 rack

• 1024 nodes

• 16384 cores

• 1 node = 16+1 cores

• 16 Gbytes or 1Gbyte/core

• Footprint < 31 ft²
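The rack figures above give the per-node rate and the energy efficiency directly. The sketch below uses only the quoted numbers, with made-up names, and assumes that peak performance scales linearly with node count for a partial rack.

```python
RACK_TFLOPS = 208.0
RACK_KW = 62.5
RACK_NODES = 1024
COMPUTE_CORES_PER_NODE = 16   # the "+1" core is not part of the 16384 count
GBYTES_PER_NODE = 16


def rack_summary():
    """Derived per-node, per-core and per-watt figures for one rack."""
    return {
        "cores": RACK_NODES * COMPUTE_CORES_PER_NODE,
        "gflops_per_node": RACK_TFLOPS * 1000.0 / RACK_NODES,
        "gflops_per_watt": RACK_TFLOPS * 1000.0 / (RACK_KW * 1000.0),
        "gbytes_per_core": GBYTES_PER_NODE / COMPUTE_CORES_PER_NODE,
    }


def scale_to_nodes(nodes):
    """Peak Tflops for a partial rack, assuming peak scales linearly with nodes."""
    return RACK_TFLOPS * nodes / RACK_NODES


if __name__ == "__main__":
    for key, value in rack_summary().items():
        print(f"{key}: {value}")
    print(f"512 nodes: {scale_to_nodes(512):.0f} Tflops")
```

Half a rack (512 nodes) works out to 104 Tflops, which matches the MC2 figure quoted for BlueM.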


BGQ Proprietary Parts

• Processors Designed for HPC

• 4 threads/core

• Advanced speculative operation

• Transactional memory

• Networks

• 5D torus

• Collective and barrier

• Floating point addition in network

• Special IO Nodes

5D Torus


What the?
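A 5D torus is easier to picture in code than on a slide: every node has five coordinates, and along each dimension it is wired to the node one step up and one step down, with wraparound at the ends. The sketch below is illustrative only. The function names are made up, and the shape (4, 4, 4, 4, 2) is a hypothetical 512-node layout, not a configuration taken from these slides.

```python
from itertools import product


def torus_neighbors(coord, dims):
    """Nodes one hop away: +1 and -1 along each dimension, with wraparound."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            n = tuple(n)
            if n != tuple(coord):
                # In a dimension of size 2 both directions reach the same node.
                neighbors.add(n)
    return neighbors


def torus_hops(a, b, dims):
    """Minimum hop count between two nodes: the short way around each ring."""
    return sum(min(abs(x - y), size - abs(x - y))
               for x, y, size in zip(a, b, dims))


def diameter(dims):
    """Largest minimum hop count between any pair of nodes."""
    return sum(size // 2 for size in dims)


if __name__ == "__main__":
    dims = (4, 4, 4, 4, 2)   # hypothetical 512-node shape
    origin = (0, 0, 0, 0, 0)
    nodes = list(product(*(range(s) for s in dims)))
    print(f"nodes: {len(nodes)}")
    print(f"distinct neighbors of origin: {len(torus_neighbors(origin, dims))}")
    worst = max(torus_hops(origin, n, dims) for n in nodes)
    print(f"farthest node: {worst} hops (formula gives {diameter(dims)})")
```

The point of the extra dimensions is the small diameter: in this hypothetical shape, no node is more than 9 hops from any other.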