GRITS — 2010/06/11
Trends in High Performance Computing, and their Impact on Astrophysical Data Processing
Theodore Kisner
Computational Cosmology Center, LBNL
Theodore Kisner — GRITS — 2010/06/11
C3 at LBNL
● Focused on computational challenges (simulation and data processing) relevant to cosmology (CMB, SN, BAO, ...)
● Tight connection to DOE computing facilities: Cray XT5 (40K cores), Cray XE6 (150K cores), Cloud computing platform, GPU test cluster, science gateways, etc.
● For >10 years, we have coordinated CPU allocations for CMB telescopes (funded by NASA, NSF, etc).
● Involved in building software infrastructure for future experiments and future architectures: algorithm scaling, data management, etc.
High Performance Computing
For the purposes of this talk, everything that needs a machine room:
● Traditional clusters (PCs interconnected with Ethernet, InfiniBand, etc.)
● Supercomputers (lightweight nodes with InfiniBand or custom interconnect)
● Cloud computing platforms (EC2, Eucalyptus)
● Large shared memory machines (NUMA architectures)
HPC in 10 Years
Hard to predict, but driven by trends:
● Still using silicon, and still tracking Moore's law for transistor counts.
● Computing centers have limited electrical capacity for power and cooling.
● Packing transistors into traditional CPU cores requires even more transistors for “overhead”: diminishing returns.
● Market forces (follow the money)
Moore's Law
Figure by Kathy Yelick, data from Kunle Olukotun, Lance Hammond, Herb Sutter, Burton Smith, Chris Batten, and Krste Asanovic
Rise of Manycore Systems
Focus is on Flops per Watt:
● Clock rates constant or decreasing.
Clock Rates and Power Scaling
● IBM Power5: 120W @ 1900MHz
● Intel Core2 solo: 15W @ 1000MHz.
● IBM PPC 450 (Blue Gene): 0.625W @ 800MHz
● Tensilica XTensa (Moto Razor): 0.09W @ 600MHz
400x improvement in Flops per Watt!
Image by John Shalf, LBNL
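The slide does not say how the 400x figure was derived. One reading that reproduces it approximately is to compare clock rate per watt between the first and last entries, treating clock rate as a crude proxy for flops. That proxy is an assumption made here for illustration, not something stated in the talk. A minimal sketch:

```python
# Power (W) and clock rate (MHz) exactly as listed on the slide.
chips = {
    "IBM Power5": (120.0, 1900.0),
    "Intel Core2 solo": (15.0, 1000.0),
    "IBM PPC 450 (Blue Gene)": (0.625, 800.0),
    "Tensilica XTensa": (0.09, 600.0),
}


def mhz_per_watt(watts, mhz):
    """Clock rate per watt: a crude stand-in for flops per watt (assumption)."""
    return mhz / watts


efficiency = {name: mhz_per_watt(w, f) for name, (w, f) in chips.items()}
baseline = efficiency["IBM Power5"]

# Print each chip's efficiency and its ratio to the Power5.
for name, eff in efficiency.items():
    print(f"{name:26s} {eff:8.1f} MHz/W  {eff / baseline:6.1f}x")
```

Under this assumption the embedded core comes out a little over 400 times more efficient than the Power5, which is in line with the number quoted above.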
Rise of Manycore Systems
Focus is on Flops per Watt:
● Clock rates constant or decreasing.
● Use larger fraction of transistors for calculation, split into many “throughput” cores.
● Explicit memory hierarchy. Cache management now in software stack. RAM/node increases, but RAM/core decreases.
CPU Power Consumption
Rise of Manycore Systems
Focus is on Flops per Watt:
● Clock rates constant or decreasing.
● Use larger fraction of transistors for calculation, split into many “throughput” cores.
● Explicit memory hierarchy. Cache management now in software stack. RAM/node increases, but RAM/core decreases.
● Keep some traditional “low latency” cores around for coordination.
● Filesystem I/O even more of a bottleneck...
“Throughput” Processors
● NVIDIA Fermi: 480 cores @ 700MHz
● ATI Radeon 5970: 3200 cores @ 725MHz
● Intel Many Integrated Core (MIC): rebrand of failed Larrabee platform...
● Goal is to use something closer to 25% of transistors for Flops.
● Requires fine-grained parallelism, explicit memory movement.
What does this mean for Astrophysics?
● Astrophysical datasets are getting larger!
  ● LSST: 15TB / day
  ● Near-term CMB missions: O(100-1000) TB
● Systems in the very near future may have O(10) traditional cores and O(100-1000) throughput cores per node.
1. Data movement can be more costly than calculations: minimize it when possible.
2. Determine what operations can be parallelized at the node level.
3. Evaluate new tools as they become available.
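To put the data volumes above in I/O terms, the following back-of-the-envelope sketch converts them into a sustained rate and a read time. The decimal definition of a terabyte and the 1 GB/s filesystem read rate are assumptions chosen for illustration; they are not figures from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 1e12  # bytes in a decimal terabyte (assumption)


def sustained_rate_mb_s(tb_per_day):
    """Average ingest rate in MB/s needed to keep up with a daily volume."""
    return tb_per_day * TB / SECONDS_PER_DAY / 1e6


def hours_to_read(total_tb, read_gb_s):
    """Hours needed for a single pass over a dataset at a given read rate."""
    return total_tb * TB / (read_gb_s * 1e9) / 3600.0


lsst_rate = sustained_rate_mb_s(15)   # 15 TB/day, as quoted for LSST
one_pass = hours_to_read(1000, 1.0)   # 1000 TB at a hypothetical 1 GB/s

print(f"LSST sustained rate: {lsst_rate:.0f} MB/s")
print(f"One pass over 1000 TB at 1 GB/s: {one_pass:.0f} hours")
```

At these assumed rates, a single pass over the upper end of the CMB range takes more than eleven days, so every extra pass over the data is expensive.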
Data Movement
● Traditional paradigm:
  ● Many small executables chained together
  ● Write / read intermediate files
● This breaks down if:
  ● I/O cost outpaces calculation AND
  ● Overall runtime is unacceptably slow
● Movement to/from accelerators can also cancel benefit for some algorithms.
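Whether transfers cancel the benefit of an accelerator can be estimated with a simple cost model: compare the CPU time against the transfer time plus the accelerated compute time. The sketch below is illustrative only. The function name, the assumption that output is the same size as input, and all numbers (bus bandwidth, kernel speedup, timings) are hypothetical.

```python
def offload_speedup(n_bytes, cpu_seconds, kernel_speedup, bus_gb_s):
    """Net speedup from offloading one task to an accelerator.

    Assumes the input is copied to the device and an output of the same
    size is copied back (hence the factor of 2), with no overlap between
    transfer and compute.
    """
    transfer = 2 * n_bytes / (bus_gb_s * 1e9)
    accelerated = cpu_seconds / kernel_speedup
    return cpu_seconds / (transfer + accelerated)


# 1 GB of data over a hypothetical 5 GB/s bus, kernel 20x faster than CPU.
cheap_task = offload_speedup(1e9, cpu_seconds=0.3, kernel_speedup=20, bus_gb_s=5)
heavy_task = offload_speedup(1e9, cpu_seconds=10.0, kernel_speedup=20, bus_gb_s=5)

print(f"cheap task: {cheap_task:.2f}x")  # below 1: offloading is a net loss
print(f"heavy task: {heavy_task:.2f}x")  # well above 1: offloading pays off
```

With these made-up numbers the same 20x kernel is a net loss for the low-arithmetic task and an 11x win for the compute-heavy one. The difference comes entirely from how much computation is done per byte moved.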
Data Movement for “Chained Processes”
[Figure: data flow between Disk (slow), RAM, and GPU (fast). Labels: Raw Data, Temporary Data (twice), Final Data; Process 1, Process 2, Process 3; GPU callout: "This is for playing Quake, right?"]
Improved Data Movement
[Figure: data flow between Disk (slow), RAM, and GPU (fast). Labels: Raw Data, Final Data; Library Function 1, Library Function 2; Accelerated Task 1, Custom Kernel 2, Custom Kernel 3; two labels reading "FREE".]
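The contrast between chained executables that exchange intermediate files and a single process that calls library functions on data held in memory can be sketched as follows. The three stage functions and the JSON file format are hypothetical stand-ins, and the accelerator side is left out entirely. The point is only that the second form produces the same result without the temporary writes and reads.

```python
import json
import os
import tempfile


def calibrate(samples):
    return [2.0 * s for s in samples]


def subtract_mean(samples):
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def total_power(samples):
    return sum(s * s for s in samples)


def chained(raw, workdir):
    """Each stage writes its output to disk and the next stage reads it back."""
    def dump(name, data):
        path = os.path.join(workdir, name)
        with open(path, "w") as f:
            json.dump(data, f)
        return path

    def load(path):
        with open(path) as f:
            return json.load(f)

    p0 = dump("raw.json", raw)
    p1 = dump("tmp1.json", calibrate(load(p0)))
    p2 = dump("tmp2.json", subtract_mean(load(p1)))
    return total_power(load(p2))


def in_memory(raw):
    """Same stages as library calls: data is read once and stays in RAM."""
    return total_power(subtract_mean(calibrate(raw)))


raw = [float(i % 7) for i in range(1000)]
with tempfile.TemporaryDirectory() as workdir:
    result_chained = chained(raw, workdir)
result_in_memory = in_memory(raw)
print(result_chained, result_in_memory)
```

In a real pipeline the stages would typically be separate executables, and the savings from restructuring depend on how large the intermediate products are relative to the computation.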
Parallelize Relevant Operations
● Split processing based on independent data products (embarrassingly parallel work flows)
● 1D – time domain astrophysics:
  ● Vector math, FFTs, sparse matrix operations.
● 2D – image / map manipulation:
  ● Linear combinations, projections
  ● Convolution / filtering, spherical harmonic transforms
● 3D – data cube (spaxel/voxel) manipulations.
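The first item above, splitting work by independent data product, can be sketched with the standard library. The product names and the per-product `rms` task are hypothetical. A thread pool is used only to keep the example self-contained; for CPU-bound work one would more likely swap in `ProcessPoolExecutor` or distribute products across nodes with MPI.

```python
from concurrent.futures import ThreadPoolExecutor


def rms(samples):
    """Per-product task: depends on no other product's data."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


def process_all(products, max_workers=4):
    """Run one independent task per data product and collect results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with the keys.
        return dict(zip(products, pool.map(rms, products.values())))


# Hypothetical data products, e.g. one timestream per detector.
products = {
    "detector_a": [1.0, 1.0, 1.0, 1.0],
    "detector_b": [3.0, 4.0, 3.0, 4.0],
    "detector_c": [0.0, 2.0, 0.0, 2.0],
}
results = process_all(products)
print(results)
```

Because no task reads another task's data, the work scales with the number of workers until I/O or memory bandwidth becomes the limit.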
Parallelize Relevant Operations
● Start by converting/switching low-level libraries
● Likely to get some improvement without much work, e.g. FFT libraries.
● Only build custom code when needed: if data movement to/from the card is dominant.
● Use helper tools: PGI accelerator framework, MOAT (shameless plug!).
New Tools
● We are faced with a huge diversity of platforms: GPUs/accelerators from different vendors, varying OS support.
● OpenCL: Unified interface to CPU/GPU devices, wide industry support.
Conclusions
1. Start planning now for future hardware: will your code be ready for the cluster you purchase in 3 years?
2. Start testing new software tools that seem promising: what pieces of existing code are easy to parallelize?
3. Will your future data volume overwhelm your current I/O patterns?