DoD HPC Modernization Program & Move Toward Emerging Architectures
Tom Dunn, Naval Meteorology & Oceanography Command
20 November 2014
HPC RECENT TRENDS Per Top500 List
RECENT 2014 DOD ACQUISITIONS
EXPECTED PROCESSOR COMPETITION
ONWARD TOWARD EXASCALE
Estimates Follow Moore’s Law (~2x every 2 yrs)
1997 – 0.3 TFs
2001 – 8.4 TFs
2004 – 32 TFs
2006 – 58 TFs
2008 – 226 TFs
2012 (Dec) – 954 TFs
2014 (Jul) – 2,556 TFs
2015 (Jul) – 5,760 TFs est.
2017 (Jul) – 10,000 TFs est.
Navy DoD SUPERCOMPUTING RESOURCE CENTER
Peak Computational Performance (Teraflops)
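The doubling rate implied by the peak-performance figures on this slide can be checked with a short calculation. The following is a minimal sketch, not part of the original presentation; the fractional years (year start where no month is given, 0.5 for July, 0.9 for December) and the function name doubling_time_years are assumptions made for illustration.

```python
import math

# Peak teraflops as listed on the slide (estimates for 2015 and 2017 omitted).
# Fractional years are an assumption: .0 where no month is given,
# .5 for July, .9 for December.
PEAK_TFLOPS = [
    (1997.0, 0.3),
    (2001.0, 8.4),
    (2004.0, 32.0),
    (2006.0, 58.0),
    (2008.0, 226.0),
    (2012.9, 954.0),
    (2014.5, 2556.0),
]


def doubling_time_years(t0, p0, t1, p1):
    """Years needed to double, assuming steady exponential growth from p0 to p1."""
    return (t1 - t0) / math.log2(p1 / p0)


first, last = PEAK_TFLOPS[0], PEAK_TFLOPS[-1]
overall = doubling_time_years(*first, *last)
print(f"Overall doubling time {first[0]:.0f}-{last[0]:.1f}: {overall:.2f} years")

# Doubling time between consecutive data points.
for (t0, p0), (t1, p1) in zip(PEAK_TFLOPS, PEAK_TFLOPS[1:]):
    print(f"{t0:.1f} -> {t1:.1f}: {doubling_time_years(t0, p0, t1, p1):.2f} years")
```

Because the dates are approximate, the interval-by-interval values are only indicative.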
Navy DSRC Capabilities
• One of the most capable HPC centers in the DoD and the nation
• Chartered as a DoD Supercomputing Center in 1994
• Computational performance approximately doubles every two years; currently 2,556 teraflops
• Systems reside on the Defense Research and Engineering Network (DREN) with 10 Gb connectivity (19 Dec 2013)
• 15% of Navy DSRC’s computational and storage capacity is reserved for operational use by CNMOC activities
• R&D and CNMOC Ops are placed in separate system partitions and queues
Top500® Systems by Architecture, June 2006–June 2014
Number of CPUs in the Top500® Systems by Architecture Type, June 2006–June 2014
Number of Systems in the Top500® Utilizing Co-Processors or Accelerators, June 2009–June 2014
Number of Systems in the Top500® by Co-Processors or Accelerators Type, June 2009–June 2014
Number of Cores in the Top500® by Co-Processors or Accelerators Type, June 2011–June 2014
Number of Cores in the June 2014 Top500® by CPU Manufacturer
TOP 500 SUPERCOMPUTER LIST (JUNE 2014) BY OEM SUPPLIER
OEM Supplier        Systems in Top 500
CRAY INC             51
DELL                  8
HEWLETT PACKARD     182
IBM                 176
SGI                  19
TOTAL               436
Other Suppliers      64
High Performance Computing Modernization Program 2014 HPC Awards
Feb. 2014
Air Force Research Lab (AFRL) DSRC, Dayton, OH
Cray XC-30 System (Lightning)
- 1,281 teraFLOPS
- 56,880 compute cores (2.7 GHz Intel Ivy Bridge)
- 32 NVIDIA Tesla K40 GPGPUs
Navy DSRC, Stennis Space Center, MS
Cray XC-30 (Shepard)
- 813 teraFLOPS
- 28,392 compute cores (2.7 GHz Intel Ivy Bridge)
- 124 hybrid nodes, each consisting of 10 Ivy Bridge cores and a 60-core Intel Xeon Phi 5120D
- 32 NVIDIA Tesla K40 GPGPUs
Cray XC-30 (Armstrong)
- 786 teraFLOPS
- 29,160 compute cores (2.7 GHz Intel Ivy Bridge)
- 124 hybrid nodes, each consisting of 10 Ivy Bridge cores and a 60-core Intel Xeon Phi 5120D
High Performance Computing Modernization Program 2014 HPC Awards
September 2014
Army Research Lab (ARL) DSRC, Aberdeen, MD
Cray XC-40 System
- 3.77 petaFLOPS
- 101,312 compute cores (2.3 GHz Intel Xeon Haswell)
- 32 NVIDIA Tesla K40 GPGPUs
- 411 TB memory
- 4.6 PB storage
Army Engineer Research Development Center (ERDC) DSRC, Vicksburg, MS
SGI ICE X System
- 4.66 petaFLOPS
- 125,440 compute cores (2.3 GHz Intel Xeon Haswell)
- 32 NVIDIA Tesla K40 GPGPUs
- 440 TB memory
- 12.4 PB storage
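The quoted peak figures for the 2014 awards can be roughly cross-checked from core count and clock rate. This is a minimal sketch, not taken from the presentation: it assumes 8 double-precision FLOPs per cycle per core for Ivy Bridge (AVX) and 16 for Haswell (AVX2 with FMA), and it counts CPU cores only. Any gap to the quoted figure is assumed here to come from the GPGPUs and rounding. The function name cpu_peak_tflops is hypothetical.

```python
def cpu_peak_tflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical CPU-only peak in teraFLOPS: cores x GHz x FLOPs per cycle."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


# (name, cores, clock in GHz, assumed DP FLOPs/cycle/core, quoted peak in TF)
SYSTEMS = [
    ("AFRL Cray XC-30 (Lightning)", 56_880, 2.7, 8, 1281.0),
    ("ARL Cray XC-40", 101_312, 2.3, 16, 3770.0),
    ("ERDC SGI ICE X", 125_440, 2.3, 16, 4660.0),
]

for name, cores, ghz, fpc, quoted in SYSTEMS:
    estimate = cpu_peak_tflops(cores, ghz, fpc)
    print(f"{name}: CPU-only estimate {estimate:,.0f} TF, quoted {quoted:,.0f} TF")
```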
High Performance Computing Modernization Program 2014/2015 HPC Awards
Air Force Research Lab (AFRL) DSRC, Dayton, OH
FY15 Funded
OEM and Contract Award - TBD
- 100,000+ compute cores
- 3.5 – 5.0 petaFLOPS
Navy DSRC, Stennis Space Center, MS
FY15 Funded
OEM and Contract Award - TBD
- 100,000+ compute cores
- 3.5 – 5.0 petaFLOPS
ECMWF (Top 500 List Jun 2014)
2 Cray XC30 systems
- each with 81,160 compute cores (2.7 GHz Intel Ivy Bridge)
- 1,796 teraFLOPS
NOAA NWS/NCEPWeather & Climate Operational Supercomputing System (WCOSS)
Phase I
2 IBM iDataplex systems
- each with 10,048 compute cores (2.6 GHz Intel Sandy Bridge)
- 213 teraFLOPS
Phase II (Jan 2015) Addition
2 IBM NeXtScale systems
- each with 24,192 compute cores (2.7 GHz Intel Ivy Bridge)
- 585 teraFLOPS
UK Meteorological Office
IBM Power 7 System
- 18,432 compute cores (3.836 GHz)
- 565 teraFLOPS
IBM Power 7 System
- 15,360 compute cores (3.876 GHz)
- 471 teraFLOPS
27 Oct 2014 Announcement
128M Contract
- 2 Cray XC-40 systems (Intel Xeon Haswell initially)
- >13 times faster than current system
- total of 480,000 compute cores
Phase 1a replace Power 7s by Sep 2015
Phase 1b extend both systems to power limit by Mar 2016
Phase 1c add one new system by Mar 2017
Expected Near Term HPC Processor Options
2016
- Intel and ARM
- Cray has ARM in-house for testing
2017
- Intel, ARM, & IBM Power 9 (with closely coupled NVIDIA GPUs)
DoD Applications & Exascale Computing
• General external impression
– In the 2024 timeframe, DoD will have no requirement for a balanced exascale supercomputer (untrue)
– DoD should not be a significant participant in exascale planning for the U.S. (untrue)
• Reality
– DoD has compelling coupled multi-physics problems which will require more tightly-integrated resources than technologically possible in the 2024 timeframe
– DoD has many other use cases which will benefit from the power efficiencies and novel technologies generated by the advent of exascale computing
HPCMP & 2024 DoD Killer Applications
• HPCMP categorizes its user base into 11 Computational Technology Areas (CTAs)
• Climate Weather Ocean (CWO) is one of 11 CTAs
• Dr. Burnett (CNMOC TD) is the DoD HPCMP CWO CTA leader
• Each CTA leader tasked in FY14 to project Killer Apps in their CTA
• Dr. Burnett’s CWO CTA analysis led by Lockheed Martin
• Primary focus is on HYCOM but also includes NAVGEM and ESPC
• Expect follow-on FY15 funding
• Develop appropriate Kiviat diagrams (example to follow)
• NRL Stennis part of an ONR sponsored NOPP project starting FY14 to look at attached processors (i.e. GPGPUs and accelerators) for HYCOM+CICE+WW3
Relevant Technology Issues
• Classical computing advances may stall in the next 10 years
– 22nm (feature size for latest processors)
– 14nm (anticipated feature size in 2015)
– 5-7nm (forecast limit for classical methods)
– 3D approaches are currently in use and denser 3D approaches are contemplated, but both have limitations
• Mean-time-between-failures (MTBF) will decrease dramatically
– Petascale (hours to days)
– Exascale (minutes)
• Data management exascale hurdles
• Power management exascale hurdles
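The MTBF trend above follows from simple scaling: if component failures are independent and exponentially distributed, system MTBF is roughly the per-component MTBF divided by the component count. The sketch below illustrates this under those assumptions. The 5-year node MTBF and the node counts are hypothetical values chosen for illustration, not figures from the presentation.

```python
def system_mtbf_hours(component_mtbf_hours, n_components):
    """System MTBF assuming independent, exponentially distributed failures."""
    if n_components <= 0:
        raise ValueError("n_components must be positive")
    return component_mtbf_hours / n_components


# Hypothetical per-node MTBF of 5 years, expressed in hours.
NODE_MTBF_HOURS = 5 * 365 * 24

# Hypothetical node counts spanning petascale-like to exascale-like systems.
for n_nodes in (10_000, 100_000, 1_000_000):
    hours = system_mtbf_hours(NODE_MTBF_HOURS, n_nodes)
    print(f"{n_nodes:>9,} nodes: {hours:.3f} h ({hours * 60:.1f} min)")
```

Under this model a hundredfold increase in node count cuts system MTBF by the same factor, which is how "hours to days" can become "minutes".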
Relevant Software Issues
• Gap between intuitive coding (i.e. readily relatable to domain science) and high performance coding will increase
• Underpinnings of architectures will change more rapidly than codes can be refactored
• Parallelism of underlying mathematics will become asymptotic (at some point) despite the need to scale to millions [if not billions] of processing cores
• Current parallel code is based (in general) on synchronous communications; however, asynchronous methods may be necessary to overcome technology issues
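One standard way to illustrate the asymptotic limit on parallelism is Amdahl's law, which the slide does not name; it is used here only as an illustration. In this minimal sketch, the parallel fraction of 0.999 is a hypothetical value, and the function name amdahl_speedup is likewise an assumption.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: speedup on n_cores when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# Hypothetical code that is 99.9% parallelizable.
P = 0.999

for n_cores in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n_cores:>13,} cores: speedup {amdahl_speedup(P, n_cores):,.1f}")

# The speedup can never exceed 1 / (1 - P), however many cores are added.
print(f"Upper bound: {1.0 / (1.0 - P):,.1f}")
```

The serial fraction sets the ceiling, which is one reason asynchronous methods that hide communication and synchronization costs become attractive at scale.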
Path Forward (Deliverables) [cont.]
• Kiviat diagram conveying system architecture requirements for each impactful advent
[Figure: Kiviat diagram, "Future Computational Requirements for Hypersonic Flight Simulation"]
Axes: PetaFLOPs; Job duration (weeks); Memory capacity (petabytes); Memory BW (petabytes/s); Interconnect BW (petabits/s); 1/(interconnect latency) (1/microseconds); Disk capacity (petabytes); I/O bandwidth (terabytes/s)
Axis tick labels: 0, 1, 10,000
Series: Spirit (1.5 PF reference system); X-51 (1-minute flight sim); SR-72 (1-minute flight sim); Exascale reference
March Toward Exascale Computing
• Dept of Energy target for exascale in 2024
• Japan target for exascale in 2020 (with $1B government assistance)
• China target for exascale now in 2020 (originally in 2018)
• HPCMP systems expected to reach 100 petaflops in 7 or 8 years