
Page 1

Fuel Your Insight: Intel® Scalable System Framework @ ISC16

Charles Wuischpard, Vice President, Data Center Group; General Manager, HPC Platform Group
Hugo Saleh, Director of Marketing, HPC Platform Group

NDA—Under embargo until Monday, June 20th 2016 at 6:30pm CEST (9:30am PT)

Page 2

Many Workloads | One Framework: Modeling & Simulation, Machine Learning, High Performance Data Analytics, Visualization

Intel® Scalable System Framework
Compute: Intel® Xeon® Processors, Intel® Xeon Phi™ Processors
Memory/Storage: Intel® Optane™ Technology, Intel® Solid State Drive Data Center, Intel® Solutions for Lustre*
Fabric: Intel® Omni-Path Fabric, Intel® Ethernet, Intel® Silicon Photonics Technology
Software: Intel® HPC Orchestrator, Intel® Software Tools, Intel® Cluster Checker, Intel-Supported Software Defined Visualization

Intel SSF system recommendations = today

Page 3

Now available: Intel® Xeon Phi™ Processor (formerly codenamed 'Knights Landing')

Leadership performance vs. GPU accelerator: up to 5x performance¹, up to 8x performance/watt², up to 9x performance/USD³

1st bootable, host CPU for highly-parallel workloads
1st integrated memory
1st integrated fabric

Run any workload; programmability; power efficient; no PCIe* bottleneck; large memory footprint; scalability & future-ready
…with all the benefits of a CPU

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance/datacenter. Configurations: 1-3 see page 4.

*Other names and brands may be claimed as the property of others.

Page 4

Your Path to Deeper Insight

Life Science: up to 5.0x¹ performance (LAMMPS*)
Visualization: up to 5.2x² performance (Embree*)
Finance: up to 2.7x³ performance (Monte Carlo* DP)
Performance versus NVIDIA* GPU Accelerator

Additional segments: Engineering, Oil & Gas, Applied Science, Defense, Weather

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance/datacenter. Configurations: 1-3 see page 6.

*Other names and brands may be claimed as the property of others. SPECfp* represents the throughput of 19 common HPC applications ranging from Fluid Dynamics to Speech Recognition – see http://spec.org/cpu2006/CFP2006/.

#1 World record 1-Socket SPECfp*_rate2006 benchmark result4

Page 5

Intel® Xeon Phi™ processor SKUs: choose your optimization point

SKU   | Positioning                | Cores | GHz | Memory          | Integrated Fabric | DDR4            | Power² | Recommended Customer Pricing³
7290¹ | Best Performance/Node      | 72    | 1.5 | 16GB, 7.2 GT/s  | Yes               | 384GB, 2400 MHz | 245W   | $6254
7250  | Best Performance/Watt      | 68    | 1.4 | 16GB, 7.2 GT/s  | Yes               | 384GB, 2400 MHz | 215W   | $4876
7230  | Best Memory Bandwidth/Core | 64    | 1.3 | 16GB, 7.2 GT/s  | Yes               | 384GB, 2400 MHz | 215W   | $3710
7210  | Best Value                 | 64    | 1.3 | 16GB, 6.4 GT/s  | Yes               | 384GB, 2133 MHz | 215W   | $2438

¹ Available beginning in September. ² Plus 15W for integrated fabric. ³ Pricing shown is for parts without integrated fabric; add an additional $278 for integrated-fabric versions of these parts. Integrated-fabric parts available in October.
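The footnote pricing rule is straightforward to apply when sizing a configuration: take the recommended customer price from the table and add $278 (and 15W to the power budget) for the integrated-fabric variant. The short Python sketch below is illustrative only; the SKU figures are copied from the table above, and the skus dictionary and price_with_fabric helper are hypothetical names for this example, not an Intel tool.

# Illustrative only: Xeon Phi SKU figures copied from the table above (USD recommended customer pricing).
skus = {
    "7290": {"cores": 72, "ghz": 1.5, "power_w": 245, "rcp_usd": 6254},
    "7250": {"cores": 68, "ghz": 1.4, "power_w": 215, "rcp_usd": 4876},
    "7230": {"cores": 64, "ghz": 1.3, "power_w": 215, "rcp_usd": 3710},
    "7210": {"cores": 64, "ghz": 1.3, "power_w": 215, "rcp_usd": 2438},
}
FABRIC_ADDER_USD = 278  # footnote 3: integrated-fabric variant adds $278
FABRIC_ADDER_W = 15     # footnote 2: integrated fabric adds 15W to the listed power

def price_with_fabric(sku: str) -> int:
    # Recommended customer price of the integrated-fabric variant of a SKU.
    return skus[sku]["rcp_usd"] + FABRIC_ADDER_USD

for name, s in skus.items():
    print(f"{name}: ${s['rcp_usd']} base, ${price_with_fabric(name)} with fabric, "
          f"${s['rcp_usd'] / s['cores']:.0f} per core")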

Page 6

Widespread market adoption

>100,000 units sold or pending^
Developer systems starting < $5K
>30 OEM and channel systems
Major lab adoptions*: LANL, LBNL, Sandia, NERSC, Argonne, TACC, Cineca, CEA, T2K, Kyoto, CAS
>30 ISV-optimized apps (runs all x86 code)
TOP500 listings (expected)

^Source: Intel estimates
intel.com/xeonphi/partners

*Other names and brands may be claimed as the property of others.

Page 7

Intel® Omni-Path Architecture: rapidly delivering value to end users

>80,000 nodes sold²
Specified in ~1/2 of user RFPs²
Sold by every major HPC OEM; bought in every geography
Some of the largest system awards: Alfred Wegener Institute (AWI)*, Albert Einstein Institute (AEI)*, Cineca*, University of Tokyo PT2K*, Pittsburgh Supercomputing Center*, Rutgers University*, TACC*, US DoE (CTS-1)*

Delivers up to 58% better application performance per fabric dollar than EDR¹

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance/datacenter. Configurations: see page 10.

*Other names and brands may be claimed as the property of others.

Page 8

Addressing the HPC Software Stack Headache: integration and maintenance are widespread challenges

Intel® HPC Orchestrator: Intel-supported system software based on OpenHPC^
Pre-integrated, pre-tested, pre-validated
3 products: turnkey to highly configurable
1st product: Q4'16 via channel partners
Now in trials with major OEMs, integrators, software vendors, and select HPC research centers
www.intel.com/hpcorchestrator

OpenHPC: open source solution for HPC (NEW: version 1.1 now available)
Community-led organization with worldwide participation
39 members since launch (including non-IA based vendors)
Over 11,000 new worldwide visitors since launch

^ OpenHPC is an open source HPC community. For more information please visit: http://www.openhpc.community/

*Other names and brands may be claimed as the property of others.

Page 9

Intel® Scalable System Framework Partners: a vibrant and innovative community

Dozens more expected with the transition of Intel® Cluster Ready program partners to the Intel® Scalable System Framework program by the end of the year

*Other names and brands may be claimed as the property of others.

Page 10

Enriching the Intel® Scalable System Framework

Intel® SSF configuration benefits: purchase confidence through software compatibility, interoperability, performance, and total cost of ownership.

3 requirements³ (minimum in 2016):
1. Intel® SSF ingredients¹ deliver the performance, TCO² and interoperability made possible by a balanced design. Fabric: Intel® Omni-Path Architecture and/or Intel® Ethernet Server Adapters.
2. Intel® SSF architecture system compatibility: HPC system design best practices preserve application compatibility.
3. Design validation assures smooth deployments and operation (vendor-specific validation⁴).

HPC buyer guidance. Intel® SSF configurations of OEM HPC systems deliver multiple benefits: Intel® SSF value coupled with OEM innovations while preserving software compatibility, and simplified purchase decisions for small-to-medium sized systems.

Learn more: Intel® Scalable System Framework + Intel® SSF Architecture Specification [Public link access June 20th]

¹ Compute is performed using the Intel® Xeon® processor E5-2600 v4 product family and/or Intel® Xeon Phi™ processor. The message fabric uses Intel® Omni-Path Fabric or Intel® Ethernet Server Adapter. ² Total cost of ownership. ³ Compliance with an Intel® SSF Reference Design also satisfies these requirements. ⁴ Most major OEMs will self-certify that Intel SSF configurations of their HPC systems meet these three requirements. Smaller providers will joint-certify their configurations with Intel.

Page 11

A Great Tag Team for Machine Learning / Artificial Intelligence: Intel® Scalable System Framework (Compute, Memory/Storage, Fabric, Software)

Training: faster and more scalable than GPU¹
Inference^: most widely deployed for machine learning²

^ Also known as scoring or prediction

1. See pages 15-16 for more details

2. Source: Intel estimates

Page 12

Proven scalability for deep learning: training and scoring. Your path to deeper insight.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance/datacenter. Configurations: 1-4 see page 13.

[Four "Up to" performance callouts (claims 1-4); the specific figures appear only in the slide graphics. See the machine learning configurations later in this document.]

Page 13

Focused Investments for Machine Learning

Enhanced roadmap: NEW products optimized to extend leadership
Tools & libraries: deep engagements with the open source community to optimize deep learning frameworks for CPU; announcing MKL-DNN, the open-source deep learning neural network layers of Intel® MKL
Partner programs: training 100K developers on machine learning; Early Access program for top research academics

Page 14

Look for us at ISC16

Keynote: Rajeeb Hazra, 'AI: The Next HPC Workload', Monday, 6/20, 6:00 p.m.
Intel® Xeon Phi™ Processor Launch: celebrate the launch on the exhibit floor directly after the keynote
50+ Collaboration Hub presentations by customers, partners and Intel
Vendor Showdown: Charlie Wuischpard, Monday, 6/20, 1:16 p.m.
14 technical sessions
WINNER, Hans Meuer Award for Most Outstanding ISC Research Paper: 'Mitigating MPI Message Matching Misery'
Marquee demos: (1) EPFL, 'BRayns' human brain neuron visualization; (2) Kyoto Univ., deep learning accelerates drug discovery
Intel® Scalable System Framework social media (follow @IntelHPC)
Intel press roundtable & customer panel: Tuesday, 6/21, 10:00 – 11:00 a.m., Movenpick Hotel, Matterhorn 1

Page 15

Legal Notices and DisclaimersYou may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.


Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

Configurations: see each performance slide notes for configurations. For more information go to http://www.intel.com/performance/datacenter.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates with the performance improvements reported. SPEC and SPEC MPI* are trademarks of the Standard Performance Evaluation Corporation. See http://www.spec.org for more information.
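In other words, each platform's raw result is divided by the baseline platform's result, so the baseline lands at 1.0. A minimal Python sketch with made-up scores (not Intel data), assuming a higher-is-better metric:

def relative_performance(baseline_score, scores):
    # Normalize each raw benchmark score against the baseline platform's score.
    return {name: score / baseline_score for name, score in scores.items()}

# Hypothetical example: baseline scores 100; two other platforms score 120 and 310.
print(relative_performance(100.0, {"platform_a": 120.0, "platform_b": 310.0}))
# -> {'platform_a': 1.2, 'platform_b': 3.1}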

Intel processors of the same SKU may vary in frequency or power as a result of natural variability in the production process.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at Intel.com

Intel products may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Copyright © 2016 Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, True Scale, Omni-Path, Lustre-based Solutions, Intel® HPC Orchestrator, Silicon Photonics, 3D XPoint Technology, Intel Optane Technology and others are trademarks of Intel Corporation in the U.S. and/or other countries. Other names and brands may be claimed as the property of others.

Page 16
Page 17


“NOW AVAILABLE…” Configurations

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance/datacenter. Configurations:

1. Intel measured results as of June 2016. Up to 5x more timesteps per second, 8x higher performance per watt and 9x better performance per dollar claims based on LAMMPS* Coarse-Grain Water Simulation using Stillinger-Weber* potential comparison of the following. BASELINE CONFIGURATION: Dual Socket Intel® Xeon® processor E5-2697 v4 (45 MB Cache, 2.3 GHz, 18 Cores) with Intel® Hyper-Threading and Turbo Boost Technologies enabled, 128 GB DDR4-2400 MHz memory, Red Hat Enterprise Linux* 6.7 (Santiago), Intel® Omni-Path Host Fabric Interface Adapter 100 Series 1 Port PCIe* x16, Intel® Server Board S2600WT2R, BMC 1.33.9832, FRU/SDR Package 1.09, 1.0 TB SATA drive WD1003FZEX-00MK2A0 System Disk + one NVIDIA Tesla* K80 GPU, NVIDIA CUDA* 7.5.17 (Driver: 352.39), ECC enabled, persistence mode enabled. Number of MPI tasks on host varied to give best performance. CUDA MPS* used where possible. Mean benchmark system power consumption: 683W. Estimated list price including host: $13,750, source http://www.colfax-intl.com/ND/Servers/CX1350s-XK6.aspx. NEW CONFIGURATION: One node Intel Xeon Phi processor 7250 (16 GB, 1.4 GHz, 68 Cores) in Intel® Server System LADMP2312KXXX41, 96GB DDR4-2400 MHz, quad cluster mode, MCDRAM flat memory mode, Red Hat Enterprise Linux* 6.7 (Santiago) running Intel® Compiler 16.0.2, Intel® MPI 5.1.2.150, Optimization Flags: “-O2 -fp-model fast=2 -no-prec-div -qoverride-limits”, Intel® Omni-Path Host Fabric Interface Adapter 100 Series 1 Port PCIe x16, 1.0 TB SATA drive WD1003FZEX-00MK2A0 System Disk. Mean benchmark system power consumption: 378W. Estimated list price: $7,300, source Intel Recommended Customer Pricing (RCP).

Page 18


Your Path to Deeper Insight Configurations

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance/datacenter. Configurations:

1. Up to 5x more timesteps per second, 8x higher performance per watt and 9x better performance per dollar claims based on LAMMPS* Coarse-Grain Water Simulation using Stillinger-Weber* potential comparison of the following. BASELINE CONFIGURATION: Dual Socket Intel® Xeon® processor E5-2697 v4 (45 MB Cache, 2.3 GHz, 18 Cores) with Intel® Hyper-Threading and Turbo Boost Technologies enabled, 128 GB DDR4-2400 MHz memory, Red Hat Enterprise Linux* 6.7 (Santiago), Intel® Omni-Path Host Fabric Interface Adapter 100 Series 1 Port PCIe* x16, Intel® Server Board S2600WT2R, BMC 1.33.9832, FRU/SDR Package 1.09, 1.0 TB SATA drive WD1003FZEX-00MK2A0 System Disk + one NVIDIA Tesla* K80 GPU, NVIDIA CUDA* 7.5.17 (Driver: 352.39), ECC enabled, persistence mode enabled. Number of MPI tasks on host varied to give best performance. CUDA MPS* used where possible. Mean benchmark system power consumption: 683W. Estimated list price including host: $13,750, source http://www.colfax-intl.com/ND/Servers/CX1350s-XK6.aspx. NEW CONFIGURATION: One node Intel Xeon Phi processor 7250 (16 GB, 1.4 GHz, 68 Cores) in Intel® Server System LADMP2312KXXX41, 96GB DDR4-2400 MHz, quad cluster mode, MCDRAM flat memory mode, Red Hat Enterprise Linux* 6.7 (Santiago) running Intel® Compiler 16.0.2, Intel® MPI 5.1.2.150, Optimization Flags: “-O2 -fp-model fast=2 -no-prec-div -qoverride-limits”, Intel® Omni-Path Host Fabric Interface Adapter 100 Series 1 Port PCIe x16, 1.0 TB SATA drive WD1003FZEX-00MK2A0 System Disk. Mean benchmark system power consumption: 378W. Estimated list price: $7,300, source Intel Recommended Customer Pricing (RCP).

2. Up to 5.2x faster renderings at 7.6 times better performance per dollar claim based on frames per second (FPS) results with 1024x1024 image workloads with Intel Embree 2.10.0 using one node Intel Xeon Phi processor 7250 (16 GB, 1.4 GHz, 68 Cores) in Intel® Server System LADMP2312KXXX41, 96GB DDR4-2400 MHz, quad cluster mode, MCDRAM flat memory mode, Red Hat* Enterprise Linux 6.7 scoring 32.5 FPS with an estimated list price of $7,300, compared to a hosted NVIDIA Titan X* GPU scoring 6.28 FPS with an estimated list price of $13,750.

3. Up to 2.7x more options evaluated per second, 2.8 times better performance per watt and 5.1 times superior performance per dollar value claims based on Monte Carlo DP workload results comparing one node Intel Xeon Phi processor 7250 (16 GB, 1.4 GHz, 68 Cores) in Intel® Server System LADMP2312KXXX41, 96GB DDR4-2400 MHz, CentOS* 7.2, quadrant cluster mode, MCDRAM flat memory mode, scoring 4.43M options/second, 355 average watts and estimated list price of $7,300, compared to Supermicro* SYS-1028GR-TR server using Intel® Xeon® processor E5-2699 v4 (55 MB Cache, 2.2 GHz, 22 Cores) with Intel Hyper-Threading and Turbo Boost Tech. enabled, 256 GB DDR4-2133 MHz, Red Hat Enterprise Linux* 7.1 (Maipo) plus NVIDIA Tesla K80* scoring 1.62M options/second, 358 average watts (host system was essentially idle) and estimated list price of $13,750. (These three multipliers are rechecked in the sketch after this list.)

4. World record claim based on a SPECfp*_rate2006 base score of 842 and peak score of 870 submitted to SPEC.org (considered estimated until published) compared to all other 1-chip results published at https://www.spec.org/cpu2006/results/rfp2006.html as of 14 June 2016. Configuration: Fujitsu PRIMERGY* CX1640 M1 using Intel® Xeon Phi™ processor 7250 (16 GB MCDRAM, 1.4 GHz, 68 Cores) with 192 GB memory, Red Hat Enterprise Linux* 7.2 (3.10.0-327.13.1.el7.mpsp_1.3.2.100.x86_64) running Intel® C++ and Fortran Compilers 16.0.2.181.
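The three multipliers in claim 3 follow, to within rounding, from the throughput, power and price figures quoted in that claim. A minimal Python sketch that recomputes them (figures copied from claim 3; nothing else is assumed):

# Claim 3 figures: Xeon Phi 7250 node vs. Xeon E5-2699 v4 host + NVIDIA Tesla K80.
phi = {"options_per_s": 4.43e6, "avg_watts": 355, "list_usd": 7300}
k80 = {"options_per_s": 1.62e6, "avg_watts": 358, "list_usd": 13750}

perf = phi["options_per_s"] / k80["options_per_s"]  # ~2.73x (quoted as 2.7x)
perf_per_watt = (phi["options_per_s"] / phi["avg_watts"]) / (k80["options_per_s"] / k80["avg_watts"])  # ~2.76x (quoted as 2.8x)
perf_per_dollar = (phi["options_per_s"] / phi["list_usd"]) / (k80["options_per_s"] / k80["list_usd"])  # ~5.15x (quoted as 5.1x)

print(f"{perf:.2f}x perf, {perf_per_watt:.2f}x perf/W, {perf_per_dollar:.2f}x perf/$")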

Page 19

Rapidly Delivering Value… Configurations

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance/datacenter. Configurations:

1. Configuration for performance testing: Intel® Xeon® Processor E5-2697A v4 dual socket servers, 64 GB DDR4 memory per node, 2133 MHz, RHEL 7.2. BIOS settings: Snoop hold-off timer = 9, Early snoop disabled, Cluster on die disabled. Intel® Omni-Path Architecture (Intel® OPA): Intel Fabric Suite 10.0.1.0.50, Intel Corporation Device 24f0 – Series 100 HFI ASIC (B0 silicon), OPA Switch: Series 100 Edge Switch – 48 port (B0 silicon), IOU non-posted prefetch disabled. EDR InfiniBand: MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0), Mellanox EDR ConnectX-4 Single Port Rev 3 MCX455A HCA, Mellanox SB7700 36-port EDR InfiniBand switch, IOU non-posted prefetch enabled. Applications: NAMD V2.11, GROMACS version 5.0.4, LS-DYNA MPP R7.1.2, LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) Feb 16, 2016 stable version release, Quantum Espresso version 5.3.0, WRF version 3.5.1. SPEC MPI 2007: SPEC MPI2007, large suite, https://www.spec.org/mpi/. All pricing data obtained from www.kernelsoftware.com May 4, 2016. All cluster configurations estimated via internal Intel configuration tool. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. For the 16-node configuration, fabric hardware assumes one edge switch, 16 network adapters and 16 cables.

2. Intel estimates

Page 20


Your Path to Deeper Insight Configurations

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance/datacenter. Configurations:

1. Up to 2.3x faster training per system claim based on AlexNet* topology workload (batch size = 1024) using a large image database running 4 nodes of Intel Xeon Phi processor 7250 (16 GB MCDRAM, 1.4 GHz, 68 Cores) in Intel® Server System LADMP2312KXXX41, 96GB DDR4-2400 MHz, quad cluster mode, MCDRAM flat memory mode, Red Hat Enterprise Linux* 6.7 (Santiago), 1.0 TB SATA drive WD1003FZEX-00MK2A0 System Disk, running Intel® Optimized DNN Framework, Intel® Optimized Caffe (internal development version) training 1.33 billion images in 10.5 hours compared to 1-node host with four NVIDIA “Maxwell” GPUs training 1.33 billion images in 25 hours (source: http://www.slideshare.net/NVIDIA/gtc-2016-opening-keynote slide 32). (This time ratio is rechecked in the sketch after this list.)

2. Up to 38% better scaling efficiency at 32-nodes claim based on GoogLeNet deep learning image classification training topology using a large image database comparing one node Intel Xeon Phi processor 7250 (16 GB MCDRAM, 1.4 GHz, 68 Cores) in Intel® Server System LADMP2312KXXX41, 96GB DDR4-2400 MHz, quad cluster mode, MCDRAM flat memory mode, Red Hat* Enterprise Linux 6.7, Intel® Optimized DNN Framework with 87% efficiency to unknown hosts running 32 each NVIDIA Tesla* K20 GPUs with a 62% efficiency (Source: http://arxiv.org/pdf/1511.00175v2.pdf showing FireCaffe* with 32 NVIDIA Tesla* K20s (Titan Supercomputer*) running GoogLeNet* at 20x speedup over Caffe* with 1 K20).

3. Up to 50x faster training on 128 nodes as compared to a single node based on AlexNet* topology workload (batch size = 1024) training time using a large image database running one node Intel Xeon Phi processor 7250 (16 GB MCDRAM, 1.4 GHz, 68 Cores) in Intel® Server System LADMP2312KXXX41, 96GB DDR4-2400 MHz, quad cluster mode, MCDRAM flat memory mode, Red Hat Enterprise Linux* 6.7 (Santiago), 1.0 TB SATA drive WD1003FZEX-00MK2A0 System Disk, running Intel® Optimized DNN Framework, training in 39.17 hours compared to 128 nodes identically configured with Intel® Omni-Path Host Fabric Interface Adapter 100 Series 1 Port PCIe x16 connectors training in 0.75 hours. Contact your Intel representative for more information on how to obtain the binary. For information on the workload, see https://papers.nips.cc/paper/4824-Large image database-classification-with-deep-convolutional-neural-networks.pdf. (This time ratio is rechecked in the sketch after this list.)

4. Up to 30x software optimization improvement claim based on CNN training workload running 2S Intel® Xeon® processor E5-2680 v3 running Berkeley Vision and Learning Center* (BVLC) Caffe + OpenBlas* library and then run tuned on the Intel® Optimized Caffe (internal development version) + Intel® Math Kernel Library (Intel® MKL).
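The headline multipliers in claims 1 and 3 are simple ratios of the training times quoted above. A minimal Python sketch that recomputes them from those figures (the slide rounds the results down to 2.3x and 50x):

# Claim 1: 4-node Xeon Phi 7250 vs. a 4-GPU host, both training 1.33 billion images.
gpu_host_hours, phi_4node_hours = 25.0, 10.5
print(f"per-system training speedup: {gpu_host_hours / phi_4node_hours:.2f}x")  # ~2.38x, quoted as 'up to 2.3x'

# Claim 3: single Xeon Phi node vs. 128 nodes over Intel Omni-Path, same AlexNet-style workload.
one_node_hours, nodes_128_hours, node_count = 39.17, 0.75, 128
speedup = one_node_hours / nodes_128_hours
print(f"128-node speedup: {speedup:.1f}x ({speedup / node_count:.0%} parallel efficiency)")  # ~52x, quoted as 'up to 50x'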