
Page 1:

Maximize Clusters’ Performance
The Next Level of Clustering
Michael Kagan
VP Architecture
Mellanox Technologies

Page 2:

Agenda

The vision
The interconnect challenge
Industry solution
Products and roadmap

Page 3:

The Vision – Clustered Grid

Page 4:

Deliver Services to Clients
From equipment warehouse to service provider

Resource pool

[Chart: Cluster market penetration, 03Q1–06Q2 – share of clustered vs. non-clustered systems, 0–100%]

Clustering – the grid backbone technology

Page 5:

Dual-socket server platform trends

[Chart: IO performance growth (log scale) – single-node compute power and total IO bandwidth (Gbit). 2000: 1GHz, 1 core; today: 2GHz, 4 cores; 2010: 4GHz, 2 × 16 cores]

More applications per server and I/O

More traffic per I/O with server I/O consolidation

I/O capacity per server dictated by the most demanding apps

10Gb/s+ connectivity for each server

Multi-core CPUs

SAN Adoption

Shared Resources

Multi-core CPUs mandating 10Gb/s+ connectivity

Page 6:

Typical Deployments With GigE and FC

6-8 I/O adapters per server
• Bandwidth
• Functions

Not feasible for blade servers
• Limited PCI slots, backplane traces

High CAPEX, OPEX
• More ports/server

Management challenge
• Multiple networks
• Multiple management domains

[Diagram: two physical servers, each hosting virtual machines with virtual NICs and virtual HBAs behind a software virtual switch (network and SCSI/file system); every server carries multiple GigE NICs with NIC drivers and FC HBAs with FC drivers, plus a management application and console OS, and connects through in-chassis Ethernet and FC blade switches to the external LAN/WAN and SAN infrastructure]


Software-based IOV
Traditional approach does not scale

Page 7:

I/O Overloaded?

IO delivery challenges

High bandwidth, low latency
Scalability
Low power consumption
Virtualization, dynamic load balancing
Agility, quicker business results

Server and storage I/O takes on a new role

Page 8:

IO Infrastructure – Requirements

Data (layer 2)
  Data Center requirement                  | Ethernet | InfiniBand
  Network convergence (virtual networks)   | No       | Yes
  Quality of Service                       | No       | Yes
  Lossless network                         | No       | Yes
  Congestion management                    | No       | Yes
  Optimal routing                          | No       | Yes

Transport (layer 4)
  Data Center requirement                  | TCP      | InfiniBand
  Scalability to 100Gbit                   | Hard     | Easy
  Virtual interface support                | No       | Yes
  Point to multipoint communication        | No       | Yes
  QoS-aware interface to client            | No       | Yes
  High Availability features               | Poor     | Good

Source: Intel/CISCO, August 2005

Legacy interconnect does not fit Grid requirements

Page 9:

InfiniBand – The Grid Interconnect

Top Performance at lowest price
• Defined for low-cost implementation
• Up to 120Gbit port speed

Scalable
• Tens-of-thousands of nodes
• Multi-core servers

Low CPU overhead
• RDMA and Transport Offload (illustrated by the code sketch at the end of this page)

Service Oriented I/O
• Quality-of-Service
• Virtualization
• I/O consolidation

Established software ecosystem

[Chart: Performance Roadmap, 2004–2008 – node-to-node and switch-to-switch bandwidth in Gigabits per second, scaling to 120Gbit, compared with 1GigE, 10GigE and Fibre Channel]

Industry standard for grid interconnect
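The low-CPU-overhead claim above rests on RDMA and transport offload: software registers a buffer once and then posts work requests that the HCA executes end to end (segmentation, reliable delivery, retransmission) while the host CPU stays off the data path. The sketch below is not from the slides; it is a minimal illustration using the standard libibverbs API, and it assumes a connected reliable QP whose setup (normally done with rdma_cm or an out-of-band exchange of the peer's address and rkey) is omitted. The placeholder values are marked as assumptions in the comments.

```c
/* Minimal libibverbs sketch of an RDMA write with transport offload.
 * Connection setup is omitted: in a real program the QP would be moved to
 * RTS and remote_addr/rkey exchanged via rdma_cm or out-of-band.
 * Build (assumed): gcc rdma_write_sketch.c -libverbs */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no InfiniBand devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);               /* protection domain */
    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);

    /* Register the buffer once; afterwards the HCA can DMA it directly. */
    char *buf = malloc(4096);
    strcpy(buf, "payload moved by the HCA, not by the CPU");
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096, IBV_ACCESS_LOCAL_WRITE);

    /* Reliable-connected QP: the transport state machine lives in the HCA. */
    struct ibv_qp_init_attr qpa = {
        .send_cq = cq, .recv_cq = cq, .qp_type = IBV_QPT_RC,
        .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                 .max_send_sge = 1, .max_recv_sge = 1 },
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &qpa);

    /* Assumed values: in reality exchanged with the peer during the
     * connection setup that this sketch skips. */
    uint64_t remote_addr = 0;   /* peer buffer address (placeholder) */
    uint32_t rkey        = 0;   /* peer memory key     (placeholder) */

    struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = 4096, .lkey = mr->lkey };
    struct ibv_send_wr wr = { 0 }, *bad;
    wr.wr_id               = 1;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_WRITE;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    /* The CPU's job ends with posting the descriptor; the HCA does the rest. */
    ibv_post_send(qp, &wr, &bad);

    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)      /* completion raised by hardware */
        ;
    printf("RDMA write completed, status %d\n", wc.status);

    ibv_destroy_qp(qp); ibv_dereg_mr(mr); ibv_destroy_cq(cq);
    ibv_dealloc_pd(pd); ibv_close_device(ctx); ibv_free_device_list(devs);
    free(buf);
    return 0;
}
```

The point the slide makes is visible in the last few lines: once ibv_post_send returns, no further CPU cycles are spent on the transfer until the hardware-generated completion appears.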

Page 10:

Delivering Service Oriented I/O

End-to-End Quality of Service
• Congestion control @ source
• Resource allocation

I/O Consolidation
• Multiple traffic types
• Up to 40% power savings

Optimal Path Management
• Packet drop prevention
• Scale @ wire speed

Dedicated Virtual Machine Services
• Virtual machine partitioning
• Near native performance

Provisioned IO services on a single wire

Clustering

Communications

Storage

Management

Page 11:

InfiniBand HCA

Channel I/O Virtualization on Server

Hardware-based I/O virtualization
• Isolation & protection per VM
• DMA remapping / virtual address translation
• Resource provisioning
• Hypervisor offload (switching, traffic steering)

Supports current and future servers
• Programmable
• IOTLB* for future I/O MMU chipsets
• Intel VT-d IOV, PCI-SIG IOV

Better resource utilization
• Frees up CPU through hypervisor offload
• Enables significantly more VMs per CPU

Native OS performance for VMs
• Eliminates VMM overheads

(A code sketch at the end of this page illustrates the per-consumer channel objects.)

[Diagram: InfiniBand HCA in a physical server – each virtual machine runs its own HCA driver over a dedicated IO channel; the HCA performs DMA remapping to memory and hosts the VMM offload functions and the physical port, beneath the virtual machine monitor]

* I/O Translation Look-aside Buffer

Channel IO – delivering IO services to the consumer
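To make the "IO channels" in the diagram above concrete: the verbs interface already exposes a per-consumer channel as a protection domain, a completion queue and one or more queue pairs, and the HCA enforces isolation between channels in hardware. The loop below is only an illustration of that object model, not Mellanox code; NUM_CHANNELS and struct channel are invented names, and the actual mapping of channels onto guest VMs is performed by the hypervisor and the HCA, not by a user-space loop like this.

```c
/* Illustrative sketch: one isolated verbs "channel" (PD + CQ + QP) per
 * consumer. The HCA checks every access against the protection domain and
 * memory keys of the issuing channel, which is what gives per-VM isolation.
 * NUM_CHANNELS and struct channel are invented for this example. */
#include <infiniband/verbs.h>
#include <stdio.h>

#define NUM_CHANNELS 4

struct channel {
    struct ibv_pd *pd;   /* protection domain: the isolation boundary    */
    struct ibv_cq *cq;   /* completion queue: private completion stream  */
    struct ibv_qp *qp;   /* queue pair: private send/receive work queues */
};

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no InfiniBand devices\n"); return 1; }
    struct ibv_context *ctx = ibv_open_device(devs[0]);

    struct channel ch[NUM_CHANNELS];
    for (int i = 0; i < NUM_CHANNELS; i++) {
        ch[i].pd = ibv_alloc_pd(ctx);
        ch[i].cq = ibv_create_cq(ctx, 64, NULL, NULL, 0);
        struct ibv_qp_init_attr attr = {
            .send_cq = ch[i].cq, .recv_cq = ch[i].cq, .qp_type = IBV_QPT_RC,
            .cap = { .max_send_wr = 64, .max_recv_wr = 64,
                     .max_send_sge = 1, .max_recv_sge = 1 },
        };
        ch[i].qp = ibv_create_qp(ch[i].pd, &attr);
        printf("channel %d: QP number 0x%x\n", i, ch[i].qp->qp_num);
    }

    /* Memory registered under ch[0].pd cannot be referenced by work posted on
     * ch[1].qp; the HCA rejects it, so consumers sharing one physical adapter
     * and port still cannot reach each other's buffers. */

    for (int i = 0; i < NUM_CHANNELS; i++) {
        ibv_destroy_qp(ch[i].qp);
        ibv_destroy_cq(ch[i].cq);
        ibv_dealloc_pd(ch[i].pd);
    }
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```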

Page 12:

Cluster as a Pool of Resources

[Diagram: a virtualization engine and middleware span the cluster; each node's VMM hosts VMs, and applications run inside the VMs]

Each VM (“container”) represents available compute resource

Applications are run in any available container

Storage is disassociated from the platform

Applications become readily transportable


Unconstrained service delivery

Page 13:

Clusters – Deliver Service to Consumer

Server-centric view: servers interconnected by networks

Service-centric view: applications are provisioned from a pool of resources


Interconnect – service delivery enabler

Page 14:

“Global architecture”

Location-agnostic solution – from microns to miles

Washington DC ↔ Los Angeles, CA: 950 MB/s with a 66 millisecond roundtrip delay @ OC192c, > 3500 km each way (see the worked bandwidth-delay figure at the end of this page)

Source: Obsidian

[Diagram: two remote sites, each a cluster of VMs running applications behind a gateway (GW), joined over the wide-area link]
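A quick check, not from the slide, of what the Obsidian figures above imply: keeping a 950 MB/s stream busy across a 66 ms roundtrip requires a bandwidth-delay product's worth of data in flight.

```latex
% Bandwidth-delay product for the Washington DC - Los Angeles link,
% using the figures quoted above (950 MB/s sustained, 66 ms roundtrip).
\[
  \text{data in flight} = B \times \mathrm{RTT}
                        = 950\ \tfrac{\text{MB}}{\text{s}} \times 0.066\ \text{s}
                        \approx 63\ \text{MB}
\]
% Roughly 63 MB must be outstanding and unacknowledged at any moment to keep
% the pipe full over that distance.
```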

Page 15:

Grid interconnect building blocks available today

Mellanox Technologies

A global leader in semiconductor solutions for server, storage and embedded connectivity
Leading provider of low-latency and high-bandwidth InfiniBand solutions
• Up to 20Gb/s NIC, up to 60Gb/s Switch
• 1.7M ports shipped (Dec 2006)
• 2.25us latency in production, ~1us latency in 2007
• 3W power consumption per HCA port

Efficient and scalable I/O
• 4500-node cluster in production
• 10K+ node clusters being installed

Price-performance-power leader
Converges clustering, communications and storage solutions

Page 16:

Product Roadmap

[Roadmap diagram, 2000–2007: four product generations spanning adapters and switches, with total bandwidths of 20/40 Gb/s, 40 Gb/s, 40/80 Gb/s, 160 Gb/s and 480/960 Gb/s, and a Gen2 adapter offering two 10/20/40 Gb/s IB or two 1/10 Gb/s Ethernet ports]

Consistent execution on the roadmap

Page 17:

Interconnect: A Competitive Advantage

Enterprise Data Centers

High-Performance Computing

Embedded

End-Users

Clustered Database
Customer Relationship Management
eCommerce and Retail
Financial
Web Services

Biosciences and Geosciences
Computer Automated Engineering
Digital Content Creation
Electronic Design Automation
Government and Defense

Communications
Computing and Storage Aggregation
Industrial
Medical
Military

InfiniBand and Ethernet

Servers and Blades

Embedded

Switches

Storage

Complete system solutions for all market segments

Page 18:

Enterprise Data Centers

Embedded

Leading Customers / Growing Markets
Software Partners
Hardware OEMs

Embedded

Servers

Storage


Switches

End-Users

High-Performance Computing

Page 19:

InfiniBand Software Support

Industry-wide development of standard SW stack
Supported by all Linux distributions
Granted WHQL certification by Microsoft

Full IO solution on all major OS distributions

Page 20:

HPC – the Early InfiniBand Adopters

Growth rate from June 06 to Nov 06

• InfiniBand: +105%
• Myrinet: -10%
• GigE: -16%

105% growth from June 2006
173% growth from Nov 2005

Top500 Interconnect Trends

[Chart: number of Top500 clusters per interconnect (InfiniBand, Myrinet, GigE) for Jun-05, Nov-05, Jun-06 and Nov-06; Nov-06 data labels include 82, 80 and 215 clusters]

InfiniBand – the only growing interconnect

Page 21:

Top500 Interconnect Placement Nov 06

InfiniBand is the preferred high performance interconnect
• Connecting the most powerful clusters
• InfiniBand is the best price/performance connectivity for Petascale clusters

Top500 Interconnect Placement

[Chart: Top500 Interconnect Placement – number of clusters per interconnect (InfiniBand, Myrinet, GigE) in each placement band: 1-100, 101-200, 201-300, 301-400, 401-500]

InfiniBand – the dominant HPC interconnect

Page 22:

Top500 Multi-Core Clusters Percentage

[Chart: percentage of multi-core clusters in the Top500, Nov 2005 – June 2007, 0–50%]

Top500 Statistics

Top500 Interconnect Penetration
[Chart: number of clusters vs. years since introduction (Year 1 – Year 8) for Ethernet and InfiniBand]

Multi-core will dominate the list in 2007
• Native multi-core

InfiniBand adoption is faster than Ethernet
• Ethernet year 1 – June 1996
• InfiniBand year 1 – June 2003

The fastest growth in top500 interconnect history

Page 23:

Mellanox Superior Application Performance

Category              | Application             | Mellanox InfiniBand advantage
Mathematical Modeling | Wolfram gridMathematica | 57% better than GigE
Digital Media         | Autodesk                | 470% better than GigE
Financial             | Wombat                  | 120% better than GigE
                      | CD-Adapco STAR-CD       | 29% better than Myrinet
                      | Exa PowerFLOW           | 24% better than Myrinet
                      | ESI PAM-CRASH           | 300% better than GigE
Database              | Oracle                  | 300% better than GigE
Fluid Dynamics        | Fluent                  | 145%-1400% better than GigE, 15% better than Qlogic
Oil and Gas           | Schlumberger Eclipse    | 55% better than Myrinet
Automotive            | LSTC LS-DYNA            | 26% better than Qlogic, 85% better than Myrinet, 115% better than GigE

Platforms may vary; details available in the following slides.
Mellanox InfiniBand wins on real applications

Page 24:

Personal Supercomputers (PSC)
From top 500 to technical computing

Maximum Performance At Your Fingertips

Driving supercomputing to the masses
• Maximum performance, minimum cost
• Easy to use, turnkey cluster
• Fits into “cubicle” environment
• Standard power, quiet operation
• Ability to scale efficiently

Cluster waterfall from compute room to desktop

Page 25:

InfiniBand Storage Interconnect

Native InfiniBand storage servers
• Optimal performance, power savings and TCO
• No gateway bottlenecks
• Ultimate scalability
• Service Oriented I/O for consolidation

Storage controller clustering and failover solution
Used to connect to the storage disk arrays

NFS, CIFS, FTP, HTTP

InfiniBand Switch

InfiniBand as a storage controller clustering solution

InfiniBand as a storage shelf interconnect

InfiniBand Switch

InfiniBand Cluster

InfiniBand Storage

Storage servers and backend clustering

Page 26:

InfiniBand Storage Solutions

InfiniBand Backend Clustering and Failover
Native InfiniBand Block Storage Systems
Native InfiniBand Clustered File System
Native InfiniBand Clustered File Storage Software
Native InfiniBand Visualization Storage System
Native InfiniBand Block Storage Software
Native InfiniBand Solid State Storage

Multiple storage vendors deploy InfiniBand today

Page 27:

IB Storage = Best Price/Performance

TPC-H – Industry Standard Benchmark
• Actual database transaction profiling
• Results must be qualified and approved

InfiniBand Storage delivers best price/performance at 1TB class
• 8 × 4-way dual-core AMD Opteron servers
• 37 × PANTA Systems Storage Array Modules
• 518 × 250GB 7200rpm SATA HDD
• Total storage: 129,500 GB

Turnkey server/storage solution for data warehousing applications

InfiniBand solutions win TPC-H 3rd year in a row

Page 28:

Cluster Evaluation Center

Essential platform for customer evaluation
• Latest and greatest InfiniBand hardware and software
• InfiniBand based storage
• NFS over RDMA
• AMD and Intel based platforms

Available clusters
• Mellanox cluster center: Intel quad core, AMD dual core rev E
• Colfax International: AMD dual core rev F

Available for customers’ evaluation
• Development, testing and evaluation
• Free of charge

InfiniBand clusters available for evaluation

Page 29:

Summary

InfiniBand is the clustering interconnect of choice
• The only growing interconnect on the Top500
• Dominates the top part of the Top500

InfiniBand supports major industry trends
• Industry standard, supported by major OEMs and ISVs
• Leading cost/performance solutions
• Multi-core and multi-node scalability
• Service-oriented IO delivery
• Compute and storage services on a single wire
• Transition from equipment warehouse to service provider
• Clusters – the utility computing backbone

InfiniBand clusters available for customers’ evaluation
• Free of charge
• Multiple platforms

InfiniBand – the standard cluster interconnect

Page 30:

Real-life Applications

ECLIPSE million cell model
• HP DL145 2.6GHz servers, single CPU
• OS: SUSE 9

CFD – CD-Adapco Star CD

InfiniBand wins on real applications

Schlumberger ECLIPSE
[Chart: elapsed runtime (seconds) vs. number of servers (16, 32, 64), Myrinet vs. InfiniBand; lower is better]
InfiniBand is 55% more efficient!


Benchmarks STAR-CD V3240/V3260
[Chart: A-class benchmark on an 8-node cluster – execution time (sec), Mellanox vs. Myrinet; lower is better]


Page 31:

CFD – Fluent

Great scaling from small to large clusters
Multi-core environments demand InfiniBand

[Chart: FLUENT 6.3 Beta, FL5L3 case – Fluent performance rating vs. number of cores (0–140), Mellanox InfiniBand vs. Gigabit Ethernet]
[Chart: Fluent 6.3, FL5L3 case – performance rating vs. CPU cores (0–140), Qlogic vs. Mellanox; higher is better]

[Chart: FLUENT performance rating, FL5L3 case, 16-node cluster – Mellanox InfiniBand vs. Gigabit Ethernet; 47% advantage on single-core Xeon 3.4GHz, 132% on dual-core Woodcrest 3GHz]

Highest performance, best scalability

Page 32:

PAM-CRASH

[Chart: elapsed time (sec) vs. cluster size (16, 32, 64 CPUs), InfiniBand vs. GigE]

Crash – ESI PAM-CRASH

Bavarian Car-To-Car Model
1.1M elements, 145,000 cycles

“Gigabit Ethernet becomes ineffective with cluster size growth while Mellanox InfiniBand allows continued scalable speed up”

Lower is better

Highest performance, best scaling