This presentation is intended for the education of IBM and Business Partner sales personnel. It should not be distributed to customers.
IBM Systems & Technology Group Education & Sales Enablement © 2011 IBM Corporation
Introduction to Intelligent Clusters
XTW01 Topic 7
Course Overview
The objectives of this course of study are:
>Describe a high-performance computing cluster
>List the business goals that Intelligent Clusters addresses
>Identify three core Intelligent Clusters components
>List the high-speed networking options available in Intelligent Clusters
>List three software tools used in Clusters
>Describe Cluster benchmarking
Topic Agenda
>*Commodity Clusters*
>Overview of Intelligent Clusters
>Cluster Hardware
>Cluster Networking
>Cluster Management, Software Stack, and Benchmarking
>Clusters are composed of standard, commodity components that could be used separately in other types of computing configurations:
 • Compute servers - a.k.a. nodes
 • High-speed networking adapters and switches
 • Local and/or external storage
 • A commodity operating system such as Linux
 • Systems management software
 • Middleware libraries and application software
>Clusters enable “Commodity-based supercomputing”
What is a Commodity Cluster?
A multi-server system, composed of interconnected computers and associated networking and storage devices, that is unified via systems management and networking software to accomplish a specific purpose.
Conceptual View of a Cluster

[Diagram: a compute node rack and a storage rack joined by several fabrics. An Ethernet switch carries the management, storage, SOL, and cluster VLANs; a high-speed network switch carries the message-passing network across the compute nodes; a Fibre SAN switch on the fiber network connects the storage nodes to the storage rack. The management node and user/login nodes attach to the LAN; users access the cluster over a public VLAN, which also reaches the management network.]
Application of Clusters in Industry

Industry        Typical cluster applications
Energy          Seismic analysis; reservoir analysis
Finance         Derivative analysis; actuarial analysis; asset liability management; portfolio risk analysis; statistical analysis
Mfg             Mechanical/electric design; process simulation; finite element analysis; failure analysis
Life Sciences   Drug discovery; protein folding; medical imaging
Media           Digital rendering; collaborative research; bandwidth consumption; gaming
Public / Gov't  Numerical weather forecasting; high energy physics
Technology Innovation in HPC
>Multi-core enabled systems create new opportunities to advance applications and solutions
 • Dual- and quad-core, along with increased-density memory designs
 • “8-way” x86 128GB-capable system that begins at less than $10k

>Virtualization is a hot topic for architectures
 • Possible workload consolidation for cost savings
 • Power consumption reduced by optimizing system-level utilization

>Manageability is key to addressing complexity
 • Effective power/thermal management through SW tools
 • Virtualization management tools must be integrated into the overall management scheme
Topic Agenda
>Commodity Clusters
>*Overview of Intelligent Clusters*
>Cluster Hardware
>Cluster Networking
>Cluster Management, Software Stack and Benchmarking
Approaches to Clustering

Piece Parts (client bears all risk for sizing, design, integration, deployment, and warranty issues):

>Roll Your Own
 • Client orders individual components from a variety of vendors, including IBM
 • Client tests and integrates components, or contracts with an integrator
 • Client must address warranty issues with each vendor

>BP Integrated
 • BP orders servers & storage from IBM and networking from 3rd-party vendors
 • BP builds and integrates components and delivers to the customer
 • Client must address warranty issues with each vendor

>IBM Racked and Stacked
 • Client orders servers & storage in standard rack configurations from IBM
 • Client integrates IBM racks with 3rd-party components, or contracts with IGS or another integrator
 • Client must address warranty issues with each vendor

Integrated Solution (single vendor responsible for sizing, design, integration, deployment, and all warranty issues):

>Intelligent Clusters
 • Client orders an integrated cluster solution from IBM, including servers, storage, and networking components
 • IBM delivers a factory-built and tested cluster, ready to “plug in”
 • Client has a single point of contact for all warranty issues

IBM Delivers Across the Spectrum
What is an IBM System Intelligent Cluster?

An IBM portfolio of components that have been cluster-configured, tested, and work with a defined supporting software stack:
• Factory assembled
• Onsite installation
• One phone number for support
• Selection of options to customize your configuration, including Linux operating system (RHEL or SUSE), xCAT, & GPFS

The degree to which a multi-server system exhibits these characteristics determines if it is a cluster:
- Dedicated private VLAN
- All nodes running the same suite of apps
- Single point of control for software/application distribution and hardware management
- Inter-node communication
- Node interdependence
- Linux operating system

Core technologies:
• IBM Servers - compute nodes, storage nodes, and management nodes: rack-mount servers (x3550 M3, x3650 M3), blade servers (HS21-XM, HS22, HX5), and scale-out servers (iDataPlex dx360 M3); Intel® processors
• Networks - Ethernet (1 GbE, 10 GbE), InfiniBand (4X SDR, 4X DDR, 4X QDR), storage networking (Fibre Channel, SAS, iSCSI, FCoE), out-of-band management, terminal servers
• Storage - IBM TotalStorage® disk storage, storage software, ServeRAID
IBM HPC Cluster Solution (Intelligent Clusters)

System x servers (rack-mount, blades, or iDataPlex) + switches & storage + cluster software (Linux or Windows, xCAT, GPFS) = HPC Cluster Solution. IBM or a Business Partner then adds the technical application (or “workload”).
Topic Agenda
>Commodity Clusters
>Overview of Intelligent Clusters
>*Cluster Hardware*
>Cluster Networking
>Cluster Management, Software Stack, and Benchmarking
Intelligent Clusters Overview - Servers

>IBM System x™ 3550 M3 (1U) - high-performance compute nodes
 • Dual socket, Intel processors
 • Integrated system management

>IBM System x™ 3650 M3 (2U) - mission-critical performance
 • Dual socket, Intel processors
 • Integrated system management

>IBM BladeCenter® with HS21-XM, HS22, and HX5 Intel processor-based blades
 • HS21-XM - extended memory
 • HS22 - general-purpose enterprise
 • HX5 - scalable enterprise
 • Chassis: BladeCenter S (distributed, small office, easy to configure), BladeCenter H (high performance), BladeCenter E (best energy efficiency, best density)
 • Industry-leading performance, reliability, and control

Active Energy Manager™: power management at your control
IBM System x iDataPlex

[Diagram: iDataPlex building blocks - 2U and 3U chassis, PDUs, switches, HPC and Web server nodes, storage drives & options, I/O and storage trays, and the iDataPlex Rear Door Heat Exchanger.]
Current iDataPlex Server Offerings

>iDataPlex dx360 M2 - high-performance dual-socket
 • Processor: Quad-Core Intel Xeon 5500
 • QuickPath Interconnect up to 6.4 GT/s
 • Memory: 16 DIMM DDR3, 128 GB max; speeds up to 1333 MHz
 • PCIe: x16 electrical / x16 mechanical
 • Chipset: Tylersburg-36D
 • Last order date: December 31, 2010

>iDataPlex 3U Storage Rich - file-intense dual-socket
 • Processor: 6- or 4-core Intel Xeon 5600
 • Memory: 16 DIMM, 128 GB max
 • Chipset: Westmere
 • Storage: 12 x 3.5” HDD, up to 24 TB per node / 672 TB per rack

>iDataPlex dx360 M3 - high-performance dual-socket
 • Processor: 6- or 4-core Intel Xeon 5600
 • QuickPath Interconnect up to 6.4 GT/s
 • Memory: 16 DIMM DDR3, 128 GB max; speeds up to 1333 MHz
 • PCIe: x16 electrical / x16 mechanical
 • Chipset: Westmere, 12 MB cache
 • Ship support: March 26, 2010

>iDataPlex dx360 M3 Refresh - exa-scale hybrid CPU + GPU
 • Processor: 6- or 4-core Intel Xeon 5600
 • 2 NVIDIA M1060 or M2050 GPUs
 • QuickPath Interconnect up to 6.4 GT/s
 • Memory: 16 DIMM DDR3, 128 GB max; speeds up to 1333 MHz
 • PCIe: x16 electrical / x16 mechanical
 • Chipset: Westmere, 12 MB cache
 • Ship support: August 12, 2010
System x iDataPlex dx360 M3

iDataPlex flexibility with better performance, efficiency, and more options - tailored for your business needs:

>Compute Intensive - maximum processing
 • 2U chassis, 2 compute nodes

>Compute + Storage - balanced storage and processing
 • 2U chassis, 1 node slot & drive tray
 • HDD: up to 5 x 3.5”

>Maximize Storage Density
 • 3U storage chassis, 1 node slot & triple drive tray
 • HDD: 12 x 3.5” drives, up to 24 TB
 • I/O: PCIe for networking + PCIe for RAID

>Acceleration Compute + I/O - maximum component flexibility
 • 2U chassis, 1 node slot plus 1U dual-GPU I/O tray
 • I/O: up to 2 PCIe; HDD: up to 8 x 2.5”

Power supply options: 550W, 900W, or 750W N+N redundant.
iDataPlex dx360 M3 Refresh

>Increased server efficiency & Westmere enablement
 • Intel Westmere-EP 4- and 6-core processor support (up to 95 watts)
 • 2 DIMMs/channel @ 1333 MHz with Westmere 95-watt CPUs
 • Lower-power (1.35V) DIMMs (2 GB, 4 GB, 8 GB)

>Expanded I/O performance capabilities
 • New I/O tray and 3-slot “butterfly” PCIe riser to support 2 GPUs + network adapter
 • Support for NVIDIA Tesla M1060 or “Fermi” M2050 in a 2U chassis + 4 HDDs

>Expanded power supply offerings
 • Optional redundant 2U power supply for line feed (AC) and chassis (DC) protection
 • High-efficiency power supplies fitted to workload power demands

>Storage performance, capacity, and flexibility
 • Simple-swap SAS, SATA & SSD, 2.5” & 3.5”, in any 2U configuration
 • Increased capacities of 2.5” & 3.5” SAS, SATA, and SSD
 • Increased capacities in 3U Storage Dense to 24 TB (with 2 TB 3.5” SATA/SAS drives)
 • 6 Gbps backplane for performance
 • Rear PCIe slot enablement in 2U chassis for RAID controller flexibility
 • Higher-capacity, higher-performance solid-state drive controller

>Next-generation converged networking
 • FCoE via 10Gb converged network adapters; dual-port 10Gb Ethernet
dx360 M3 Refresh - Power Supply Offerings

>Maximum efficiency for lower power requirements
 • New high-efficiency 550W power supply for optimum efficiency in low-power configurations
 • More efficiency by running higher on the power curve

>Flexibility to match the power supply to the workload
 • 550W (non-redundant) for lower power demands
 • 900W (non-redundant) for higher power demands
 • 750W N+N for node and line-feed redundancy

>Redundant power supply option for the iDataPlex chassis
 • Node-level power protection for smaller clusters, head nodes, 3U storage-rich, VM & enterprise
 • Rack-level line-feed redundancy with discrete feeds
 • Tailor rack-level solutions that require redundant power in some or all nodes
 • Maintains maximum floor-space density with the iDataPlex rack
 • Graceful shutdown on power supply failure for virtualized environments

>Flexibility per chassis to optimize rack power
 • Power supply is per 2U or 3U chassis
 • Mix across the rack to maximize flexibility and minimize stranded power

[Diagram: redundant supply block diagram - two line feeds (AC 1, AC 2) into two 750W supplies (PS 1, PS 2); 750W total in redundant mode; 200-240V only.]
dx360 M3 Refresh - Rack GPU Configuration

>42 high-performance GPU servers per rack
>iDataPlex efficiency drives more density on the floor
>In-rack networking will not reduce rack density, regardless of the topology required by the customer
>Rear Door Heat Exchanger provides further TCO value

Rack-level value:
 • Greater density, easier to cool
 • Flexibility of network topology without compromising density
 • More density reduces the number of racks and power feeds in the data center
 • Rear Door Heat Exchanger provides the ultimate value in cooling and density
dx360 M3 Refresh - Server GPU Configuration

Example configuration: 4 x 2.5” simple-swap SAS 300 or 600 GB / 10K 6 Gbps drives (or SATA, or 3.5”, or SSD), InfiniBand DDR (or QDR, or 10GbE), and two NVIDIA “Fermi” M2050 GPUs (or M1060, FX3800, or Fusion-io).

Server-level value:
>Each server is individually serviceable
>Balanced performance for demanding GPU workloads
>6 Gbps SAS drives and controller for maximum performance
>Service and support for server and GPU from IBM
Intelligent Clusters Storage Portfolio Summary
>Intelligent Clusters BOM consists of the following storage components:
 • Entry-level DS3000 series disk storage systems
 • Mid-range DS4000 series disk storage systems
 • High-end DS5000 series disk storage systems
 • All standard hard disk drives (SAS/SATA/FC)
 • Entry-level SAN fabric switches
>Majority of the HPC solutions use DS3000/DS4000 series disk storage with IBM GPFS parallel file system software
>A small percentage of HPC clusters use entry-level storage (DS3200/DS3300/DS3400/DS3500)
> Integrated business solutions (SAP-BWA, Smart Analytics, SoFS) use DS3500 storage (mostly)
>Smaller-size custom solutions use DS3000 entry-level storage
>A small percentage of special HPC bids use DDN DCS9550 storage
Intelligent Clusters Storage Portfolio (Dec 2008)

>DS5020 (FC-SAN), DS5000 (FC-SAN)
>DS3400 (FC-SAN, SAS/SATA)
>DS3500 (SAS)
>DS3300 (iSCSI/SAS)
>EXP3000 storage expansion (JBOD)
Topic Agenda
>Commodity Clusters
>Overview of Intelligent Clusters
>Cluster Hardware
>*Cluster Networking*
>Cluster Management, Software Stack, and Benchmarking
Cluster Networking

>Networking is an integral part of any cluster system, providing communication across devices (including servers and storage) and supporting cluster management

>All servers in the cluster, including login, management, compute, and storage nodes, communicate over one or more connecting network fabrics

>Typically clusters have one or more of the following networks:
 • A cluster-wide management network
 • A user/campus network through which users log in to the cluster and launch jobs
 • A low-latency, high-bandwidth network such as InfiniBand, used for inter-process communication
 • A storage network used for communication across the storage nodes (optional)
 • A Fibre Channel or Ethernet network (in the case of iSCSI traffic) used as the storage network fabric
InfiniBand Portfolio - Intelligent Cluster

QDR InfiniBand switches:
>Voltaire 4036 - 1U, 36 ports
>Voltaire Grid Director 4200 - 11U, 110-160 ports
>Voltaire Grid Director 4700 - 18U, 324 ports
>QLogic 12200-36 - 1U, 36 ports
>QLogic 12300-36 - 1U managed, 36 ports
>QLogic 12800-180 - 14U, 432 ports
>QLogic 12800-360 - 29U, 864 ports
>Mellanox InfiniScale IV - 1U, 36 ports
>Mellanox InfiniScale IV director class - 6U / 108 ports, 10U / 216 ports, 17U / 324 ports, 29U / 648 ports

QDR InfiniBand HCAs:
>Mellanox ConnectX-2 - single or dual port
>QLogic QLE7340 - single port

(Several of these items are new for the 10B release.)
Intelligent Cluster Ethernet Portfolio - Entry / Leaf / Top-of-Rack Switches

1G 24-port switches:
>SMC 8126L2 - 1U, 26 x 1Gb ports (industry low cost)

1G 48-port switches:
>Cisco 2960G-48 - 1U, 48 x 1Gb ports (premium brand)
>Cisco 3750G-48 - 1U, 48 x 1Gb ports with stacking (alternative premium brand, stackable)
>SMC 8150L2 - 1U, 50 x 1Gb ports (industry low cost)

1G 48-port with 10G uplinks:
>SMC 8848M - 1U, 48 x 1Gb ports, 2 x 10Gb uplinks (low-cost 48 port)
>Cisco 4948 - 1U, 48 x 1Gb ports, 2 x 10Gb optional uplinks (premium brand)
>Blade G8000-48 - 1U, 48 x 1Gb ports, 4 x 10Gb uplinks (low-cost 48 port)
>IBM FCX-48 (Foxhound) - 48 x 1Gb ports, 10Gb uplinks (iDataPlex; low-cost 48 port)
>Force10 S60 - 1U, 48 x 1Gb ports, up to 4 x 10Gb optional uplinks (added in Oct 10 BOM)

10G switches:
>Cisco 4900 - 2U, 24 x 10Gb ports (premium)
>Blade G8124 - 1U, 24-port 10Gb SFP+
Ethernet Switch Portfolio - iDataPlex - Entry / Leaf / Top-of-Rack Switches

1G 24/48-port switches:
>IBM B50C (NetIron 48) - 1U, 48 x 1Gb ports w/ 2 x 10GbE (optional) (low-cost 48 port)

1G 48-port with 10G uplinks:
>SMC 8848M - 1U, 48 x 1Gb ports, 2 x 10Gb uplinks (industry low cost)
>Cisco 4948E - 1U, 48 x 1Gb ports, 4 x 10Gb optional uplinks (premium brand; added in Oct 10 BOM)
>Blade G8000-48 - 1U, 48 x 1Gb ports, 4 x 10Gb uplinks (low-cost 48 port)
>IBM FCX-48 (Foxhound) - 48 x 1Gb ports, 10Gb uplinks (iDataPlex)
>IBM J48 (Juniper EX4200-48) - 48 x 1Gb ports, 10Gb uplinks, 2 VC ports (iDataPlex; premium brand, stackable)
>Force10 S60 - 1U, 48 x 1Gb ports, 4 x 10Gb uplinks

10G switches:
>Blade G8124 - 1U, 24-port 10Gb SFP+
>IBM B24X (TurboIron) - 24 x 10Gb ports (iDataPlex)
>IBM DCN - 24-port 10Gb
>IBM DCN - 48-port 10Gb
Ethernet Switch Portfolio - Intelligent Cluster - Core Switches & Adapters

Core & aggregate switches (all core switches & 10GbE adapters tested for compatibility with iDataPlex):
>Cisco 6509-E - 15U, 9 slots, 384 x 1Gb ports, 32 x 10Gb ports
>IBM B08R (BigIron) - 8 slots, 384 x 1Gb ports, 32 x 10Gb ports
>IBM B16R (BigIron) - 16 slots, 768 x 1Gb ports, 256 x 10Gb ports
>Voltaire 8500 - 15U, 12 slots, 288 x 10Gb ports
>Force10 E600i - 16U, 7 slots, 633 x 1Gb ports, 112 x 10Gb ports
>Force10 E1200i - 21U, 14 slots, 1260 x 1Gb ports, 224 x 10Gb ports
>Juniper 8208 - 14U, 8 slots, 384 x 1Gb ports, 64 x 10Gb ports
>Juniper 8216 - 21U, 16 slots, 768 x 1Gb ports, 128 x 10Gb ports

10GbE HPC adapters (added in Oct 10 BOM):
>Chelsio dual-port T3 SFP+ 10GbE PCI-E x8 line-rate adapter
>Chelsio dual-port T3 CX4 10GbE PCI-E x8 line-rate adapter
>Chelsio dual-port T3 10GbE CFFh high-performance daughter card for blades
>Mellanox ConnectX-2 EN 10GbE PCI-E x8 line-rate adapter
High-speed Networking
>Many HPC applications are sensitive to network bandwidth and latency for performance
>Primary choices for high-speed networking for clusters: InfiniBand and 10 Gigabit Ethernet (emerging)

>InfiniBand An industry-standard low-latency, high-bandwidth server interconnect, ideal for carrying multiple traffic types (clustering, communications, storage, management) over a single connection

>10 Gigabit Ethernet 10GbE (or 10GigE) is the IEEE 802.3ae Ethernet standard, which defines Ethernet technology with a data rate of 10 Gbit/s; the follow-on to 1 Gigabit Ethernet technology
InfiniBand
>An industry standard low-latency, high-bandwidth server interconnect
> Ideal to carry multiple traffic types (clustering, communications, storage, management) over a single physical connection
>Serial I/O interconnect architecture with a per-lane signaling rate of 2.5 Gb/s in each direction (SDR), doubled to 5 Gb/s in each direction with DDR and 10 Gb/s in each direction with QDR

>Provides the highest node-to-node bandwidth available today: 40 Gb/s in each direction over a 4x link with Quadruple Data Rate (QDR) technology

>Lowest end-to-end messaging latency, on the order of microseconds (1.2-1.5 µs)
>Wide-industry adoption and multiple vendors (Mellanox, Voltaire, QLogic, etc.)
>Open source drivers and libraries are available for users (OFED)
InfiniBand Peak Bi-directional Bandwidth Table

Lanes  SDR (2.5 Gb/s)    DDR (5 Gb/s)      QDR (10 Gb/s)      EDR (20 Gb/s)
1x     (2.5 + 2.5) Gb/s  (5 + 5) Gb/s      (10 + 10) Gb/s     (20 + 20) Gb/s
4x     (10 + 10) Gb/s    (20 + 20) Gb/s    (40 + 40) Gb/s     (80 + 80) Gb/s
8x     (20 + 20) Gb/s    (40 + 40) Gb/s    (80 + 80) Gb/s     (160 + 160) Gb/s
12x    (30 + 30) Gb/s    (60 + 60) Gb/s    (120 + 120) Gb/s   (240 + 240) Gb/s
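The table values are simple arithmetic: per-lane signaling rate times lane count, in each direction. A small sketch that reproduces them, and also shows the effective data rate after 8b/10b encoding (a known property of SDR/DDR/QDR links; the function names here are illustrative, not from any library):

```python
# Illustrative arithmetic only: reproduces the peak-bandwidth table above.
# Per-lane signaling rates in Gb/s, as listed in the table.
SIGNALING_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0, "EDR": 20.0}

def peak_bandwidth(rate: str, lanes: int) -> tuple:
    """Return (per-direction, bidirectional) peak signaling bandwidth in Gb/s."""
    per_direction = SIGNALING_GBPS[rate] * lanes
    return per_direction, 2 * per_direction

def effective_data_rate(rate: str, lanes: int) -> float:
    """Per-direction data rate after 8b/10b encoding: SDR/DDR/QDR put 10 bits
    on the wire for every 8 data bits, so only 80% of the signaling is payload."""
    return peak_bandwidth(rate, lanes)[0] * 0.8

# 4x QDR: 40 Gb/s per direction on the wire, of which 32 Gb/s is data.
print(peak_bandwidth("QDR", 4))       # (40.0, 80.0)
print(effective_data_rate("QDR", 4))  # 32.0
```

This is why a "40 Gb/s" QDR link delivers roughly 32 Gb/s of application-visible bandwidth before protocol overheads.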
10 Gigabit Ethernet
>10GbE or 10GigE is an IEEE Ethernet standard 802.3ae, which defines Ethernet technology with data rates of 10 Gbits/sec
>Enables applications to take advantage of 10Gbps Ethernet
>Requires no changes to the application code
>High-speed interconnect choice for “loosely-coupled” HPC applications
>Wide industry support for 10GbE technology
>Growing user adoption for Data Center Ethernet (DCE) and Fibre Channel Over Ethernet (FCoE) technologies
> Intelligent Clusters supports 10GbE technologies for both node-level and switch-level, providing multiple vendor choices for adapters and switches (BNT, SMC, Force10, Brocade, Cisco, Chelsio, etc.)
Topic Agenda
>Commodity Clusters
>Overview of Intelligent Clusters
>Cluster Hardware
>Cluster Networking
>*Cluster Management, Software Stack and Benchmarking*
Cluster Management - xCAT
>xCAT - Extreme Cluster (Cloud) Administration Toolkit
 • Open-source Linux/AIX/Windows scale-out cluster management solution
 • Leverages best practices for deploying and managing clusters at scale
 • Scripts only (no compiled code)
 • Community requirements driven

>xCAT capabilities
 • Remote hardware control - power, reset, vitals, inventory, event logs, SNMP alert processing
 • Remote console management - serial console, SOL, logging / video console (no logging)
 • Remote OS boot target control - local/SAN boot, network boot, iSCSI boot
 • Remote automated unattended network installation

For more information on xCAT, go to http://xcat.sf.net
Cluster Software Stack

IBM GPFS - General Parallel File System: a high-performance, scalable file management solution

>Provides fast and reliable access to a common set of file data, from a single computer to hundreds of systems
>Brings together multiple systems to create a truly scalable cloud storage infrastructure
>GPFS-managed storage improves disk utilization and reduces footprint, energy consumption, and management effort
>GPFS removes client-server and SAN file system access bottlenecks
>All applications and users share all disks, with dynamic re-provisioning capability

Technology:
>OS support: Linux (on POWER and x86), AIX, Windows
>Interconnect support (w/ TCP/IP): 1GbE and 10GbE, InfiniBand (RDMA in addition to IPoIB), Myrinet, IBM HPS
What is GPFS ?
> IBM’s shared disk, parallel cluster file system.
>Product available on pSeries/xSeries clusters with AIX/Linux
>Used on many of the largest supercomputers in the world
>Cluster: 2400+ nodes, fast reliable communication, common admin domain.
>Shared disk: all data and metadata on disk accessible from any node through disk I/O interface.
>Parallel: data and metadata flows from all of the nodes to all of the disks in parallel.
[Diagram: GPFS file system nodes connected through a switching fabric (system or storage area network) to shared disks (SAN-attached or network block device).]
For more information on IBM GPFS, go to http://www-03.ibm.com/systems/clusters/software/gpfs/index.html
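The "parallel" in GPFS comes from striping each file's blocks across all disks so that large transfers engage every disk at once. A toy round-robin sketch of that idea (not GPFS code; the block size and disk count are illustrative):

```python
# Toy illustration of block striping, the idea behind GPFS's parallel I/O.
# Not actual GPFS code: a file is split into fixed-size blocks and dealt
# round-robin across N "disks", so a large read or write touches all of them.
def stripe(data: bytes, block_size: int, n_disks: int) -> list:
    """Deal fixed-size blocks of `data` round-robin across n_disks lists."""
    disks = [[] for _ in range(n_disks)]
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)
    return disks

def reassemble(disks: list) -> bytes:
    """Read stripes back in order: stripe i is block i of each disk in turn."""
    out = []
    for i in range(max(len(d) for d in disks)):
        for d in disks:
            if i < len(d):
                out.append(d[i])
    return b"".join(out)

data = b"abcdefghij"
disks = stripe(data, block_size=3, n_disks=2)
print(disks)                         # [[b'abc', b'ghi'], [b'def', b'j']]
print(reassemble(disks) == data)     # True
```

In the real file system the same layout decision lets GPFS drive many disks (and many storage servers) in parallel for a single file.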
Resource Managers/Schedulers

>Resource managers/schedulers queue, validate, manage, load-balance, and launch user programs/jobs

>Torque - Portable Batch System (free); works with the Maui Scheduler (free)
>LSF - Load Sharing Facility (commercial)
>Sun Grid Engine (free)
>Condor (free)
>MOAB Cluster Suite (commercial)
>LoadLeveler (commercial scheduler from IBM)
[Diagram: users 1..N submit jobs to a job queue; the job scheduler and resource manager dispatch them across nodes 1..N.]
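The queue-and-dispatch behavior described above can be sketched in a few lines. This is a hypothetical data model for illustration only (a plain FIFO with no backfill); real resource managers such as Torque, LSF, or LoadLeveler add priorities, backfill, reservations, and much more:

```python
# Minimal sketch of what a resource manager does: keep a FIFO job queue and
# launch each job as soon as enough free nodes exist. Strict FIFO, no backfill.
from collections import deque

def schedule(jobs, total_nodes):
    """jobs: iterable of (job_name, nodes_needed) tuples.
    Returns (launched, still_queued), where launched is a list of
    (job_name, set_of_assigned_node_ids) in FIFO launch order."""
    queue = deque(jobs)
    free = set(range(total_nodes))
    launched = []
    # Launch head-of-queue jobs while they fit; stop at the first that doesn't.
    while queue and len(free) >= queue[0][1]:
        name, need = queue.popleft()
        alloc = {free.pop() for _ in range(need)}
        launched.append((name, alloc))
    return launched, list(queue)

ran, waiting = schedule([("sim", 2), ("render", 3), ("big", 4)], total_nodes=6)
print([name for name, _ in ran])  # ['sim', 'render'] -- 'big' needs 4, only 1 free
print(waiting)                    # [('big', 4)]
```

The head-of-queue blocking shown here is exactly what backfill schedulers like Maui were created to mitigate: they let small jobs jump past a blocked large job without delaying it.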
Message-Passing Libraries

>Enable inter-process communication among the processes of an application running across multiple nodes in the cluster (or on a symmetric multi-processing system)

>“Mask” the underlying interconnect from the user application, allowing the application programmer to use a “virtual” communication environment as the reference for programming cluster applications; the two main models are the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM)

>Implementations, and the interconnects they support:
 • Included with most Linux distributions (open source) - IP (Ethernet), GM (Myrinet)
 • MPICH2 (free) - IP (Ethernet), MX (Myrinet), InfiniBand
 • LAM-MPI (free) - IP (Ethernet)
 • OpenMPI (free) - IP (Ethernet), InfiniBand
 • Linda (commercial) - IP (Ethernet)
 • Scali (commercial) - IP (Ethernet), MX (Myrinet), InfiniBand
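The core MPI pattern is a matched send/receive pair between two processes. The sketch below imitates it with Python's multiprocessing Pipe standing in for a real interconnect and MPI library; names and message sizes are illustrative, and real latency tests would use MPI_Send/MPI_Recv over InfiniBand or Ethernet:

```python
# MPI-style ping-pong sketch: two processes bounce a message back and forth
# and we time the round trips. A Pipe replaces the real interconnect, so the
# numbers show the pattern, not InfiniBand-class latency.
import time
from multiprocessing import Pipe, Process

def pong(conn, iters):
    for _ in range(iters):
        conn.send(conn.recv())  # echo whatever arrives (the "pong" side)

def ping_pong(msg_bytes=8, iters=1000):
    parent, child = Pipe()
    p = Process(target=pong, args=(child, iters))
    p.start()
    msg = b"x" * msg_bytes
    t0 = time.perf_counter()
    for _ in range(iters):
        parent.send(msg)              # "ping"
        assert parent.recv() == msg   # wait for the echo
    elapsed = time.perf_counter() - t0
    p.join()
    return elapsed / iters / 2  # half the round trip ~ one-way latency (s)

if __name__ == "__main__":
    print(f"one-way latency ~ {ping_pong() * 1e6:.1f} microseconds")
```

The same loop, written with an MPI library and run over a 4x QDR fabric, is how the 1.2-1.5 µs latency figures quoted earlier in this topic are measured.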
Compilers & Other tools
>Compilers are critical in creating optimized binary code that takes advantage of specific processor architectural features, so that the application can exploit the full power of the system and run most efficiently

>Processor vendors typically have the best compilers for their own processors - e.g. Intel, AMD, IBM, SGI, Sun, etc.

>Compilers are important for producing the best code for HPC applications, as individual node performance is a critical factor in overall cluster performance

>Open-source and commercial compilers are available, such as the GNU GCC compiler suite (C/C++, Fortran 77/90; free) and the PathScale (owned by QLogic) compilers

>Support libraries and debugging tools are also packaged with the compilers, such as math libraries (e.g. Intel Math Kernel Library, AMD Core Math Library) and debuggers such as gdb (the GNU debugger) and the TotalView debugger, used for debugging parallel applications on clusters
HPC Software Stack

The Intelligent Clusters offering supports a broad range of HPC software from industry-leading suppliers. Software is available directly from IBM or from the respective solution providers.

Cluster systems management:
>xCAT2, IBM Director (IBM) - IBM CSM functionality now merged into xCAT2

File systems:
>General Parallel File System (GPFS) for Linux; GPFS for Linux on POWER (IBM)
>PolyServe Matrix Server File System (HP)
>NFS (open source)
>Lustre (open source)

Workload management:
>Open PBS (open source)
>PBS Pro (Altair)
>LoadLeveler (IBM)
>LSF (Platform Computing)
>MOAB (Cluster Resources) - commercial version of the Maui scheduler
>GridServer (DataSynapse)
>Maui Scheduler (open source) - interfaces to many schedulers

Message passing interface solutions:
>Scali MPI Connect™ (Scali)

Compilers:
>PGI Fortran 77/90, C/C++ (STM Portland Group) - 32/64-bit support
>Intel Fortran/C/C++ (Intel)
>NAG Fortran/C/C++ (NAG) - 32/64-bit
>Absoft® compilers (Absoft)
>PathScale™ compilers (PathScale) - AMD Opteron
>GCC (open source)
HPC Software Stack (cont.)

Debuggers/tracers:
>TotalView (Etnus)
>CodeAnalyst (AMD) - timer/event profiling, pipeline simulations
>Fx2 Debugger™ (Absoft)
>Distributed Debugging Tool (DDT) (Allinea)

Math libraries:
>ACML - AMD Core Math Libraries (AMD/NAG) - BLAS, FFT, LAPACK
>Intel Integrated Performance Primitives (Intel)
>Intel Math Kernel Library (Intel)
>Intel Cluster Math Kernel Library (Intel)
>IMSL™, PV-WAVE® (Visual Numerics)

Message passing libraries:
>MPICH (open source) - TCP/IP networks
>MPICH-GM (Myricom) - Myrinet networks
>TCP Linda™ (SCA)
>WMPI II™ (Critical Software)

Parallelization tools:
>TCP Linda® (SCA)

Interconnect management:
>Scali MPI Connect (Scali)

Performance tuning:
>Intel VTune™ Performance Analyzer (Intel)
>Optimization and Profiling Tool (OPT) (Allinea)
>High Performance Computing Toolkit (IBM) - http://www.research.ibm.com/actc

Threading tools:
>Intel Thread Checker (Intel)

Trace tools:
>Intel Trace Analyzer and Collector (Intel)
Cluster Benchmarking
Benchmarking is the technique of running well-known reference applications on a cluster to exercise various system components and measure the cluster's performance characteristics (e.g. network bandwidth, latency, FLOPS).

>STREAM - memory access latency and bandwidth http://www.cs.virginia.edu/stream/ref.html
>Linpack - the TOP500 benchmark; solves a dense system of linear equations; you are allowed to tune the problem size and benchmark parameters to optimize for your system http://www.netlib.org/benchmark/hpl/index.html
>HPC Challenge - a set of HPC benchmarks that test various subsystems of a cluster system http://icl.cs.utk.edu/hpcc/
>SPEC - a set of commercial benchmarks measuring the performance of various server subsystems http://www.spec.org/
>NAS 2.3 Parallel Benchmarks http://www.nas.nasa.gov/Resources/Software/npb.html
>Intel MPI Benchmarks (previously the Pallas benchmarks) http://software.intel.com/en-us/articles/intel-mpi-benchmarks/
>Ping-Pong - a common MPI benchmark measuring point-to-point latency and bandwidth
>Customer's own code - provides a good representation of system performance specific to the application code
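Linpack results are usually read against the cluster's theoretical peak. A back-of-envelope sketch of that comparison; every number below is a hypothetical example, not a measured system:

```python
# Back-of-envelope context for Linpack: theoretical peak (Rpeak) vs. the
# measured HPL result (Rmax). All figures here are hypothetical examples.
def rpeak_gflops(nodes, sockets, cores_per_socket, ghz, flops_per_cycle):
    """Rpeak = total cores x clock (GHz) x double-precision FLOPs per cycle."""
    return nodes * sockets * cores_per_socket * ghz * flops_per_cycle

# Example: 64 dual-socket nodes of quad-core 2.66 GHz Xeons, each core
# issuing 4 double-precision FLOPs/cycle (typical of the parts in this deck).
peak = rpeak_gflops(nodes=64, sockets=2, cores_per_socket=4,
                    ghz=2.66, flops_per_cycle=4)
measured = 4200.0  # hypothetical HPL Rmax in GFLOPS

print(f"Rpeak = {peak:.0f} GFLOPS")               # Rpeak = 5448 GFLOPS
print(f"HPL efficiency = {measured / peak:.0%}")  # HPL efficiency = 77%
```

The Rmax/Rpeak ratio (HPL efficiency) is a quick health check: well-tuned clusters with a low-latency interconnect typically land in the 70-90% range, and a much lower figure usually points at the network or at poor HPL tuning.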
Summary

> A Cluster system is created out of commodity server hardware, high-speed networking, storage, and software technologies
> High-performance computing (HPC) takes advantage of cluster systems to solve complex problems in various industries
> IBM Intelligent Clusters provides a one-stop-shop for creating and deploying HPC solutions using IBM servers and third party Networking, Storage and Software
> InfiniBand, Myrinet (MX and Myri-10G), and 10 Gigabit Ethernet are the technologies most commonly used as the high-speed interconnect for Clusters

> IBM GPFS provides a highly scalable, robust parallel file system and storage virtualization solution for Clusters and other general-purpose computing systems
> xCAT is an open-source, scalable cluster deployment and Cloud hardware management solution
> Cluster benchmarking enables performance analysis, debugging and tuning capabilities for extracting optimal performance from Clusters by isolating and fixing critical bottlenecks
> Message-passing middleware enables developing HPC applications for Clusters
> Several commercial software tools are available for Cluster computing
Glossary of Terms
>Commodity Cluster
> InfiniBand
>Message Passing Interface (MPI)
>Extreme Cluster (Cloud) Administration Toolkit (xCAT)
>Network-attached storage (NAS)
>Cluster VLAN
>Message-Passing Libraries
>Management Node
>High Performance Computing (HPC)
>Roll Your Own (RYO)
>BP Integrated
>Distributed Network Topology
> Intelligent Clusters
>General Parallel File System (GPFS)
>Direct-attached storage (DAS)
> iDataPlex
> Inter-node communication
>Compute Network
>Centralized Network Topology
> IBM Racked and Stacked
>Leaf Switch
>Core/aggregate Switch
>Quadruple Data Rate
>Storage Area Network (SAN)
>Parallel Virtual Machine (PVM)
>Benchmarking
Additional Resources
>IBM STG SMART Zone for more education:
Internal: http://lt.be.ibm.com
BP: http://lt2.portsmouth.uk.ibm.com/
>IBM System x
http://www-03.ibm.com/systems/x/
>IBM ServerProven
http://www-03.ibm.com/servers/eserver/serverproven/compat/us/
>IBM System x Support
http://www-947.ibm.com/support/entry/portal/
>IBM System Intelligent Clusters
http://www-03.ibm.com/systems/x/hardware/cluster/index.html
Trademarks
•The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.
>Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark, nor does it mean that the product is not actively marketed or is not significant within its relevant market.
>Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States. For a complete list of IBM trademarks, see www.ibm.com/legal/copytrade.shtml
•The following are trademarks or registered trademarks of other companies.
>Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
>Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefore.
>Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
>Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
>Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
>UNIX is a registered trademark of The Open Group in the United States and other countries.
>Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
>ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
>IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
•All other products may be trademarks or registered trademarks of their respective companies.
>Notes:
>Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
>IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
>All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
>This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
>All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
>Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.