Achieving cluster reliability & cost effectiveness using IPMI

Richard Libby, Enterprise Products Group, Intel Corporation
Tuesday, May 13, 2003


Page 1: Achieving cluster reliability & cost effectiveness using …hpcs2003.ccs.usherbrooke.ca/slides/Libby.pdf

Achieving cluster reliability & cost effectiveness using IPMI

Richard Libby
Enterprise Products Group, Intel Corporation
Tuesday, May 13, 2003

[email protected]

Page 2:

“It is quite possible by the middle of this decade clusters in their myriad forms will be the dominant High-end computing architecture.”

Dr. Thomas Sterling
JPL/Caltech

http://clusters.top500.org

Page 3: Overview

– HPC history & motivation
– Achieving cluster reliability & cost effectiveness using IPMI
– Call to Action

*Other names and brands may be claimed as the property of others.

Page 4: HPC 20yr Snapshot

Traditional HPC Architecture
– Data Center Model, accessed by specialists
– Monolithic, Expensive, Proprietary, Exclusive

Today’s HPC Architecture
– Department Level Model, accessible locally
– Stack: Applications, Middleware, OS, Hardware

Performance: 10x growth

Cost:
– $5M/GFLOP, no COTS*
– $250K/GFLOP, some COTS
– $4K/GFLOP, all COTS

*COTS: Commercial Off the Shelf

Pictured systems: Intel® processor based ASCI Red; SCore IIIe, an Intel® processor based cluster in Japan.

Page 5:

As your cluster scales out, you need to achieve cluster reliability & cost effectiveness

Page 6: Cluster Reliability

Problem:
– Even a tiny per-node failure rate (a long “mean time between failures”) can be significant for huge clusters.

(Example)
– Cornell*/Intel®/Microsoft* collaboration: the Radar reliability study
– 64 4-way IA32 Pentium® III Xeon™ Processor systems over 4 months: 87 hours lost out of 199,680 hours, i.e. 99.96% uptime
– In a 1000 node cluster, that means 1 node goes down every 48 hours, i.e. 1 really irritated user every other day.

Solutions:
– Hot swap
– Hardware redundancy
– Monitor/RAS network to anticipate trouble
– Use IPMI compliant hardware
– Use NEBS compliant hardware
– Write applications that are fault tolerant

Example: Cornell Theory Center has documented seven 9's
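The uptime figure in the example above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this note, and the slide does not state how many separate outages made up the 87 lost hours, so the code reports expected lost node-hours per 48-hour window for a 1000 node cluster rather than a failure count.

```python
# Reliability arithmetic for the study quoted on the slide.
# Function names are hypothetical; only the input numbers come from the slide.

def uptime_fraction(lost_hours: float, total_hours: float) -> float:
    """Fraction of node-hours during which nodes were up."""
    return 1.0 - lost_hours / total_hours


def expected_lost_node_hours(nodes: int, window_hours: float,
                             downtime_fraction: float) -> float:
    """Expected node-hours lost across a cluster during one time window."""
    return nodes * window_hours * downtime_fraction


STUDY_NODES = 64
STUDY_LOST_HOURS = 87.0
STUDY_TOTAL_HOURS = 199_680.0

uptime = uptime_fraction(STUDY_LOST_HOURS, STUDY_TOTAL_HOURS)
hours_per_node = STUDY_TOTAL_HOURS / STUDY_NODES  # length of the study per node
lost_per_48h = expected_lost_node_hours(1000, 48.0, 1.0 - uptime)

print(f"uptime: {uptime:.2%}")
print(f"study length per node: {hours_per_node:.0f} hours")
print(f"1000 nodes, 48 h window: about {lost_per_48h:.1f} node-hours lost")
```

If each outage took a node out for roughly a day, the last figure corresponds to about one node down every 48 hours, which matches the slide's estimate; that outage duration is an inference, not something the slide states.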

Page 7: Potential Failures

– Tools and Applications (the parallel application): Minimal Error Checking
– Middleware (communication libraries, clustering control package): Minimal Error Check
– OS (the operating system and management software running on the compute nodes and the control node): Device Driver Failure, Free Space
– Interconnect (inter-node communications hardware and software): Network Saturation, H/W Failure
– Compute Node (processor base, physical format, H/W management, etc.): Unexpected Hardware Failure
– Environment (heating, cooling, space constraints, etc.): Power Failure, Cooling Failure

As your HPC scales out, these problems get worse!

Page 8: Node Failure

– Need Node Monitoring Capability
– Need Node Control Capability

Goal: Alert User Before Node Fails

Solution: IPMI v1.5 Compliant Node Hardware

Page 9: IPMI v1.5

Intelligent Platform Management Interface

• Defines a common, abstracted, message-based interface to intelligent platform management h/w

• Defines common records for describing common platform management devices and their characteristics

• Supports OEM differentiation and value added features

• Promoters: Intel, HP, NEC & Dell

• Adopters: Over 140 and growing*

*Adopters in backup slides

Page 10: IPMI Hardware Support

[Diagram: Managed System Board. Labels: Monitoring & Control; System Interface; SEL, SDR, FRU; BMC; Program FRU PROMs; BMC Network Interface; System Bus; Internet / Intranet / Private Network; 5 volt Standby]

Page 11: IPMI v1.5 Functionality

Monitoring
– Voltage, Temperature, Fans, etc.

Platform Information
– SDR, FRU, SEL

Independent Watchdog Timer
– Used in Conjunction With BIOS

Outbound Alerting
– Platform Event Traps (LAN Connection)
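To make the monitoring item concrete, here is a minimal sketch of how management software might turn sensor readings into early warnings. It assumes readings have already been fetched from the BMC by some means not shown here. The `SensorReading` type, the sensor names and the numbers are hypothetical; the only thing taken from IPMI itself is the idea of graded thresholds (non-critical before critical). Only upper thresholds are handled, to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SensorReading:
    """One reading plus its upper thresholds (hypothetical structure)."""
    name: str
    value: float
    upper_non_critical: Optional[float] = None
    upper_critical: Optional[float] = None


def classify(reading: SensorReading) -> str:
    """Return 'critical', 'warning' or 'ok' for a single reading."""
    if reading.upper_critical is not None and reading.value >= reading.upper_critical:
        return "critical"
    if (reading.upper_non_critical is not None
            and reading.value >= reading.upper_non_critical):
        return "warning"
    return "ok"


def early_warnings(readings: List[SensorReading]) -> List[Tuple[str, str]]:
    """List (sensor name, state) for every sensor that is not 'ok'."""
    flagged = []
    for reading in readings:
        state = classify(reading)
        if state != "ok":
            flagged.append((reading.name, state))
    return flagged


# Made-up sample data for one node.
sample = [
    SensorReading("CPU1 Temp", 71.0, 70.0, 80.0),
    SensorReading("Baseboard Temp", 35.0, 60.0, 70.0),
    SensorReading("CPU2 Temp", 82.0, 70.0, 80.0),
]
warnings_found = early_warnings(sample)
print(warnings_found)
```

Acting on the "warning" state, before "critical" is reached, is what gives an operator time to drain or replace a node.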

Page 12: Dealing With the Problem

– Need extensive node hardware control (remote power up, power down, reset)
– Need extensive node hardware configuration control (IP address, boot source, etc…)

Goal: Over-the-Wire Cluster Node (Re-)Configuration

Solution: Direct Platform Control
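As one illustration of over-the-wire node control, the sketch below builds command lines for the open source `ipmitool` utility. This is an assumption made for the example: the slides do not name a client tool. The host and user names are hypothetical, and the code only constructs argument lists without running anything. The `-E` option tells `ipmitool` to read the password from the `IPMI_PASSWORD` environment variable, so it stays off the command line.

```python
from typing import List

# Subsets of the actions ipmitool accepts, kept small for the example.
POWER_ACTIONS = {"status", "on", "off", "reset", "cycle"}
BOOT_DEVICES = {"pxe", "disk", "cdrom", "bios"}


def base_args(host: str, user: str) -> List[str]:
    """Common arguments for IPMI-over-LAN access to one node's BMC."""
    return ["ipmitool", "-I", "lan", "-H", host, "-U", user, "-E"]


def power_command(host: str, user: str, action: str) -> List[str]:
    """Argument list for remote power up, power down, reset, and so on."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return base_args(host, user) + ["chassis", "power", action]


def bootdev_command(host: str, user: str, device: str) -> List[str]:
    """Argument list for selecting the boot source used on the next boot."""
    if device not in BOOT_DEVICES:
        raise ValueError(f"unsupported boot device: {device}")
    return base_args(host, user) + ["chassis", "bootdev", device]


# Hypothetical node: re-provision it by network booting after a reset.
steps = [
    bootdev_command("node042-bmc", "admin", "pxe"),
    power_command("node042-bmc", "admin", "reset"),
]
for argv in steps:
    print(" ".join(argv))
```

Each list could be handed to `subprocess.run` by a cluster management script; keeping construction separate from execution makes the commands easy to log and to test.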

Page 13: Direct Platform Control (DPC: IPMI over LAN)

[Diagram labels: LAN; NIC #1; NIC #2; BMC; Host Processor; LAN RMCP Port 26Fh; In-band Traffic; Management Traffic; Sideband Connection]

Page 14: Direct Platform Control Implementation

Connection
– Direct to BMC via the Network Interface
– RMCP packet format through UDP port 26Fh (NIC Filters)
– Password protected (can be customized)

BIOS Console Redirection
– Text mode only
– Terminates when the operating system boots into protected mode

Full Control of Power and Reset

Management Queries of Platform Health
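Port 26Fh is decimal 623. As a rough sketch of what an RMCP datagram looks like, the code below builds an ASF "presence ping", a small discovery message that RMCP-capable management controllers are expected to answer. The field layout follows my reading of the RMCP/ASF message format and should be checked against the specification before relying on it. The function names are invented for this example, and `send_presence_ping` is shown for completeness but never called here.

```python
import socket
import struct

RMCP_PORT = 0x26F              # UDP port named on the slide (623 decimal)
RMCP_VERSION = 0x06            # RMCP version 1.0
RMCP_SEQ_NO_ACK = 0xFF         # sequence number meaning "no RMCP ACK wanted"
RMCP_CLASS_ASF = 0x06          # message class: ASF
ASF_IANA_ENTERPRISE = 4542     # IANA enterprise number used in ASF messages
ASF_TYPE_PRESENCE_PING = 0x80  # ASF message type: presence ping


def build_presence_ping(tag: int = 0) -> bytes:
    """Build the 12-byte RMCP/ASF presence ping datagram."""
    header = struct.pack("!BBBB", RMCP_VERSION, 0x00,
                         RMCP_SEQ_NO_ACK, RMCP_CLASS_ASF)
    # Enterprise number, message type, message tag, reserved, data length 0.
    body = struct.pack("!IBBBB", ASF_IANA_ENTERPRISE,
                       ASF_TYPE_PRESENCE_PING, tag, 0x00, 0x00)
    return header + body


def send_presence_ping(host: str, timeout: float = 2.0) -> bytes:
    """Send one ping to a BMC and return the raw reply, or b'' on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_presence_ping(), (host, RMCP_PORT))
        try:
            data, _addr = sock.recvfrom(512)
        except socket.timeout:
            return b""
        return data


packet = build_presence_ping()
print(RMCP_PORT, packet.hex())
```

Because the BMC answers on its own, a reply to this ping shows that the management path is alive even when the host operating system is down.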

Page 15: Example: Simple Architecture

[Diagram labels: Master; Network Access; Compute Nodes with IPMI 1.5; Health Master; Storage; Data Backup; Hot-spare IPMI Compute Nodes]

Page 16: IPMI Promoter, Contributor, and Adopter News

Acer Inc.; Agilent Technologies GmbH; Alberta Microelectronics; American Megatrends Inc.; Arima Computer Corp.; ASUSTek Computer, Inc.; Axil Computer, Inc.; Blue Wave Systems; Bull S.A.; Celestica; Concurrent Technologies, PLC; CyberGuard Corporation; Data General Corporation; Dell Computer Corporation; Egenera, Inc.; ElanVital Corporation; Ericsson UAB; Evans & Sutherland; Eversys Corporation; Exabyte Corporation; FORCE Computers GmbH; Fujitsu, Ltd.; GoAhead Software, Inc.; HADCO Corporation; HCL Infosystems Ltd.; Hewlett-Packard Company; Hewlett-Packard GmbH; Hitachi Ltd.; Hybricon Corporation; Ibus/Phoenix Corporation; InnoMediaLogic, Inc.

Intel Corporation; Interphase Corporation; InterWorks Computer Products; Inventec Corporation; Ipex ITG; JMC Products; L-3 Communications Corp.; Lynux Works, Inc.; Macrolink, Inc; Magnetek, Inc.; Micro-Star International; Mirapoint, Inc.; Mitsubishi Electric Corp. Info. Systems Engineering Center; National Semiconductor Corp.; NEC Corporation; Nematron Corporation; Network Appliance, Inc.; Network Engines, Inc.; Network Storage Solutions, Inc.; NOCpulse, Inc.; Olivetti Computers Worldwide; Open Source Asia; PEP Modular Computers; Phoenix Technologies Ltd.; Pigeon Point Systems; Pinnacle Data Systems, Inc.; Praim, Inc.; Qlogic Corporation; Radisys Corporation; Reliance Computer Corporation

Samsung Electronics Co., Inc.; Sanera Systems, Inc.; SBS Technologies (Industrial Computers GmbH); Scenix Semiconductor, Inc.; Siemens AG; Silicon Graphics, Inc.; Stratus Computer Systems Ireland Ltd.; Summit Microelectronics, Inc.; Sun Microsystems; Super Micro Computer, Inc.; Symphony Group Intl. Co., Ltd.; Synergy Microsystems; Teknor Applicom, Inc.; T-Netix, Inc.; Tatung Co.; Tektronix; Texas Micro Corporation; Toshiba Corporation; Trilogic Systems, LLC; Trimm Technologies; Tyan Computer Corporation; Universal Scientific Industrial Corp.; USAR Systems, Inc.; Vitesse Semiconductor Corp.; Watrin System Design; Vividon, Inc.; Vooha, Inc.; Winbond Electronics Corp.; Ziatech Corporation

Page 17: So what’s it worth to you?

Power Control
– 8-Port NPS US$500; 1k Nodes ~US$63K

Sensor Model
– 1-Port RS232 Sensor US$179; 1k Nodes ~US$179K

Serial Redirection
– 40-Port Serial Concentrator US$50K; 1k Nodes ~US$1.2M

Spend more money on additional nodes instead of additional support infrastructure
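The 1k-node figures above follow from dividing the node count by the ports per device, rounding up, and multiplying by the unit price. The short sketch below redoes that arithmetic. The function name is made up for this note and the prices are the ones quoted on the slide; the serial concentrator works out to US$1.25M, which the slide rounds to ~US$1.2M.

```python
import math


def infrastructure_cost(nodes: int, ports_per_unit: int, unit_price_usd: int) -> int:
    """Cost of enough add-on devices to cover every node in the cluster."""
    units_needed = math.ceil(nodes / ports_per_unit)
    return units_needed * unit_price_usd


NODES = 1000
power_control = infrastructure_cost(NODES, 8, 500)          # 8-port NPS
sensors = infrastructure_cost(NODES, 1, 179)                # 1-port RS232 sensor
serial_redirect = infrastructure_cost(NODES, 40, 50_000)    # 40-port concentrator
total = power_control + sensors + serial_redirect

for label, cost in [("power control", power_control),
                    ("sensors", sensors),
                    ("serial redirection", serial_redirect),
                    ("total", total)]:
    print(f"{label:>18}: US${cost:,}")
```

The total is the external infrastructure spend that the slide argues can go toward additional nodes when the equivalent functions are provided by IPMI hardware on each board.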

Page 18: Summary

– HPC Clusters follow Moore’s law as upgrade path
– Use IPMI-based systems for minimum cluster downtime

Page 19: Call to Action

Take advantage of affordable COTS hardware to drive large-scale HPC implementations that have minimal downtime using IPMI compliant hardware

Excellent / Cool / Far Out

Page 20: Backup Information

Page 21: Links & Contact Info

Links:
– http://www.intel.com/go/hpc
– http://oss.sgi.com/projects/pcp
– http://www.openpbs.org
– http://www.openclustergroup.org
– http://clumon.ncsa.uiuc.edu
– http://sourceforge.net/projects/clumon

Richard Libby
– 503-712-9794 [email protected]

Page 22: Enabling HPC with the Intel Ecosystem

iSV Enabling
- .MSC, Accelrys, LSTC, Fluent
- Schlumberger, Western GECO
- United Devices, Entropia, Platform

Channel Enabling
- WW OEMS
- Regional Leaders

Intel WW Services
- Dedicated HPC Personnel
- End User Developer Enabling
- Infrastructure (WA, NM, VG, OR)
- Premier Technical Support
- HW Design Support

iHV Enabling
- Tyan, Asus, Celestica

Industry Standards
- LCI, GGF, OSDN, Gelato, IPMI

Open Source Support
- OSDN, Source Forge, Linux Bios

Intel Building Blocks
- CPU
- Chipset
- Interconnect
- Platform/Server boards
- SW Suite

Intel Training
- WW Software College
- WW Face to Face Training

Intel Capital
- SCALI, United Devices

OSV Enabling
- Microsoft, Red Hat, SuSE, HP/UX

Cluster Enabling
- Scyld, Scali, OSCAR, ROCKS

System Integrators
- BluePrinting
- Accenture, PWC, CS

iNV Enabling
- Myricom, Dolphin, Quadrics
- Mellanox, IBM, Topspin

Collaborations
- CTC, CERN, NCSA

Manufacturing
- .13us, 90nm, 300mm
- Platform Validation

Solutions Centers
- Oil, Finance, Life Sciences

Page 23:

Large Scale HPC Management & Monitoring using CLUMON

Page 24: In the beginning…

Computing in the dark

Page 25: CLUMON Architecture

[Diagram labels: Hosts with PCP (several); Nodes with PCP/IPMI; PBS Scheduler; CLUMON Daemon; MySQL Database; Apache with PHP; data labels: Hosts, Queue, Jobs, Perf Metrics]

Page 26: Current Features

Intuitive interface
– Point and click web based, others possible

Correlation of data
– Cross-reference info from different systems

Control of resource consumption
– Cached data; tunable frequency and quantity

Huge set of data available

Portable data
– Easily publish data in any format
– Connect to data from socket, web, mysql

Adaptable
– Plugins, pcp modules, generic db structure

Page 27: Upcoming Features

V1.3
– User and Admin portals
– Problem Tracking
– Availability Monitoring

V1.4
– Myrinet counter PCP module
– Alert System
– CluMon Plugin capability
– Maui Reservation Monitoring Plugin
– Job Performance Plugin

Future
– Network Data & Visualization
– Management via scripts
– Support for LSF, IPMI, CIM, etc.

Page 28: Main Window (small scale)

Page 29: Main Window (large scale)

Page 30: Core Tenets of Intel’s HPC Strategy

• Building block products based on industry standards and high volume to bring improved economics to HPC.

• Collaborations with thought leadership accounts that drive the evolution of applications, tools, and compute models forward.

• Programs enabling the broadest channels in the world delivering a significant choice of alternatives for the HPC community.

• Ecosystem support and enabling of ISVs, IHVs, and OSVs.

Page 31: Solutions and Case Studies

More at www.intel.com/go/hpc/

– Solutions for the Energy Vertical Market
– Significantly more computing power than other solutions using IPF
– Short video on how Intel Itanium architecture is enabling new scientific breakthroughs at the National Center for Supercomputing Applications (NCSA), Cornell Theory Center (CTC), and the European Organization for Nuclear Research (CERN)
– Success stories on some of the world's leading research centers that have already turned to Intel-based solutions to meet their high-performance computing needs. Their stories provide a lesson in how to achieve maximum performance at minimal cost using Intel processors.
– Applications and resources that are written to fully take advantage of HPC technology
– HPC enabling software that will make your code run faster on IA (such as compilers, libraries and tools, solution stacks)
– Interconnect products and programs to speed up communication & bandwidth between network nodes
– Processor information

Page 32: Cluster Software Stack

Collaboration: Intel, IBM, ORNL, NCSA, Dell, SGI, Veridian, MSC.software

Compared to Beowulf:
– Industry support from inception
– Draws from large community
– Long-term effort
– Planned support for multiple OS’s

Page 33: System Control Software – OSCAR*

Features
– Backed by a consortium:
  – Dell, IBM, Intel, NCSA, ORNL, SGI, Veridian, MSC.software
– Components
  – LUI (Linux Utility for cluster Install; uses PXE boot)
  – C3 (Cluster node management)
  – MPI/PVM (Message passing interfaces)
  – PBS (Workload management)
  – OpenSSH (Security)

Page 34: Scyld* Computing – Beowulf Cluster Operating System

Features
– Second Generation Product
– Created By Don Becker (part of the original Beowulf development team)
– Proven Robustness
– Ease of use
– GUI based cluster node configuration and control
– Commercial Package – Fully supported Open Source
– Now Supports IPMI

Page 35: Achieving cluster reliability & cost effectiveness using …hpcs2003.ccs.usherbrooke.ca/slides/Libby.pdf · reliability & cost . effectiveness using . IPMI. ... Achieving cluster

35

Components
– Written in C
– clumond is the collection daemon
– Uses open source components
  – PBS Scheduler
  – Performance Co-Pilot from SGI
  – MySQL database
– Display windows use PHP on Apache
– Renders HTML, XML or any other custom format via a socket interface

PBS Information
Clumond gathers information about
– Queues
– Jobs
– Hosts (as it pertains to scheduling)
Clumond gathers ALL information from the Scheduler in a generic fashion. So if there are changes in PBS, no change should be needed in clumond.
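The idea of gathering scheduler data "in a generic fashion" can be illustrated with a small sketch. This is not clumond's code (clumond is written in C, and its internals are not shown in these slides). It is a Python illustration that assumes the scheduler reports each object as a header line followed by indented `key = value` lines. The sample text, the attribute names, and the `parse_records` helper are all made up for the example.

```python
def parse_records(text):
    """Parse 'Kind: name' headers followed by indented 'key = value' lines.

    No attribute names are hard-coded, so an attribute the scheduler adds
    later is stored without any change to this function.
    """
    records = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if not line[0].isspace() and ":" in line:
            # Unindented line with a colon starts a new record.
            kind, name = line.split(":", 1)
            current = {"kind": kind.strip(), "name": name.strip(), "attrs": {}}
            records.append(current)
        elif current is not None and "=" in line:
            # Indented line carries one attribute of the current record.
            key, value = line.split("=", 1)
            current["attrs"][key.strip()] = value.strip()
    return records


# Invented sample input; 'new_attribute' stands for a field added by a
# later scheduler release.
SAMPLE = """\
Job Id: 101.head
    Job_Name = sim_a
    job_state = R
    new_attribute = 42

Queue: batch
    queue_type = Execution
    total_jobs = 3
"""

records = parse_records(SAMPLE)
for rec in records:
    print(rec["kind"], rec["name"], rec["attrs"])
```

Because the parser treats every attribute as an opaque key and value, the unknown `new_attribute` field is collected like any other, which is the property the slide describes.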

PCP Information
– PCP
  – Available from:
    – http://oss.sgi.com/projects/pcp/
  – Provides info on:
    – system load
    – memory usage
    – process information
    – network information
– Customizable and expandable

PCP Information
PCP data can include almost any machine metric desired. If not already provided for, PCP is expandable.
Content is very flexible; maintainable by a list in the database. Just add a metric to the list and it will be gathered on the next data collection phase.
No need to change web page either. Data will show up automatically.
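A minimal sketch of a database-driven metric list follows. It rests on several assumptions: Python's built-in `sqlite3` stands in for the MySQL database named in the slides, the `metric_list` table and the metric names are invented, and `read_metric` is a hypothetical placeholder for a real query to PCP.

```python
import sqlite3


def read_metric(name):
    """Hypothetical stand-in for asking a performance source for one value."""
    fake_values = {"load.1min": 0.42, "mem.free_kb": 123456.0}
    return fake_values.get(name)


def collect_once(conn):
    """Run one collection phase: re-read the metric list, then sample each entry."""
    rows = conn.execute("SELECT name FROM metric_list ORDER BY name")
    return {row[0]: read_metric(row[0]) for row in rows}


conn = sqlite3.connect(":memory:")  # sqlite3 used here in place of MySQL
conn.execute("CREATE TABLE metric_list (name TEXT PRIMARY KEY)")
conn.execute("INSERT INTO metric_list VALUES ('load.1min')")

first = collect_once(conn)

# Adding a row is the only change needed to gather a new metric;
# the collector itself is untouched.
conn.execute("INSERT INTO metric_list VALUES ('mem.free_kb')")
second = collect_once(conn)

print(first)
print(second)
```

The collector reads the list at the start of every phase, so a newly inserted metric appears in the next result without a code change or restart.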

Database Configuration
Database schema is very abstract so that it can be adapted easily to future schedulers and performance APIs.
All the system’s configuration is done in the database. No need to restart the daemon for changes to take effect.
The database is very expandable. New areas of data collection include network adapters, switches, and job performance counters.
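The slides do not show the actual schema, so the following is only one way an "abstract" layout might look: a single table of (source, entity, attribute, value) rows. The table name, column names, and sample data are assumptions made for illustration, and `sqlite3` again stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # sqlite3 used here in place of MySQL

# One generic table: 'source' might be a scheduler or a performance API,
# 'entity' a job, host, or device, and 'attribute' any reported field.
conn.execute(
    "CREATE TABLE observation ("
    "source TEXT, entity TEXT, attribute TEXT, value TEXT)"
)


def store(source, entity, attrs):
    """Store every attribute of one entity as its own row."""
    conn.executemany(
        "INSERT INTO observation VALUES (?, ?, ?, ?)",
        [(source, entity, key, str(val)) for key, val in attrs.items()],
    )


store("pbs", "job-101", {"state": "R", "queue": "batch"})
store("pcp", "node01", {"load": 0.42})
# A new area of data collection needs no schema change.
store("switch", "sw-a", {"port_errors": 7})

sources = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT source FROM observation ORDER BY source"
    )
]
row_count = conn.execute("SELECT COUNT(*) FROM observation").fetchone()[0]
print(sources, row_count)
```

The trade-off of such a layout is that values lose their native types and queries become more verbose, in exchange for accepting new schedulers or devices without altering tables.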

Links & Contact Info
Links:
– http://www.intel.com/go/hpc
– http://oss.sgi.com/projects/pcp
– http://www.openpbs.org
– http://www.openclustergroup.org
– http://clumon.ncsa.uiuc.edu
– http://sourceforge.net/projects/clumon
Richard Libby
– 503-712-9794 [email protected]

For additional information
General Background
– http://developer.intel.com/design/servers/ipmi/
– http://www.intel.com/go/hpc/
– http://www.beowulf.org
– http://www.microsoft.com/windows2000/hpc/
– http://clusters.top500.org/
– http://www.tc.cornell.edu/AC3/ (Advanced Cluster Computing Consortium)
Cluster software platforms
– http://www.scyld.com/
– http://www.compaq.com/hpc/
– http://pdswww.rwcp.or.jp/
– http://www.openclustergroup.org
– http://www.microsoft.com/windows2000/technologies/clustering/default.asp

For additional information
URLs
– http://www.intel.com/go/hpc/
– http://www.scyld.com
– http://www.beowulf.org
– http://www.microsoft.com/windows2000/hpc/
– http://www.dell.com/ (search for HPC)
– http://www.compaq.com/hpc/
– http://www.tc.cornell.edu/AC3/ (Advanced Cluster Computing Consortium)
– http://clusters.top500.org
– http://pdswww.rwcp.or.jp
– http://www.opensce.org
– http://www.openclustergroup.org
– http://www.microsoft.com/windows2000/technologies/clustering/

Glossary
Cluster – a collection of connected independent computers that work in unison to solve a problem.
COTS – Commercial, Off The Shelf. The use of readily available products to create a large machine.
FC-PGA – Flip Chip-Pin Grid Array. A current packaging technique for microprocessors from the Intel Corporation.
High Availability – Another form of clustering to give non-stop computing in the event of partial hardware failures.
HPC – High Performance Computing (supercomputer characteristics).
Load Balancing – Another form of clustering that distributes an incoming workload across a series of computers. Usually in reference to incoming TCP/IP traffic to a web site.
MPP – Massively Parallel Processor. Linking together many processors (sometimes as many as 9000) in a single machine through a specialized interconnect scheme.
Parallel Computing – Calculating parts of the program at the same time to shorten the overall amount of time needed to solve the problem.
SPMD – Single Program Multiple Data
SMP – Symmetric Multi-Processing. A configuration of processors in a single computer that allows all processors the same access to system resources such as memory or I/O.

Glossary
ECC – error correcting code. Circuitry used to check memory and other bus data for errors. If single bit errors are encountered this circuitry can correct them on the fly.
FLOPS – floating point operations per second. A measure of the capability of a machine or cluster to deal with scientific calculations.
FSB – front side bus. The connection of the microprocessor to the rest of the system (memory, I/O etc).
LFS – log structured file system. A variation of a journaling file system.
MPI – message passing interface. A library specification for message-passing, proposed as a standard by a broadly based committee of vendors, implementers, and users.
MPICH – a portable implementation of the full MPI specification from Argonne National Labs, for a wide variety of parallel computing environments, including workstation clusters and massively parallel processors (MPPs).
MPIlam – LAM (local area multicomputer) is a freeware implementation of the MPI standard.
MPI/Pro* – a commercial implementation of the MPI standard.
PVM – parallel virtual machine. A software package that permits a heterogeneous collection of UNIX*, Linux and/or Windows* NT computers hooked together by a network to be used as a single large parallel computer.
