-
IBM Systems Group
MareNostrum Training
Greg Rodgers, Peter Morjan
Sept 27, 2005
© 2004 IBM Corporation
-
Agenda
Date              Time        Topics                                      Instructor
Tuesday, Sept 27  9:30-11:00  Blade Cluster Architecture, JS20 Overview,  Greg Rodgers
                              MareNostrum Layout
                  11:30-1:00  Network Overview and Linux Services         Greg Rodgers
                  1:00-2:30   LUNCH
                  2:30-4:00   Storage Subsystem                           Greg Rodgers
                  4:30-6:00   DIM and Image Management                    Greg Rodgers & Peter Morjan
Some detail on these charts will be added during class. Final charts will be available after class.
-
High-Capacity Multi-Network Linux Cluster Model
[Diagram labels: POWER servers; Myrinet high-speed fabric; Service LAN; reliable gigabit network]
A multi-purpose, multi-user supercomputer
-
Multiple Networks in BladeCenter Clusters: 3 networks in the BladeCenter cluster architecture
- Service network: out-of-band systems management
- Reliable gigabit network: global access, net boot, image service, and GPFS
- High-speed fabric: distributed-memory apps (e.g. MPI) and optional I/O

Features:
- The out-of-band service network gives physical security; the systems management network is isolated from users. BladeCenters are controlled by SNMP commands on the service network (see the sketch below).
- The cluster can be brought up without the high-speed fabric.
- The highly reliable GbE network helps diagnose and recover from complex high-speed fabric issues.
- GbE bandwidth is sufficient for the root file system, which allows diskless image management.
- Independent I/O traffic: heavy file I/O won't impact a concurrent MPI user.
- A 2nd gigabit interface is available for expansion.
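As a hedged illustration of driving the management modules over the service network, a first sanity check with net-snmp could be as simple as the following. The MM address and community string are assumptions (192.168.70.125 is the factory-default MM address), and the real power/control OIDs come from the BladeCenter MM MIB, not shown here:

    # Hypothetical sketch: query a BladeCenter management module on the
    # service LAN. Address and community string are assumptions.
    snmpwalk -v1 -c public 192.168.70.125 system     # MIB-2 system group
    snmpget  -v1 -c public 192.168.70.125 sysUpTime.0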
-
The MareNostrum Blade Cluster
[Diagram labels: p615 servers; 2560-port Myrinet 2000 switch; Service LAN; FORCE10 Gigabit Network; 20 DS4100 storage nodes; 172 BladeCenters, 2406 blades]
-
BladeCenter System Management Methodology
[Diagram: a Cluster Management Server connects over Ethernet to the Management Module in each BladeCenter chassis.]
- Processor blade: CPU, Ethernet, service processor
- Chassis components: Ethernet switch module, CD-ROM/floppy, control panel, blower, power
- Management Module functions: VPD, LEDs, voltage, temperature, CPU I2C interface, flash update
- Redundant system components not shown
-
JS20 Blades, BladeCenter and Compute Racks
17.6 GF per JS20 blade
246 GF per BladeCenter chassis (14 blades)
1.48 TF per compute rack (6 chassis)
-
New Technologies used in MareNostrum
- IBM advanced semiconductor technology (CMOS10S, 90nm): high speeds at low power
- 2.2 GHz PowerPC 970FX processor: industry-leading 64-bit commodity processor; record price/performance on HPC workloads
- IBM BladeCenter integration: record cluster density; improved cluster operating efficiency (power, space, cooling); speed of installation
- IBM e1350 support: provides cluster-level testing, integration, and fulfillment
- High-density Myrinet interconnect: significant reduction in switching hardware; MPI performance that scales
- High-density gigabit switch with 48-port linecards
- Enterprise scale-out IBM FAStT storage (TotalStorage 4100) with GPFS on 2000+ nodes: reliable and scalable global-access filesystem
- Linux 2.6: enterprise and performance features to exploit the POWER architecture (VMX, large pages, modular boot)
- Diskless node capability: improved node reliability; reduced installation and maintenance costs; flexibility to change node personality
-
Agenda
Date              Time        Topics                                      Instructor
Tuesday, Sept 27  9:30-11:00  Blade Cluster Architecture, JS20 Overview,  Greg Rodgers
                              MareNostrum Layout
                  11:30-1:00  Network Overview and Linux Services         Greg Rodgers
                  1:00-2:30   LUNCH
                  2:30-4:00   Storage Subsystem                           Greg Rodgers
                  4:30-6:00   DIM and Image Management                    Greg Rodgers & Peter Morjan
-
Anatomy of a Blade
[Board diagram labels: CPU 1, CPU 2, DIMM 1-4, battery, buzzer, SW3, SW4, DRIVE 1, DRIVE 2, daughter card, connector to I/O expansion, connector to front panel, connector to midplane]
-
JS20 Blade
-
JS20 Processor Blade:
- 2-way PPC970 SMP
- Northbridge with memory controller and HyperTransport I/O bus
- AMD HyperTransport tunnel to PCI-X
- AMD southbridge
- 2 or 4 DIMMs (up to 4-8 GB)
- BladeCenter service processor
- 2x 1Gb Ethernet on board, PCI-X attached (Broadcom)
- Optional additional I/O daughter card, PCI-X attached: 2x 1Gb Ethernet (Broadcom), or 2x 2Gb FibreChannel (QLogic), or Myrinet
- Single-wide blade; 14 blades per chassis; 84 servers (168 processors) in a 42U rack
JS20 Blade Logic
[Block diagram labels: 2x PPC 970; U3 Northbridge; 4x DDR; AMD 8131 HyperTransport PCI-X tunnel (16-bit HT); Gbit BCM5704S; AMD 8111 HyperTransport I/O hub (8-bit HT); flash; Super I/O; USB keyboard/mouse; HDM connectors to midplane; IDE; Hawk service processor; SMBus; optional Fibre Channel or Gbit Ethernet expansion on PCI-X; USB FDD/CD-ROM; VRM; RS-485; NVRAM; serial port. Midplane: Baier/Lichtenau, Nov 15 02.]
-
VMX vs MMX/SSE/SSE2
VMX:
- 32 x 128-bit VMX registers
- No interference with FP registers
- No context or mode switching
- Max. throughput: 8 flops/cycle
- Data types: char, short, int, float

MMX / SSE / SSE2:
- 8 x 128-bit SSE registers plus 8 x 64-bit MMX registers
- MMX registers == FP registers, so MMX stalls FP
- Context switching required for MMX
- Max. throughput: 2 flops/cycle
- Data types: char, short, int, long, float, double
Much more about VMX on Friday
-
Agenda
Date              Time        Topics                                      Instructor
Tuesday, Sept 27  9:30-11:00  Blade Cluster Architecture, JS20 Overview,  Greg Rodgers
                              MareNostrum Layout
                  11:30-1:00  Network Overview and Linux Services         Greg Rodgers
                  1:00-2:30   LUNCH
                  2:30-4:00   Storage Subsystem                           Greg Rodgers
                  4:30-6:00   DIM and Image Management                    Greg Rodgers & Peter Morjan
-
MareNostrum Rack Summary
34 xSeries e1350 racks:
- 29 compute racks (RC01-RC29): 171 BC chassis with OPM and Gb ESM; 2406 JS20+ nodes with Myrinet card
- 1 gigabit network rack (RN01): 1 Force10 E600 for the Gb network; 4 Cisco 3550 48-port switches
- 4 Myrinet racks (RM01-RM04): 10 Clos 256+256 Myrinet switches; 2 Myrinet Spine 1280s

8 pSeries 7014-T42 racks:
- 1 operations rack (RH01): 1 7316-TF3 display; 2 p615 management nodes; 2 HMC 7315-CR2; 3 Remote Async Nodes; 3 Cisco 3550 (installed on site); 1 BCIO (installed on site)
- 7 storage server racks (RS01-RS07): 40 p615 storage servers; 20 FAStT100 controllers; 20 EXP100 expansion drawers; 560 250 GB SATA disks
-
27 BladeCenter 1350 xSeries racks (RC01-RC27)
Box summary per rack: 6 BladeCenter chassis (7U each)
Cabling:
- External: 6 10/100 cat5 from MM; 6 Gb from ESM to E600; 84 LC cables to Myrinet switch
- Internal: 24 OPM cables fanning out to 84 LC cables
-
MareNostrum Rack Names
Legend: Blade Centers; Myrinet switches; storage servers; operations rack and display; gigabit switch; 10/100 Cisco switches
[Floor-plan grid of rack names, front (F) and back (B) rows: compute racks RC01-RC27, storage racks RS01-RS07, operations rack RH01, gigabit network rack RN01, Myrinet racks RM01-RM04.]
-
MareNostrum Logical Names
Legend: Blade Centers; Myrinet switches; storage servers; operations rack and display; gigabit switch; 10/100 Cisco switches
[Floor-plan grid of logical names. Each BladeCenter management module is named by storage/image server and chassis position, s01c1-mm through s41c3-mm. Also labeled: gigabit switch e600; Cisco switches cisco1-cisco5; Myrinet Clos switches (mcN) and spine switches (msN); HMC hmc1; and the same rack positions as the previous chart (RC01-RC27, RS01-RS07, RM01-RM04, RN01, RH01).]
-
MareNostrum Scaled Floor Plan (v14)
31 x 11 tiles (60cm x 60cm); 18.6m x 6.6m = 123 sq m; 18.6m x 8.2m = 153 sq m (including AC)
Legend: Blade Centers; Myrinet switches; storage servers; operations rack and display; gigabit switch; 10/100 Cisco switches
[Scaled floor plan: racks in rows 1-5, E600 and Cisco switches marked, back door to the loading dock at one end.]
-
1 operations pSeries rack (RH01)
Box summary: 1 display; 2 HMC; 2 p615; 3 16-port Remote Async Nodes (RAN#0-2); BCIO (manually installed); 3 Cisco 3550 (manually installed)
Cabling:
- External: 2 Gb for p615 to E600; 40 serial lines from RAN#0-2 to p615s; 8 Gb for BCIO to E600; 40 cat5 from p615s to cisco; 4 cat5 uplinks from ciscos in RN01
- Internal: HMC to RAN#0; RAN#0 to RAN#1; RAN#1 to RAN#2; 2 p615s to RAN#0; KVM display to HMC; p615s cat5 to cisco; BCIO MM cat5 to cisco; 2 cat5 uplinks from cisco to cisco
Note: one of the p615s in this operations rack will do diskless image support for 3 BladeCenters.
[Rack diagram: 2x p615 (4U); HMC 7135-C02 (4U); display (1U); 3 RANs/serial mux; backup HMC 7135-C02; BCIO BladeCenter (7U). Final placement subject to on-site analysis.]
-
Agenda
Date              Time        Topic                                Instructor
Tuesday, Sept 27  9:30-11:00  Blade Cluster Architecture           Greg Rodgers
                  11:30-1:00  Network Overview and Linux Services  Greg Rodgers
                  1:00-2:30   LUNCH
                  2:30-4:00   Storage Subsystem                    Greg Rodgers
                  4:30-6:00   DIM and Image Management             Greg Rodgers & Peter Morjan
-
The MareNostrum Blade Cluster
[Diagram labels: p615 servers; 2560-port Myrinet 2000 switch; Service LAN; FORCE10 Gigabit Network; 20 DS4100 storage nodes; 172 BladeCenters, 2406 blades]
-
Ethernet Switches: BladeCenter I/O Switch Flexibility
- Layer 3 Nortel switch
- Layer 2/3 Cisco switch
- Layer 3/7 Nortel switch
- D-Link switch
- Pass-thru module
- Optical pass-thru (used for the service network)
-
MareNostrum Networks
- Gigabit network
- Myrinet network
- Service network
- Serial network (the p615s' remote management network)
-
MareNostrum Networks
[Network topology diagram; labels not recoverable.]
-
1 network xSeries 1350 rack (RN01)
Box summary: 1 FORCE10 E600 (16U); 4 Cisco 3550; 24U free
Cabling:
- External (very heavy, 390 total cables): 162 Gb from BCs in compute racks to E600; 8 Gb from BCIO to E600; 42 Gb from p615s to E600; 163 cat5 from MMs to cisco; 12 cat5 from Myrinet switches to cisco; 3 cisco uplinks to cisco in RH01; future option: 42 fiber GigE cables to the E600 fiber card
- Internal: 1 10/100 from the Force10 service port to Cisco
-
Gigabit Network: Force10 E600
- Interconnection of BladeCenters; used for system boot of every BladeCenter
- 212 internal network cables: 170 for blades, 42 for file servers
- 76 ports available for external links
-
MareNostrum Network Review
[Diagram labels: p615 servers; 2560-port Myrinet 2000 switch; Service LAN; FORCE10 Gigabit Network; 20 DS4100 storage nodes; 172 BladeCenters, 2406 blades]
-
Myrinet Switch Internals
LED diagnostic display
PPC Linux Diagnostic Module
14U Aluminum chassis with handles
Integrated quad-ported spine slots 4x64
16x16 host port slots
Front to rear air flow
Hot swap redundant power supply
-
Myrinet Switch Cabling
- 126 host cables per side, from a full 84-blade rack bundle and half a rack bundle; call these H84B and H42B bundles. Each switch manages three racks.
- 64 quad cables routed vertically upward to the spine from the 4 center cards; call this bundle a Q64B. There are 10 Q64Bs.
- Avoid blocking the air intake at the bottom. Worst-case blockage is by 2 Q64Bs on the top switch. Ensure enough slack to swap the middle power supplies.
- The LCD display will not be blocked.
-
4 Myrinet xSeries 1350 racks (RM01-RM04)
Box summary: 10 Clos 256x256 switches (14U each); 2 Spine 1280s (14U each)
Cabling:
- External: 12 10/100 cat5; 2364 LC cables
- Internal: 640 quad spine cables (over top)
[Rack diagram: the ten Clos 256x256 and two Spine 1280 chassis distributed across RM01-RM04.]
Complex Myrinet cabling is covered in more detail in the next charts.
-
Myrinet Spine Cabling
- 10 Q64B bundles from the center 4 cards of the switches provide 640 quad cables at the top of the Myrinet racks, redistributed into 4 Q160B bundles. Cables are 5m.
[Diagram: bundle sequence across the rack tops: Q64B Q64B Q64B Q160B Q64B Q64B Q160B Q160B Q64B Q64B Q160B Q64B Q64B Q64B]
-
Myrinet Cable Bundle Summary
- 640 quad 5m interswitch cables: Q64B Q64B Q64B Q160B Q64B Q64B Q160B Q160B Q64B Q64B Q160B Q64B Q64B Q64B
- 2364 host fiber cables
Note: 8 racks have the 84-way bundle split into two 42-way bundles below the Myrinet rack.
-
Myrinet 2560-port full bisection
-
Local Customization of Linux Services
- All scripts for Linux services should be in /etc/init.d.
- All scripts should be installed with the insserv command.
- All scripts should follow the rules for specifying dependencies, as sketched below.
- See: man init.d, man insserv
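A minimal sketch of such a script with the LSB dependency header that insserv reads; the service name, dependencies, and actions are placeholders:

    #!/bin/sh
    # Hypothetical /etc/init.d/myservice skeleton; names are placeholders.
    ### BEGIN INIT INFO
    # Provides:       myservice
    # Required-Start: $network $remote_fs
    # Required-Stop:  $network $remote_fs
    # Default-Start:  3 5
    # Default-Stop:   0 1 2 6
    # Description:    Locally customized service
    ### END INIT INFO
    case "$1" in
        start) echo "Starting myservice" ;;   # start the daemon here
        stop)  echo "Stopping myservice" ;;   # stop the daemon here
        *)     echo "Usage: $0 {start|stop}"; exit 1 ;;
    esac

Installing it with "insserv /etc/init.d/myservice" then creates the runlevel links in the order the declared dependencies imply.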
-
Agenda
Date              Time        Topic                                  Instructor
Tuesday, Sept 27  9:30-11:00  Blade Cluster Architecture             Greg Rodgers
                  11:30-1:00  Network Overview                       Greg Rodgers
                  1:00-2:30   LUNCH
                  2:30-4:00   Storage Subsystem and p615 management  Greg Rodgers
                  4:30-6:00   DIM and Image Management               Greg Rodgers & Peter Morjan
-
MareNostrum Network Review
[Diagram labels: p615 servers; 2560-port Myrinet 2000 switch; Service LAN; FORCE10 Gigabit Network; 20 DS4100 storage nodes; 172 BladeCenters, 2406 blades]
-
MareNostrum Storage Subsystem
[Diagram, repeated 20 times: two POWER storage servers, each with SCSI disks, attached to a FAStT100 controller and Fibre Channel storage.]
- Each POWER storage server manages 56 blades and half of a 7TB Fibre Channel storage node, for redundancy.
- Root filesystems are contained on the SCSI disks; Fibre Channel storage is used for the parallel file system.
- Storage servers are both image servers and GPFS storage servers (see the sketch below).
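Since the p615 pairs serve both roles, bringing up the GPFS half amounts to turning the FAStT LUNs into NSDs and building one filesystem. A rough sketch using GPFS 2.x-era command names; the disk and server names, the descriptor format, and the mount point are assumptions, not the MareNostrum configuration, and exact syntax varies by GPFS release:

    # Hypothetical sketch: define NSDs over the FAStT100 LUNs and create
    # one GPFS filesystem. Descriptor fields (disk:primary:backup:usage:
    # failuregroup) and all names are assumptions.
    cat > /tmp/disks.desc <<'EOF'
    sdb:s01:s02:dataAndMetadata:1
    EOF
    mmcrnsd -F /tmp/disks.desc                     # register network shared disks
    mmcrfs /gpfs gpfs0 -F /tmp/disks.desc -B 256K  # create the parallel filesystem
    mmmount gpfs0 -a                               # mount on all nodes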
-
140TB Storage Subsystem
RS01-RS07: 20 x 7TB storage server nodes (SN01-SN20), 3 nodes per rack
Each storage server node consists of:
- 2 p615 (8U total)
- 1 FAStT100 controller with 3.5TB (3U)
- 1 EXP100 SATA drawer with 3.5TB (3U)
Total storage node is 14U; one rack has 14U free.
-
6 storage pSeries racks with 3 storage nodes each (RS01, RS02, RS03, RS05, RS06, RS07)
Box summary per rack: 6 p615 (4U each); 3 FAStT100 (3U each); 3 EXP100 (3U each)
Cabling:
- External: 12 10/100 cat5; 12 Gb; 12 Myrinet; 6 serial
- Internal: 2 p615 to each FAStT100; FAStT100 to EXP100
-
1 storage pSeries rack with 2 storage nodes (RS04)
Box summary: 4 p615 (4U each); 2 FAStT100 (3U each); 2 EXP100 (3U each); 14U free
Cabling:
- External: 8 10/100 cat5; 8 Gb; 8 Myrinet; 4 serial
- Internal: 2 p615 FC to each FAStT100; FAStT100 to EXP100
-
p615 Remote Control
- The HMC can remotely power, and provide a console to, any p615.
- See the HMC manual for instructions: "Effective System Management using the IBM Hardware Management Console for pSeries". The manual has lots of material on partitioning that is not relevant to MareNostrum. Remember: no partitioning means one partition per system.
- Two key commands you'll need to learn: mkvterm and chsysstate (see the sketch below).
- If you write scripts on the HMC, back them up somewhere else: an HMC reload will wipe them out.
- Recommended: script the remote console command.
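As a hedged illustration, remote power and console from the HMC command line might look like the following; the managed-system name s01 is made up, and exact flags vary by HMC release:

    # Hypothetical sketch of HMC CLI usage for a p615 (one partition per
    # system). Managed-system name is an assumption; check your HMC release.
    chsysstate -m s01 -r sys -o on    # remote power on
    chsysstate -m s01 -r sys -o off   # remote power off
    mkvterm -m s01                    # open the virtual console
    rmvterm -m s01                    # release the console when finished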
-
Serial cabling for the p615 service network
- 3 RANs are needed; only 2 are shown.
- No connection to a 7040 frame is required.
- "Managed System" here means each p615 server.
-
p615 performance
- Optimal adapter placement depends on the bus structure.
- The built-in 10/100/1000 is an optimal I/O interface; the built-in 10/100 is used for the service network.
- Adapters on MareNostrum p615s: 2 Myrinet 4-meg cards; 1 Emulex Fibre Channel adapter; 1 fiber gigabit card (not used).
-
p615 performance continued
-
Agenda
Date              Time        Topic                       Instructor
Tuesday, Sept 27  9:30-11:00  Blade Cluster Architecture  Greg Rodgers
                  11:30-1:00  Network Overview            Greg Rodgers
                  1:00-2:30   LUNCH
                  2:30-4:00   Storage Subsystem           Greg Rodgers
                  4:30-6:00   DIM and Image Management    Greg Rodgers & Peter Morjan
-
DIM = Diskless Image Management
- DIM is not a great name: it is not really diskless. It was prototyped on MareNostrum to operate blades as if they were diskless.
- DIM is utility software, copyright IBM, made available to BSC; not for redistribution.
- Other advantages:
  - Asynchronous image management; single-image maintenance
  - Speed: no noticeable performance degradation, even with oversubscribed Ethernet; zero blade install time
  - No Linux distro modification
  - Efficient image management: over 2 million RPMs on MareNostrum; efficient yet not minimalistic
  - Local hard drive available for user /scratch
-
Basic DIM Process
- Install Linux and manage the image on master blade(s). Multiple blades can be used for different images: gnode (s41c3b13), mnode (s41c3b14).
- Clone the blade image using dim_update_master: a brute-force rsync of the root directories.
- Distribute the master copy to clones: read-only and read-write parts; intelligent rsync with filters; can be done with blades up or down (see the sketch below).
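A minimal sketch of the two rsync steps just described; the paths, filter file, and destination blade name are assumptions (dim_update_master itself is the IBM utility, this only mirrors the idea):

    # Hypothetical sketch of the DIM clone/distribute flow; all paths and
    # filter files are assumptions, not the real dim_update_master internals.
    # 1) Brute-force clone of the master blade's root into the master image
    rsync -aHx --numeric-ids --delete root@s41c3b13:/ /dim/master/gnode/
    # 2) Distribute: shared read-only part with filters, per-blade read-write part
    rsync -aH --delete --exclude-from=/dim/filter/rw.list \
          /dim/master/gnode/ /dim/ro/gnode/
    rsync -aH --files-from=/dim/filter/rw.list \
          /dim/master/gnode/ /dim/rw/s01c1b01/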
-
DIM versus Warewulf
- DIM scales to thousands of nodes with a 2-level hierarchy.
- DIM has large shared read-only parts of the image: fast updates; storage efficiency; allows a complete distro, not minimalistic; exploits caching on the image server.
- DIM can update images during operation, with or without a running client. Warewulf (like Rocks) rebuilds the image for any change, such as a new RPM or a new user.
- DIM uses loopback-mounted filesystems on the image server to control quota (see the sketch below). It also allows several types of network data transport, including NFS, NBD, and iSCSI.
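A minimal sketch of the loopback-quota idea, with hypothetical paths, size, and export options: each clone's read-write image lives in a fixed-size file, so the file size caps that blade's disk usage:

    # Hypothetical sketch: per-blade read-write image in a fixed-size
    # loopback file, exported over NFS. All names and sizes are assumptions.
    dd if=/dev/zero of=/dim/img/s01c1b01.img bs=1M count=512   # 512 MB quota
    mke2fs -F -j /dim/img/s01c1b01.img                         # ext3 inside the file
    mkdir -p /dim/rw/s01c1b01
    mount -o loop /dim/img/s01c1b01.img /dim/rw/s01c1b01       # loopback mount
    echo '/dim/rw/s01c1b01 s01c1b01(rw,no_root_squash,sync)' >> /etc/exports
    exportfs -ra                                               # publish the export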
-
Diskless Image Management
Extensive use of Linux 2.6 dynamic loading and linuxrc
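A hedged sketch of what such a linuxrc can look like: load the network driver as a module, bring up the interface, and pivot onto a network root. The module name, addresses, and the NFS source are assumptions, not the MareNostrum script (which may use NBD or iSCSI instead):

    #!/bin/sh
    # Hypothetical initrd /linuxrc for a diskless JS20 under Linux 2.6.
    /sbin/insmod /modules/tg3.ko              # load the blade's Ethernet driver
    /sbin/ip link set eth0 up
    /sbin/ip addr add 172.20.1.1/16 dev eth0  # normally taken from DHCP/cmdline
    mount -t nfs -o ro,nolock 172.20.0.1:/dim/ro/gnode /mnt   # network root image
    cd /mnt
    pivot_root . initrd                       # make the network root the real /
    exec chroot . /sbin/init </dev/console >/dev/console 2>&1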
-
DIM Services Required
- DHCP: /etc/dhcpd.conf is the master database (see the sketch below)
- NFS: an NFS server is required on all DIM image servers, and an NFS client on the DIM nodes
- rsync: required on DIM image servers
- Also required: ssh, tftp, xinetd
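Two hedged fragments to make the service list concrete. First, a hypothetical /etc/dhcpd.conf host entry of the kind that makes the file the master database; the MAC, addresses, and boot file are made up:

    # Hypothetical dhcpd.conf fragment: one fixed host entry per blade.
    host s01c1b01 {
        hardware ethernet 00:0d:60:aa:bb:01;   # blade's first NIC (made up)
        fixed-address 172.20.1.1;
        next-server 172.20.0.1;                # this blade's image server
        filename "zImage.initrd";              # netboot kernel+initrd via tftp
    }

And a standard xinetd stanza enabling tftp on the image servers; the tftpboot path is an assumption:

    # Hypothetical /etc/xinetd.d/tftp serving the netboot images
    service tftp
    {
        socket_type = dgram
        protocol    = udp
        wait        = yes
        user        = root
        server      = /usr/sbin/in.tftpd
        server_args = -s /tftpboot
        disable     = no
    }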