Innovation Intelligence®
Michigan!/usr/group:
Compute Clusters—Building Blocks of the
Public Cloud
Jeff Marraccini, Vice President, Computer Systems
August 2015
About me, and yes, the disclaimer
• Have worked at Altair Engineering in Troy for 18 years
• My team manages a number of clusters
• I manage staff that handle our internal
clusters in a 2,300+ employee company, so:
My employer may not agree with all my
opinions – they are my own. I am also
a generalist. Check with others before
spending money on a cluster.
• Like many things, there are no “one size
fits all” solutions with HPC! Please research!
[Diagram: Fabric connecting Head Node, Storage, Exec Nodes, and Visualization Nodes]
Thank you, and seeing this stuff for real
Michigan!/usr/group contributed to my career – thank you!
Past and present members contributed to tools we use daily. Preaching to
the choir here: knowledge exchange empowers us all.
Tours:
I cannot show too much live while we are recording.
I would be glad to give you a tour if you are in the Troy, MI area – please
message me at [email protected]. Must agree not to reveal operational
specifics.
Overview of today’s talk
• Why clusters?
• Some history
• “Private cloud” clusters
• Architecture
• Failures
• The Virtual Machine era
• The Container / Docker era
• “Public cloud” clusters
• Facebook and the Open Data Center
• Appliance Computing
• Resources to learn more
Why clusters? And what’s the big deal?
• Mainframe costs, even today
• Individual server performance and Moore’s Law
• Networking + computers + “cluster software” = often vast power
• What do we do with these 3-5 year old computers on a 7-10 year budget
cycle?
• Sony PlayStations, Apple Xserves, Raspberry Pi
• Operating systems (usually) no longer as expensive as the computer
Universities, government agencies, companies,
and basements near you…
• They got us started…
• NASA Beowulf (you may be using a BSD/Linux Ethernet driver based on
Donald J. Becker's work at NASA!)
• NSA fed back scalability ideas (!!), early adopter
• Older operating systems: Tandem, Digital VAX/VMS & OpenVMS, some UNIX,
Microsoft Windows Server Clusters
• Universities world wide – open source contributions
• Military projects
• Basement clusters run by grad & undergrad students
• Lucasfilm and related special effects firms
• MASSIVE (Peter Jackson/Weta Digital!) – got us into 10GbE message passing
What do they do?
• Scientific and engineering computing – the start of it all
• Render farms – special effects for movies, TV, commercials, games, live
TV and sports overlays…
• Media conversion (YouTube!)
• Web services, E-Mail at scale
• Bitcoin and other computational currency
• Databases, “Big Data”
• Scale out Storage (EMC Isilon is an InfiniBand cluster!)
• Building and testing software (my workplace)!
• Social media (combining a lot of the above)
• Cracking passwords, encryption
• Neural networks / expert systems / IBM Watson
Some of the largest clusters are…
• 10’s-100’s of thousands of cores
• NSA (probably), along with other governments’ security arms
• Other classified installations
• CERN
• Research labs (NCSA in Urbana-Champaign, Illinois, is one)
• Public clouds (Google, Amazon, Microsoft, Rackspace, IBM, others)
• 1’s-10’s of thousands of cores
• Square Kilometer Array (Australia / South Africa, just got back from there)
• Weather forecasting
• Japan’s Earth Simulator (early 2000’s)
• Render “farms”
• Large organizations (corporate, universities, “smaller” public cloud providers)
• Small businesses often have dozens to hundreds of cores, and may not
realize it if leasing private and/or public cloud services!
10,000 hands working in the space of a living room
“Cluster programming is a lot like putting a large puzzle together with 10,000 hands in the space of a living room, keeping them in sync”
- Altair developer when I reported a memory leak
Software development complexities & architecture
• Message passing (MPI) libraries achieve huge scales
• Shared memory with proprietary interconnect (some Cray, NEC, SGI
Altix)
• Process migration (LinuxPMI, openMosix, some Cray, NEC, SGI Altix
UV)
• Systemd (w/ cgroups) is really nice on clusters as it reduces start up
and restaging latency due to parallel daemon startup and reduces shell
script complexity
• Ansible, Salt, and other configuration automation tools for sysadmin
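The message-passing bullet above can be pictured with a toy scatter/reduce, the pattern MPI programs repeat at huge scale. This is only a sketch: standard-library threads and queues stand in for ranks and the fabric, it is not MPI, and the names `worker` and `scatter_reduce` are invented for illustration.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Hypothetical 'rank': receive a chunk, compute a partial sum, send it back."""
    chunk = inbox.get()               # blocking receive, in the spirit of MPI_Recv
    outbox.put((rank, sum(chunk)))    # send the partial result, in the spirit of MPI_Send


def scatter_reduce(data, n_ranks):
    """Split data across n_ranks workers and add up their partial sums."""
    inboxes = [queue.Queue() for _ in range(n_ranks)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(r, inboxes[r], results))
        for r in range(n_ranks)
    ]
    for t in threads:
        t.start()
    # Scatter: rank r gets every n_ranks-th element starting at index r.
    for r in range(n_ranks):
        inboxes[r].put(data[r::n_ranks])
    for t in threads:
        t.join()
    # Reduce: combine the partial sums on the "head" side.
    total = 0
    for _ in range(n_ranks):
        _rank, partial = results.get()
        total += partial
    return total


if __name__ == "__main__":
    print(scatter_reduce(list(range(1, 101)), n_ranks=4))
```

On a real cluster each rank is a separate process on a separate node, and the queues are replaced by the fabric, which is why fabric latency matters so much.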
“Private Cloud”
• Internal use clusters
• Sometimes accessible via remote access, Virtual Private Networks
• “Secret sauce” behind internal tools, some of which now have public
cloud front ends
• Requires a forging of networking, storage, and computing teams
• Oracle 10g databases are often IT's first exposure to clusters
• Scalable internal storage (EMC Isilon, ExaGrid, HP 3PAR, Ceph, etc.)
High Availability Private Cluster Block Diagram
Firewall
• Protects often unpatched cluster software and firmware
• Load balancer
• Remote access
Head Nodes
• 1
• 2
• Authentication, Scheduling, Staging, Reloading, Push notifications, Periodic Check-pointing
Switch Fabrics
• 1
• 2
• InfiniBand, 1/10/40/100Gb Ethernet, Proprietary (Cray!)
Execution Nodes
• 1 … N
• Local storage, local “scratch”
Shared Storage Pools
• Staging
• Check-points
[Photo: 640-core half-rack SuperMicro TwinBlade chassis w/ 100TB usable
storage, QDR InfiniBand, ~9 kW. 2X this for high availability.]
Altair’s Internal Clusters
• We use PBS Professional for all (it is our product!)
• HyperWorks Unlimited – “cluster in a box” – many around the world,
hundreds to 2048 cores, single rack or virtual clusters in public clouds
• Legacy “E-Compute” & Compute Manager (newer) – several clusters of
a few hundred cores each
• HyperWorks – several hundred cores, Windows, Linux, Mac - Michigan
and India, 80+ compilations (400K+ files/each), thousands of tests daily
• Test clusters – 128-256 cores, often restaged, scrounged older hardware
A regular cluster (or a basement one!)
Head Node
• Authentication, Scheduling, Staging, (Reloading, Push notifications, Periodic Check-pointing)
Cluster fabric(s)
• Ethernet switch
• InfiniBand switch
• Storage Area Network
Execution Nodes
• 1 … N (could be varying hardware)
• Local storage (maybe!)
Shared Storage Pools
• Staging
• Checkpoints (maybe!)
• Could be FreeNAS, Lustre, Isilon…
Could well ALL be running on a single
virtual machine hypervisor for dev &
test!
An Engineer’s Patience
96 core job running on part of the cluster from the previous slide:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
182651.XXXXXXXX YYYYYYYY radioss- ZZZZZZZZZZ  29362   8  96     --    -- R 36:41
   node006/0*12+node007/0*12+node009/0*12+node010/0*12
   +node011/0*12+node012/0*12+node013/0*12+node014/0*12
Without oversubscription, that cluster may run ten 96-core jobs at once.
Most jobs on it run longer than a day – some for a couple weeks.
We are very paranoid when someone opens the cabinet doors on it…
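The node list in that job listing is easy to sanity-check in code. The sketch below parses the exec host string from the slide and confirms it adds up to 8 nodes and 96 cores. The function names are made up for illustration, and treating a chunk without `*N` as a single core is an assumption of this sketch, not documented scheduler behavior.

```python
def cores_per_node(exec_host):
    """Parse an exec host string such as 'node006/0*12+node007/0*12'.

    Returns a dict mapping node name to the number of cores requested on it.
    A chunk without '*N' is counted as one core (an assumption of this sketch).
    """
    cores = {}
    for chunk in exec_host.replace("\n", "").split("+"):
        chunk = chunk.strip()
        if not chunk:
            continue
        node, _, rest = chunk.partition("/")
        _, star, count = rest.partition("*")
        cores[node] = cores.get(node, 0) + (int(count) if star else 1)
    return cores


def concurrent_jobs(total_cores, cores_per_job):
    """How many jobs of a given size fit at once without oversubscription."""
    return total_cores // cores_per_job


# The exec host string shown on the slide.
EXEC_HOST = (
    "node006/0*12+node007/0*12+node009/0*12+node010/0*12"
    "+node011/0*12+node012/0*12+node013/0*12+node014/0*12"
)

if __name__ == "__main__":
    usage = cores_per_node(EXEC_HOST)
    print(len(usage), "nodes,", sum(usage.values()), "cores")
```

By the same arithmetic, ten concurrent 96-core jobs would need at least 960 free cores, so `concurrent_jobs(960, 96)` gives 10. The 960 is inferred from the slide's statement, not a figure the talk gives.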
The Fabric – cluster scaling and speed
• InfiniBand (10-56Gb/s, low latency)
• Myrinet (obsolete, fiber optics)
• PCIe
• Ethernet (10GbE/40GbE/100GbE)
• Proprietary (CrayLink and others)
• Virtual network switches
[Photo: 256-core SGI half rack, QDR InfiniBand, Nvidia GPUs,
Ethernet 1Gb/s mgmt, no HA. Surprisingly quiet in full use!]
Storage
Varying needs = varying capacities (Computational Fluid Dynamics/CFD,
“crash”, chemistry, optimization, Bitcoin, hash cracking…)
Cluster storage is HARD, especially scale out – “Big Data” approaches are not
good back-end storage for scientific/engineering computing (yet)
Reliability – high availability often costs more than 2X as much
Local storage limits (blades, enterprise SSD, 2.5” HDD)
Spinning storage down when portions are idle = complex
Management
• Staging the nodes – potentially thousands during install and upgrades
• Herding cats = scheduling different user communities’ requirements
• Failures and recovery
• Staging jobs in/out – a CFD project may be 1TB+ of output * 200 jobs
• Push notifications, “Is it done yet?”
• Portals
• Continuous resource monitoring
• Check-pointing
• Energy efficiency
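The staging bullet implies some sobering arithmetic: 1 TB of output times 200 jobs is 200 TB to move. A rough sketch of how long that takes at the Ethernet speeds mentioned earlier follows. The `efficiency` factor (fraction of line rate actually achieved) is a placeholder assumption, not a measured value, and `transfer_hours` is an illustrative name.

```python
def transfer_hours(terabytes, link_gbps, efficiency=0.7):
    """Hours to move `terabytes` (decimal TB) over a link of `link_gbps` Gb/s.

    `efficiency` is an assumed fraction of line rate actually achieved;
    0.7 is a placeholder, not a measured value.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    project_tb = 1 * 200  # 1 TB of output per job * 200 jobs, from the slide
    for gbps in (1, 10, 40, 100):
        print(f"{gbps:>3} Gb/s: {transfer_hours(project_tb, gbps):8.1f} hours")
```

Even at full 10 Gb/s line rate, 200 TB takes about 44 hours, which is why staging is a management problem in its own right.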
When it breaks
• Nodes will fail
• We have hardware failures every week, bigger clusters may have hourly
failures or even more
• Check-pointing = costly in storage and processing time, see
http://www.csm.ornl.gov/~engelman/publications/wang10hybrid2.pdf
• Restoring from a checkpoint may be unreliable
• Restaging
• Job migration
• Jeff’s “I meant to type an 11 and typed 1” glitch
• The dreaded faulty InfiniBand cable
• “If you monitor me, my job slows down!”
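The trade-off between failure frequency and checkpoint cost can be made concrete. The sketch below uses Young's classic first-order approximation, interval = sqrt(2 × checkpoint cost × MTBF). That formula is swapped in here for illustration and is not taken from the talk or claimed to be the method of the paper linked above. Every input number is an assumption, and nodes are assumed to fail independently at a constant rate.

```python
import math


def cluster_mtbf_hours(node_mtbf_hours, n_nodes):
    """Mean time between failures anywhere in the cluster.

    Assumes independent nodes with identical, constant failure rates.
    """
    return node_mtbf_hours / n_nodes


def young_interval_hours(checkpoint_cost_hours, mtbf_hours):
    """Young's first-order estimate of the checkpoint interval: sqrt(2 * C * MTBF)."""
    return math.sqrt(2 * checkpoint_cost_hours * mtbf_hours)


if __name__ == "__main__":
    # All inputs are illustrative assumptions, not figures from the talk.
    node_mtbf = 5 * 365 * 24      # one failure per node every five years
    checkpoint_cost = 10 / 60     # ten minutes to write a checkpoint
    for nodes in (8, 256, 10_000):
        mtbf = cluster_mtbf_hours(node_mtbf, nodes)
        interval = young_interval_hours(checkpoint_cost, mtbf)
        print(f"{nodes:>6} nodes: MTBF {mtbf:8.1f} h, checkpoint every {interval:5.1f} h")
```

The point of the sketch: as node count grows, the cluster-wide MTBF shrinks in proportion, and checkpoints have to be taken more often, which is where the storage and processing cost comes from.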
The Virtual Machine Cluster
• Great way to demo cluster software, Ansible/Salt, etc.
• SIMH & OpenVMS (Jeff’s VMS cluster on a Surface Pro 3 tablet)
• Multics may now be emulated, see http://multicians.org/
• Virtual network switches work great on multi-core hosts
• “Pull” the virtual network cable, see if the storage busts
• Test your upgrades
• Learn without spending $50,000+
• Hypervisors add I/O latency
• Fabric support limited
• = Scalability limited
The Container / Docker – More than a fad
• Famous “Pets” vs “Livestock” (some call “Cattle”) argument for
application design
• Single operating system per host, operating system ensures containers
are sandboxed from each other AND they have cluster fabric access!
• Multiple containers (load balancer + web server + app server + database
server + log server) may be spun up and scaled with appropriate app
design
• Still have to patch the containers if there are vulnerabilities inside!
Ansible, etc. useful!
“I’m out of oomph” -> BURSTING
• “Promise” of the Public Cloud
• Credit card financed computing
• Possibly loosely coupled
• Fabric compromises
• Getting better!
[Diagram: Internal Cluster connected by VPN to Amazon AWS/Microsoft Azure
(Cloud Execution Nodes, Cloud fabric, Cloud storage)]
Spread out clusters
• May be in the “Public Cloud” or at multiple “Private Cloud” sites
(research centers, remote data centers, leased private capacity)
• Redundancy – Hadoop and derivatives quickly copy object data and
store archival copies, etc.
• Scalability, 100Gb/s inter-data-center links now common
• Lots of “dark fiber” available for leasing
• Watch out for latency sensitive implementations
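Why latency-sensitive code suffers on spread-out clusters comes down to physics: light in optical fiber covers roughly 200 km per millisecond, so distance puts a floor under every round trip no matter how fat the link is. A minimal sketch follows. The distances and the `overhead_ms` allowance are hypothetical, and the function names are invented for illustration.

```python
FIBER_KM_PER_MS = 200.0  # light in glass covers roughly 200 km per millisecond


def round_trip_ms(distance_km, overhead_ms=0.0):
    """Best-case round-trip time over a fiber path of `distance_km`.

    `overhead_ms` is a hypothetical allowance for switches and routers.
    """
    return 2 * distance_km / FIBER_KM_PER_MS + overhead_ms


def exchange_seconds(n_round_trips, distance_km):
    """Time spent purely waiting on the wire for a chatty, synchronous exchange."""
    return n_round_trips * round_trip_ms(distance_km) / 1000


if __name__ == "__main__":
    for km in (0.1, 50, 1000):
        print(f"{km:>7} km: {round_trip_ms(km):.3f} ms per round trip")
```

A job that does a thousand synchronous exchanges between sites 100 km apart spends at least a full second just waiting on the wire, before any switching or software overhead.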
Facebook and Open Compute Project
• Mainly useful for big organizations
• Power efficiency, reduce impact
• Shared power supplies
• Optimized cooling
• Storage & node spin-down
• Designed to fail and be easily serviceable
• Quick upgrades
• Scalability beyond conventional designs
• Might slow down commodity server price drops, volume decreasing
• http://www.opencompute.org/
Appliances and Platform as a Service (PaaS)
• “Cluster in a box” (well, racks!) or cloud
• Bursting
• Project-based computing
• Nimble
• Geek skills embedded
• Easy portal / front ends
Where do we go from here?
• Public library access to Lynda.com – Amazon AWS & Microsoft Azure
“Up and Running” courses
• SIMH hobbyist OpenVMS cluster: https://vanalboom.org/node/18
• OpenStack on virtual machines: http://www.openstack.org/ and
http://docs.openstack.org/developer/devstack/#quick-start
• Example appliance: http://www.altair.com/hwul/
• PBS Professional, IBM LSF, Grid Engine, other cluster mgmt. software
• OpenStack Ceph scalable block storage: http://ceph.com/
• Lustre storage free software: http://wiki.lustre.org/
Aside from security, the ability to build and maintain private and public
cluster systems is near the top of the pay scale in IT!