TRANSCRIPT
-
Strategies in Cluster-Design
Gerolf Ziegenhain, TU Kaiserslautern, Germany
-
Outline of This Talk
● Look at the technologies once again
● Provide more detail for making decisions
● What to consider?
● What should be avoided at all costs?
● Provide keywords / directions for further reading
● Less organized talk
● Contains personal experience
-
Making Decisions
● Strategic decisions:
– Do once; changes are difficult and expensive
● Setup is relatively easy
● Therefore, know some numbers (per person, group, university):
– #jobs
– Runtime of jobs
– CPUs per job
– Memory per job
– Coupling of the system: latency / bandwidth
– HDD storage (also consider final storage)
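The numbers above feed directly into a first sizing estimate. A minimal sketch; every workload figure here is a made-up assumption to be replaced by your own survey data:

```python
# Back-of-the-envelope cluster sizing from the workload numbers above.
# All figures are hypothetical assumptions, not measurements.
jobs_per_month = 2000          # #jobs (assumed)
avg_runtime_h = 12             # runtime per job in hours (assumed)
cpus_per_job = 4               # CPUs per job (assumed)
target_utilization = 0.7       # leave headroom for peaks (assumed)
hours_per_month = 30 * 24

cpu_hours = jobs_per_month * avg_runtime_h * cpus_per_job
cores_needed = cpu_hours / (hours_per_month * target_utilization)
print(round(cores_needed))     # -> ~190 cores as a starting point
```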
-
Buy or Build?
● Buying
– Less work
– More costs
– You will have more than you want
– Vendor may help in consulting
● Building yourself
– More work
– High learning effect
– Less costs
– You will have what you buy
-
Technological Overview
-
Components of a Cluster
[Diagram: overhead servers (DHCP, NIS, firewall, queue, syslog, mirror, mail, boot, admin), login nodes (Login1, Login2), storage (NAS1–NAS3), users (User1–User3), and the compute nodes]
-
Networking
-
A Word on Entropy
● Managing 10 workstations differs a lot from managing a cluster
● Entropy of cables
– Sort them immediately
– Use colors
– Use hook-and-loop tape
– Use printed labels
-
Choice of Hardware
● Nodes
● Networking
● Overhead servers
-
Choosing Nodes
-
Example: Google
-
Example: Google
● Stock hardware
● Custom-built low-tech cases
● Modular approach
● Components
– Mainboard, CPU, memory
– 2x HDD (striped)
– UPS battery
● Advantages:
– Cheap
– High learning effect
-
Example: BlueGene/P
● PowerPC
● Custom-built
– Boards
– Chips
– Networking
● Advantage:
– Scales very well
-
Buy a Rack
● Common Beowulf cluster
● Buy ready-built 19” pizza boxes
● Mount them in a 19” rack
– Usually 42 U
● Advantages
– Less work
– High packing density
-
Use Ready-Built Desktops
-
Processors and Architectures?
● Know your problem
● What to know about your algorithms?
– How much memory?
– Can the problem easily be decomposed?
– What precision?
● Libraries
– Do they exist for your problem (e.g. QM calculations)?
– Do they run on all architectures?
● Choices:
– Architecture (usually AMD / Intel is a good choice)
– #CPUs
-
Storage Management
● Know your problem
● Parameters to know
– How much HDD space?
– What is the typical bandwidth?
● Evaluating 100 GB files in real time?
● Writing out 1 TB files?
● Choices:
– NAS (multiple?)
– SAN
– Distributed filesystem
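A quick arithmetic check makes the bandwidth question concrete. A sketch with assumed numbers (file size and time budget are hypothetical):

```python
# Can the storage network feed a job that reads a 100 GB file "in real time"?
file_gb = 100          # file size (assumed)
budget_s = 600         # acceptable read time, 10 minutes (assumed)
needed_gbit_s = file_gb * 8 / budget_s
print(needed_gbit_s)   # -> ~1.3 Gbit/s: more than one GbE link can deliver
```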
-
Backup
● RAID ≠ backup
– You can still kill your stuff with rm -rf /my_stuff
● Incremental backup
– Critical user configuration
– Configuration files
– Complete overhead server installation
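One classic way to keep incremental backups cheap is hard-link snapshots, the idea behind rsync's --link-dest option. A minimal Python sketch of that idea; the function and path names are made up:

```python
import filecmp
import os
import shutil

def snapshot(src, dest_root, name, previous=None):
    """Copy src into dest_root/name; files unchanged since the previous
    snapshot become hard links to it instead of full copies."""
    dest = os.path.join(dest_root, name)
    for dirpath, _dirs, files in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        os.makedirs(os.path.join(dest, rel), exist_ok=True)
        for fname in files:
            s = os.path.join(dirpath, fname)
            d = os.path.join(dest, rel, fname)
            p = os.path.join(dest_root, previous, rel, fname) if previous else None
            if p and os.path.exists(p) and filecmp.cmp(s, p, shallow=False):
                os.link(p, d)       # unchanged file: hard link, no extra space
            else:
                shutil.copy2(s, d)  # new or changed file: full copy
    return dest
```

Each daily snapshot then looks like a full backup, but only changed files consume disk space.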
-
Networking
-
Types
● Know your problem
● Choices
– Bandwidth
● Gbit < InfiniBand
● Gbit: channel bonding possible
– Latency
● Gbit > SCI
– Scalability
● Stacked network switches
● Fat-tree architecture
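Channel bonding on GbE is mostly a configuration exercise. A hypothetical Debian-style /etc/network/interfaces fragment (interface names, address, and mode are assumptions; needs the ifenslave package):

```
# /etc/network/interfaces fragment (hypothetical addresses and interfaces)
auto bond0
iface bond0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    # aggregate the two GbE ports into one logical link
    bond-slaves eth0 eth1
    # round-robin mode; 802.3ad etc. are alternatives
    bond-mode balance-rr
    # MII link monitoring interval in ms
    bond-miimon 100
```

Note that bonding raises aggregate bandwidth, not single-stream latency.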
-
Switches
● Important parameters
– Backbone speed
● Throughput when all ports are under load?
– Can it be configured?
● Auto-sensing
● IP
● ARP
● ...
– Stackable?
– (Uplink ports?)
-
Which #Cores/Node is Optimal?
● Currently cheapest cost per core: 8 cores per node
● Small systems (48 nodes)
– Doesn't matter, because one switch is enough
● Average systems
– Do you need all-to-all connections?
– Different rings, or change the network topology
– If you want to stick to single-switched networks: the current optimum is 16 CPUs per node
● Big systems
– Go for a fat-tree network :)
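The single-switch limit above is simple arithmetic; an illustrative sketch (the port count and core counts are assumptions):

```python
# How many cores fit behind a single switch, for a given port count?
switch_ports = 48   # a common GbE switch size (assumed)
for cores_per_node in (8, 16):
    total = switch_ports * cores_per_node
    print(cores_per_node, "cores/node ->", total, "cores on one switch")
```

Doubling the cores per node doubles the reach of one switch, which is why 16 cores per node pays off for single-switched networks.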
-
Infrastructure Requirements
● Cooling
– Each W burned in the CPU ⇒ heat
● Stable power supply
– Blackouts?
– Fluctuations in voltage level
● Cheap power supplies will break on fluctuations
-
Notes on Power Consumption
● Less power consumption ⇒ less heat ⇒ fewer defects(?)
● Running costs per year can easily reach the initial investment!
– Do the math ⇒ a blade center could also pay off!
● Do not switch all nodes on / off at once
– Voltage peaks!
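The claim that running costs can approach the initial investment is easy to check. A sketch where every figure is an assumption:

```python
# Yearly electricity cost for a mid-size cluster; all numbers are assumed.
nodes = 50
watts_per_node = 400           # node draw at load (assumed)
cooling_factor = 1.5           # ~0.5 W of cooling per W of heat (assumed)
eur_per_kwh = 0.20             # electricity price (assumed)

kwh_per_year = nodes * watts_per_node * cooling_factor * 24 * 365 / 1000
print(round(kwh_per_year * eur_per_kwh))   # -> 52560 EUR/year
```

With these numbers, a few years of operation cost as much as a modest cluster itself.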
-
Decomposition of the Servers
-
Why Separate Login Nodes?
● User interaction
● May hang due to jobs
● Security
– SSH ports open
– May be hacked
● Configuration of user packages
– System more on the bleeding edge
-
Splitting Servers
● Easily >10 overhead tasks
● Why not one big server?
– Security (one hole ⇒ all broken)
– Stability
– Maintenance
● Updates (what was done 3 years ago?)
● Dependencies (how do software packages interfere?)
● No plugin structure (no testing of different variants)
● Solution
– Split the tasks ⇒ >10 overhead servers
– Problems:
● Cost
● Hardware failures?
-
Combining Servers
● Use Xen
● Host servers: 1...3 physical machines
– Tolerant to hardware failures
● Further advantages
– Extremely reduced costs
– Complete rollback possible
– Try different configurations
● Experiments are possible on a limited budget
– Clear separation of tasks
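What one task per virtual machine can look like under Xen: a hypothetical legacy domU configuration file (all names, sizes, and paths are made up for illustration):

```
# /etc/xen/nis.cfg -- one overhead task per small virtual machine
name   = "nis"
memory = 256
vcpus  = 1
kernel = "/boot/vmlinuz-xen"
disk   = ['phy:/dev/vg0/nis,xvda,w']
vif    = ['bridge=xenbr0']
```

A dozen such small guests on two or three hosts give the clean task separation of "one server per task" without the hardware bill.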
-
Administration
-
Administration Policies
● Interaction with human beings
– Difficult social aspects
– A good administrator is never noticed (the system just works)
● Who has the root password?
● Who will document what has been done?
● Split the work, but communicate:
– Design decisions
– Buying, writing grant proposals
– Installation, bug fixing
– Educating end users
-
Administration Policies
● User interaction
– Keep the users informed (mailing list)
– Monitor the system to catch problems before they occur
-
Managing Different Groups
● Impossible!
● Each group has to provide at least one person
– Managing user education
– Monitoring performance
– Knowing the needs (⇒ cluster design decisions)
⇒ Sharing an administrator is not possible!
● Sharing resources: possible & meaningful
-
What is the Critical Data?
● What data has to be stored?
– User programs
– Final data
– May be put on a RAID mirror
● What data can be exposed to potential loss?
– Temporary files
– May be put on a RAID stripe
-
Compilation
● Custom user programs / libraries
● Where to install
– /usr/local/ (system-wide)
– $HOME (per-user)
● Autotools make it possible to install a whole distribution in the home directory!
⇒ Depends on how often the code changes
● Choosing a compiler
– GNU compilers are good & free
– Special CPU instructions: buy a compiler
● Intel compiler
● Portland compiler
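Installing into $HOME with autotools is the usual three-step dance; a sketch where the prefix is a placeholder and the package is whatever tarball you unpacked:

```
# install a user library under $HOME instead of /usr/local
./configure --prefix=$HOME/local
make
make install

# make the result visible to the shell and the dynamic linker
export PATH=$HOME/local/bin:$PATH
export LD_LIBRARY_PATH=$HOME/local/lib:$LD_LIBRARY_PATH
```

The exports belong in ~/.bashrc (or the site-wide shell profile) so batch jobs see them too.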
-
Security
● University networks are
– Insecure
– Attractive targets
● Risks
– SSH password login
– Open ports
– Updating
● Keep up to date with serious bugs!
– Users
● Therefore (attacks will happen on a daily basis!)
– Use a firewall
– Monitor the system for odd behavior
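A default-deny firewall on the login node is the first line of defense. A hypothetical minimal iptables rule set; the campus subnet 192.0.2.0/24 is a placeholder for your own network:

```
# drop everything by default, then open only what is needed
iptables -P INPUT DROP
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# SSH only from the (placeholder) campus network
iptables -A INPUT -p tcp -s 192.0.2.0/24 --dport 22 -j ACCEPT
```

Combined with key-only SSH logins, this removes most of the daily background noise of password-guessing attacks.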
-
Operating Systems
-
Which Operating System?
● Different OSes / distributions exist
– But widely compatible configuration
– The way of doing things differs slightly in detail
● E.g. directories / files
– Watch out for licenses: BSD, GPL, ...
● OS: provides basic stable & secure functionality
– Linux
● Debian
● RedHat
● SuSE (slow, costly, small community)
– FreeBSD (more secure, but ~older versions)
– OpenBSD (most secure)
-
Updating or Not?
● Motivations
– Stability
– Security
– Features
● Possible solution:
– Keep login servers and firewall up to date
– Keep computation nodes stable (out of date)
– Works only if the nodes are in an inner network
-
Rolling Your Own Distribution
● Possible solution for installation issues
● Possibilities
– From-scratch distribution
– Modify an existing distribution
– Compile only custom packages (/usr/local/bin)
– Keep system HDD images and clone them
-
Lessons Learned
● Reproducible?
– Making a distribution is exhausting
● Documentation (wiki)
– Someday you have to hand over
– Or reinstall
● Keep a complete mirror
– Packages may vanish
-
The Gentoo Approach
● Use source packages
● Autotools ⇒ binary files
● Create special configuration files for dependencies
– In Gentoo: portage (→ corvix: egatrop)
– In BSD: ports
● Alternatives
– Linux From Scratch
● Missing the configuration files
● Relies on autotools
– Arch Linux
● Websites are good sources for step-by-step howtos
-
The Debian Approach
● Compile once, distribute binary packages
● Create custom packages with only one command
● Advantages
– Extremely fast
– Easier to maintain for a big number of servers
– Embedded devices use a similar package architecture
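"One command" here refers to the standard Debian packaging tools. A minimal sketch of rolling a custom .deb with dpkg-deb; the package name and contents are hypothetical:

```
# minimal package tree: control metadata plus the files to install
mkdir -p mylib-1.0/DEBIAN mylib-1.0/usr/local/lib
cat > mylib-1.0/DEBIAN/control <<EOF
Package: mylib
Version: 1.0
Architecture: amd64
Maintainer: cluster-admin <root@localhost>
Description: custom user-demanded library
EOF
cp libmylib.so mylib-1.0/usr/local/lib/
dpkg-deb --build mylib-1.0        # produces mylib-1.0.deb
```

Once the .deb sits in a local repository, every node gets it through the normal package manager instead of hand-copied files.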
-
Our Solution
● Stable base system:
– Debian overlays
● Additional package source with custom packages
– Xen images of the installed Debian system ⇒ even faster reinstallations
● Custom software
– E.g. user-demanded libraries
– Compilation in ~
-
Other Cluster Distributions
● Debian-based / RedHat-based ones exist
– E.g. RocksCluster, CentOS, PelicanHPC, Corvix, ...
● Good source for howtos
● Good as a cheat sheet
● But
– HPC is inherently customized
– Flexibility is highest with a customized installation
– None of the distros solved a problem that we had
-
Thank You!
● Acknowledgements
– Prof. Dr. rer. nat. Herbert M. Urbassek, TU Kaiserslautern, Germany