TRANSCRIPT
-
Strategies in Cluster-Design
Gerolf Ziegenhain, TU Kaiserslautern, Germany
-
Outline of This Talk
● Look at the technologies once again
● Provide more detail for making decisions
● What to consider?
● What should be avoided at all costs?
● Provide keywords / directions for further reading
● Less organized talk
● Contains personal experience
-
Making Decisions
● Strategic decisions:
– Do once; changes are difficult and expensive
● Setup is relatively easy
● Therefore, know some numbers (per person, group, university):
– #jobs
– Runtime of jobs
– CPUs per job
– Memory per job
– Coupling of the system: latency / bandwidth
– HDD storage (also consider final storage)
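The numbers above feed directly into a first sizing estimate. A minimal sketch; every workload figure here is a made-up assumption to be replaced by your own survey data:

```python
# Back-of-the-envelope cluster sizing from the workload numbers above.
# All figures are hypothetical assumptions, not measurements.
jobs_per_month = 2000          # #jobs (assumed)
avg_runtime_h = 12             # runtime per job in hours (assumed)
cpus_per_job = 4               # CPUs per job (assumed)
target_utilization = 0.7       # leave headroom for peaks (assumed)
hours_per_month = 30 * 24

cpu_hours = jobs_per_month * avg_runtime_h * cpus_per_job
cores_needed = cpu_hours / (hours_per_month * target_utilization)
print(round(cores_needed))     # -> ~190 cores as a starting point
```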
-
Buy or Build?
● Buying
– Less work
– More costs
– You will have more than you want
– Vendor may help in consulting
● Building yourself
– More work
– High learning effect
– Less costs
– You will have what you buy
-
Technological Overview
-
Components of a Cluster
[Diagram: overhead servers (DHCP, NIS, firewall, queue, syslog, mirror, mail, boot, admin), login nodes (Login1, Login2), storage (NAS1–NAS3), users (User1–User3), and the compute nodes]
-
Networking
-
A Word on Entropy
● Managing 10 workstations differs a lot from managing a cluster
● Entropy of cables
– Sort them immediately
– Use colors
– Use hook-and-loop tape
– Use printed labels
-
Choice of Hardware
● Nodes
● Networking
● Overhead servers
-
Choosing Nodes
-
Example: Google
-
Example: Google
● Stock hardware
● Custom-built low-tech cases
● Modular approach
● Components
– Mainboard, CPU, memory
– 2x HDD (striped)
– UPS battery
● Advantages:
– Cheap
– High learning effect
-
Example: BlueGene/P
● PowerPC
● Custom-built
– Boards
– Chips
– Networking
● Advantage:
– Scales very well
-
Buy a Rack
● Common Beowulf cluster
● Buy ready-built 19” pizza boxes
● Mount them in a 19” rack
– Usually 42 U
● Advantages
– Less work
– High packing density
-
Use Ready-Built Desktops
-
Processors and Architectures?
● Know your problem
● What to know about your algorithms?
– How much memory?
– Can the problem easily be decomposed?
– What precision?
● Libraries
– Do they exist for your problem (e.g. QM calculations)?
– Do they run on all architectures?
● Choices:
– Architecture (usually AMD / Intel is a good choice)
– #CPUs
-
Storage Management
● Know your problem
● Parameters to know
– How much HDD space?
– What is the typical bandwidth?
● Evaluating 100 GB files in real time?
● Writing out 1 TB files?
● Choices:
– NAS (multiple?)
– SAN
– Distributed filesystem
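A quick arithmetic check makes the bandwidth question concrete. A sketch with assumed numbers (file size and time budget are hypothetical):

```python
# Can the storage network feed a job that reads a 100 GB file "in real time"?
file_gb = 100          # file size (assumed)
budget_s = 600         # acceptable read time, 10 minutes (assumed)
needed_gbit_s = file_gb * 8 / budget_s
print(needed_gbit_s)   # -> ~1.3 Gbit/s: more than one GbE link can deliver
```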
-
Backup
● RAID ≠ backup
– You can still kill your stuff with rm -rf /my_stuff
● Incremental backup
– Critical user configuration
– Configuration files
– Complete overhead server installation
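One classic way to keep incremental backups cheap is hard-link snapshots, the idea behind rsync's --link-dest option. A minimal Python sketch of that idea; the function and path names are made up:

```python
import filecmp
import os
import shutil

def snapshot(src, dest_root, name, previous=None):
    """Copy src into dest_root/name; files unchanged since the previous
    snapshot become hard links to it instead of full copies."""
    dest = os.path.join(dest_root, name)
    for dirpath, _dirs, files in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        os.makedirs(os.path.join(dest, rel), exist_ok=True)
        for fname in files:
            s = os.path.join(dirpath, fname)
            d = os.path.join(dest, rel, fname)
            p = os.path.join(dest_root, previous, rel, fname) if previous else None
            if p and os.path.exists(p) and filecmp.cmp(s, p, shallow=False):
                os.link(p, d)       # unchanged file: hard link, no extra space
            else:
                shutil.copy2(s, d)  # new or changed file: full copy
    return dest
```

Each daily snapshot then looks like a full backup, but only changed files consume disk space.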
-
Networking
-
Types
● Know your problem
● Choices
– Bandwidth
● Gbit < InfiniBand
● Gbit: channel bonding possible
– Latency
● Gbit > SCI
– Scalability
● Stacked network switches
● Fat-tree architecture
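Channel bonding on GbE is mostly a configuration exercise. A hypothetical Debian-style /etc/network/interfaces fragment (interface names, address, and mode are assumptions; needs the ifenslave package):

```
# /etc/network/interfaces fragment (hypothetical addresses and interfaces)
auto bond0
iface bond0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    # aggregate the two GbE ports into one logical link
    bond-slaves eth0 eth1
    # round-robin mode; 802.3ad etc. are alternatives
    bond-mode balance-rr
    # MII link monitoring interval in ms
    bond-miimon 100
```

Note that bonding raises aggregate bandwidth, not single-stream latency.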
-
Switches
● Important parameters
– Backbone speed
● Throughput when all ports are under load?
– Can it be configured?
● Auto-sensing
● IP
● ARP
● ...
– Stackable?
– (Uplink ports?)
-
Which #Cores/Node is Optimal?
● Currently cheapest cost per core: 8 cores per node
● Small systems (48 nodes)
– Doesn't matter, because one switch is enough
● Average systems
– Do you need all-to-all connections?
– Different rings, or change the network topology
– If you want to stick to single-switched networks: the current optimum is 16 CPUs per node
● Big systems
– Go for a fat-tree network :)
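The single-switch limit above is simple arithmetic; an illustrative sketch (the port count and core counts are assumptions):

```python
# How many cores fit behind a single switch, for a given port count?
switch_ports = 48   # a common GbE switch size (assumed)
for cores_per_node in (8, 16):
    total = switch_ports * cores_per_node
    print(cores_per_node, "cores/node ->", total, "cores on one switch")
```

Doubling the cores per node doubles the reach of one switch, which is why 16 cores per node pays off for single-switched networks.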
-
Infrastructure Requirements
● Cooling
– Each W burned in the CPU ⇒ heat
● Stable power supply
– Blackouts?
– Fluctuations in voltage level
● Cheap power supplies will break on fluctuations
-
Notes on Power Consumption
● Less power consumption ⇒ less heat ⇒ fewer defects(?)
● Running costs per year can easily reach the initial investment!
– Do the math ⇒ a blade center could also pay off!
● Do not switch all nodes on / off at once
– Voltage peaks!
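The claim that running costs can approach the initial investment is easy to check. A sketch where every figure is an assumption:

```python
# Yearly electricity cost for a mid-size cluster; all numbers are assumed.
nodes = 50
watts_per_node = 400           # node draw at load (assumed)
cooling_factor = 1.5           # ~0.5 W of cooling per W of heat (assumed)
eur_per_kwh = 0.20             # electricity price (assumed)

kwh_per_year = nodes * watts_per_node * cooling_factor * 24 * 365 / 1000
print(round(kwh_per_year * eur_per_kwh))   # -> 52560 EUR/year
```

With these numbers, a few years of operation cost as much as a modest cluster itself.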
-
Decomposition of the Servers
-
Why Separate Login Nodes?
● User interaction
● May hang due to jobs
● Security
– SSH ports open
– May be hacked
● Configuration of user packages
– System more on the bleeding edge
-
Splitting Servers
● Easily >10 overhead tasks
● Why not one big server?
– Security (one hole ⇒ all broken)
– Stability
– Maintenance
● Updates (what was done 3 years ago?)
● Dependencies (how do software packages interfere?)
● No plugin structure (no testing of different variants)
● Solution
– Split the tasks ⇒ >10 overhead servers
– Problems:
● Cost
● Hardware failures?
-
Combining Servers
● Use Xen
● Host servers: 1...3 physical machines
– Tolerant to hardware failures
● Further advantages
– Extremely reduced costs
– Complete rollback possible
– Try different configurations
● Experiments are possible on a limited budget
– Clear separation of tasks
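What one task per virtual machine can look like under Xen: a hypothetical legacy domU configuration file (all names, sizes, and paths are made up for illustration):

```
# /etc/xen/nis.cfg -- one overhead task per small virtual machine
name   = "nis"
memory = 256
vcpus  = 1
kernel = "/boot/vmlinuz-xen"
disk   = ['phy:/dev/vg0/nis,xvda,w']
vif    = ['bridge=xenbr0']
```

A dozen such small guests on two or three hosts give the clean task separation of "one server per task" without the hardware bill.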
-
Administration
-
Administration Policies
● Interaction with human beings
– Difficult social aspects
– A good administrator is never noticed (the system just works)
● Who has the root password?
● Who will document what has been done?
● Split the work, but communicate:
– Design decisions
– Buying, writing grant proposals
– Installation, bug fixing
– Educating end users
-
Administration Policies
● User interaction
– Keep the users informed (mailing list)
– Monitor the system to catch problems before they occur
-
Managing Different Groups
● Impossible!
● Each group has to provide at least one person
– Managing user education
– Monitoring performance
– Knowing the needs (⇒ cluster design decisions)
⇒ Sharing an administrator is not possible!
● Sharing resources: possible & meaningful
-
What is the Critical Data?
● What data has to be stored?
– User programs
– Final data
– May be put on a RAID mirror
● What data can be exposed to potential loss?
– Temporary files
– May be put on a RAID stripe
-
Compilation
● Custom user programs / libraries
● Where to install
– /usr/local/ (system-wide)
– $HOME (per-user)
● Autotools make it possible to install a whole distribution in the home directory!
⇒ Depends on how often the code changes
● Choosing a compiler
– GNU compilers are good & free
– Special CPU instructions: buy a compiler
● Intel compiler
● Portland compiler
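Installing into $HOME with autotools is the usual three-step dance; a sketch where the prefix is a placeholder and the package is whatever tarball you unpacked:

```
# install a user library under $HOME instead of /usr/local
./configure --prefix=$HOME/local
make
make install

# make the result visible to the shell and the dynamic linker
export PATH=$HOME/local/bin:$PATH
export LD_LIBRARY_PATH=$HOME/local/lib:$LD_LIBRARY_PATH
```

The exports belong in ~/.bashrc (or the site-wide shell profile) so batch jobs see them too.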
-
Security
● University networks are
– Insecure
– Attractive targets
● Risks
– SSH password login
– Open ports
– Updating
● Keep up to date with serious bugs!
– Users
● Therefore (attacks will happen on a daily basis!)
– Use a firewall
– Monitor the system for odd behavior
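A default-deny firewall on the login node is the first line of defense. A hypothetical minimal iptables rule set; the campus subnet 192.0.2.0/24 is a placeholder for your own network:

```
# drop everything by default, then open only what is needed
iptables -P INPUT DROP
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# SSH only from the (placeholder) campus network
iptables -A INPUT -p tcp -s 192.0.2.0/24 --dport 22 -j ACCEPT
```

Combined with key-only SSH logins, this removes most of the daily background noise of password-guessing attacks.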
-
Operating Systems
-
Which Operating System?
● Different OSes / distributions exist
– But widely compatible configuration
– The way of doing things differs slightly in detail
● E.g. directories / files
– Watch out for licenses: BSD, GPL, ...
● OS: provides basic stable & secure functionality
– Linux
● Debian
● RedHat
● SuSE (slow, costly, small community)
– FreeBSD (more secure, but ~older versions)
– OpenBSD (most secure)
-
Updating or Not?
● Motivations
– Stability
– Security
– Features
● Possible solution:
– Keep login servers and firewall up to date
– Keep computation nodes stable (out of date)
– Works only if the nodes are in an inner network
-
Rolling Your Own Distribution
● Possible solution for installation issues
● Possibilities
– From-scratch distribution
– Modify an existing distribution
– Compile only custom packages (/usr/local/bin)
– Keep system HDD images and clone them
-
Lessons Learned
● Reproducible?
– Making a distribution is exhausting
● Documentation (wiki)
– Someday you have to hand over
– Or reinstall
● Keep a complete mirror
– Packages may vanish
-
The Gentoo Approach
● Use source packages
● Autotools ⇒ binary files
● Create special configuration files for dependencies
– In Gentoo: portage (→ corvix: egatrop)
– In BSD: ports
● Alternatives
– Linux From Scratch
● Missing the configuration files
● Relies on autotools
– Arch Linux
● Websites are good sources for step-by-step howtos
-
The Debian Approach
● Compile once, distribute binary packages
● Create custom packages with only one command
● Advantages
– Extremely fast
– Easier to maintain for a big number of servers
– Embedded devices use a similar package architecture
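"One command" here refers to the standard Debian packaging tools. A minimal sketch of rolling a custom .deb with dpkg-deb; the package name and contents are hypothetical:

```
# minimal package tree: control metadata plus the files to install
mkdir -p mylib-1.0/DEBIAN mylib-1.0/usr/local/lib
cat > mylib-1.0/DEBIAN/control <<EOF
Package: mylib
Version: 1.0
Architecture: amd64
Maintainer: cluster-admin <root@localhost>
Description: custom user-demanded library
EOF
cp libmylib.so mylib-1.0/usr/local/lib/
dpkg-deb --build mylib-1.0        # produces mylib-1.0.deb
```

Once the .deb sits in a local repository, every node gets it through the normal package manager instead of hand-copied files.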
-
Our Solution
● Stable base system:
– Debian overlays
● Additional package source with custom packages
– Xen images of the installed Debian system ⇒ even faster reinstallations
● Custom software
– E.g. user-demanded libraries
– Compilation in ~
-
Other Cluster Distributions
● Debian-based / RedHat-based ones exist
– E.g. RocksCluster, CentOS, PelicanHPC, Corvix, ...
● Good source for howtos
● Good as a cheat sheet
● But
– HPC is inherently customized
– Flexibility is highest with a customized installation
– None of the distros solved a problem that we had
-
Thank You!
● Acknowledgements
– Prof. Dr. rer. nat. Herbert M. Urbassek, TU Kaiserslautern, Germany