Clustering Technology Overview
Clustering with Linux
Sungho Kim, Ph.D.
President, KESPER Inc.
Agenda
Linux Overview
Linux Kernel Features
Linux Network Protocols Overview
Linux Clustering Overview
Network Protocols for Clustering
HPC Cluster
Internet Cluster
HA Cluster
Conclusions
Linux Overview
Multi-user, multitasking Unix-like OS
Multi-architecture, multi-platform OS
Freely distributable open-source OS: GNU software
IEEE POSIX compliance
Support for a wide range of peripherals
Wide configurability: from embedded systems to supercomputers
X Window support
Full network awareness: support for various network protocols (TCP/IP, IPX/SPX, AppleTalk, Samba, NFS, Web, Mail, etc.)
Full 32/64-bit support
Linux Hardware
Systems
IBM PC and compatibles
Apple Macintosh: from m68000 to PowerPC
Sun
SGI
Atari/Amiga
Compaq Alpha
NetWinder
CPU
Intel x86, AMD, Cyrix
Alpha EV5, EV6 (64-bit)
PowerPC
SPARC, UltraSPARC (64-bit)
M68k
StrongARM/ARM
MIPS
Linux & Network
Network Interface Cards
10/100/1000 Mb/s Ethernet
Myrinet
ATM
Token Ring / FDDI / HIPPI
ARCnet
ISDN
X.25
Frame Relay
Fibre Channel
WAN
Network Applications
Web server: Apache, Netscape
DHCP server: dhcpd
FTP server: proftpd (clients: ftp, ncftp)
Mail server: sendmail (clients: pine, mutt, elm); pop3 / imap / procmail; mailing list: majordomo
Chat server: irc (clients: bitchx, irc)
File server: Samba
News server: innd (clients: tin, pine, trn)
DNS server: bind
NIS server: NIS
Kernel Features
Kernel Options of 2.2.x
Code maturity level options
Processor type and features
Loadable module support
General setup
Plug and Play support
Block devices
Networking options
SCSI options
SCSI low-level drivers
Network device support
Amateur Radio support
ISDN subsystem
CD-ROM drivers
Character devices
Mice
Video for Linux
Joystick support
Ftape, the floppy tape device driver
File systems
Network file systems
Partition types
Console drivers
Sound
Kernel hacking
Specific Features
Status of Kernel 2.4-test
USB support
Logical Volume Manager
Ext3 journaling file system
IrDA driver updates
GAS used instead of as86
Athlon support
QuickCam support
XFree86 DRI (Direct Rendering Infrastructure)
Kernel HTTPD support
Direct decompression from Flash or ROM
I2O driver updates
DVD file system (UDF) support
Network Protocols
Supported Kernel Network Features
TCP/IP protocol
IPX
Multicasting (MBONE)
Tunneling (GRE / Mobile-IP)
VPN
Advanced router
WAN router (WAN card + Linux)
Frame Relay / X.25 / leased line
HIPPI (cluster and supercomputer)
Token Ring
IP Masquerading (NAT)
IP Alias (virtual IP / virtual domain)
Bridging (bridging / load balancing)
ISDN / xDSL
Linux Networking: Network Protocols
Other Network Protocols
EQL (serial line load balancing)
SLIP (Serial Line Internet Protocol)
CSLIP (Compressed Serial Line Internet Protocol)
PPP (Point-to-Point Protocol)
PLIP (Parallel Line Internet Protocol)
X.25: PLP (Packet Layer Protocol)
HIPPI (High Performance Parallel Interface)
FDDI (Fiber Distributed Data Interface)
IPv6 (IPng): experimental
ARCnet
SNMP
Cluster System Overview
Category of Cluster Systems
Categories depend on the configuration method and the application area.
HPC cluster: computation-intensive workloads
Bulk storage cluster: sharing and serving stored data
Web/Internet cluster: network load distribution and LB
HA cluster: increasing system availability
Components: Network + OS + Storage + API
HPC: High Performance Computing
HA: High Availability
LB: Load Balancer
Linux Cluster Network
IP Tunneling
Encapsulating data of one protocol inside another
VPN (Virtual Private Network)
GRE tunneling (Generic Routing Encapsulation; Cisco routers)
Mobile-IP for laptops
IP Firewalls/Masquerading
NAT (Network Address Translation)
Modified firewall
IP autoforward
IP port forward
Filtering
IP packet filtering
Linux socket filtering (BSD socket filtering)
Unix domain socket filtering (X Window, syslog)
Firewall packet filter / IP masquerading
IP: kernel-level autoconfiguration
Network booting, X terminals: TFTP / BOOTP / RARP
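The packet-filtering entries above rest on a first-match idea, as in Linux 2.2 packet filtering (ipchains): rules in a chain are checked in order, the first matching rule decides the packet's fate, and the chain policy applies when nothing matches. The sketch below models only that decision logic; the `Rule` fields, the `filter_packet` helper and the example addresses are hypothetical, and it does not touch any kernel interface.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """A hypothetical filter rule; a field left as None matches anything."""
    action: str                     # "ACCEPT" or "DENY"
    src: Optional[str] = None       # source network in CIDR notation
    proto: Optional[str] = None     # "tcp", "udp", ...
    dst_port: Optional[int] = None  # destination port

    def matches(self, src_ip: str, proto: str, dst_port: int) -> bool:
        if self.src is not None and ipaddress.ip_address(src_ip) not in ipaddress.ip_network(self.src):
            return False
        if self.proto is not None and self.proto != proto:
            return False
        if self.dst_port is not None and self.dst_port != dst_port:
            return False
        return True


def filter_packet(chain: List[Rule], policy: str, src_ip: str, proto: str, dst_port: int) -> str:
    """Return the action of the first matching rule, or the chain policy."""
    for rule in chain:
        if rule.matches(src_ip, proto, dst_port):
            return rule.action
    return policy


input_chain = [
    Rule("ACCEPT", src="192.168.1.0/24"),      # trust the private cluster network
    Rule("ACCEPT", proto="tcp", dst_port=80),  # public web service
    Rule("DENY", proto="tcp", dst_port=23),    # no telnet from outside
]

print(filter_packet(input_chain, "DENY", "192.168.1.7", "tcp", 23))  # ACCEPT (first rule wins)
print(filter_packet(input_chain, "DENY", "10.0.0.5", "tcp", 80))     # ACCEPT
print(filter_packet(input_chain, "DENY", "10.0.0.5", "udp", 53))     # DENY (chain policy)
```

Because evaluation stops at the first match, rule order matters: the telnet packet from 192.168.1.7 is accepted by the trusted-network rule before the telnet rule is ever consulted.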
Linux HPC Cluster
Personal Supercomputer
Clustering Technology
A group of computers that execute jobs in parallel over a pre-configured network
Beowulf: Linux-based cluster
Characteristics of Clusters
High availability and expandability
High performance/price ratio
Linux HPC Cluster
Components of Clusters
Hardware
CPU: Intel Pentium, Digital Alpha, Mac G3
Network: Ethernet, Myrinet, ATM, Gigabit Ethernet
Storage: Fibre Channel/SCSI RAID
Software
Operating system & compiler: Linux, Windows NT, DEC OSF
Communication library: PVM, MPI
Administration tool: CMS
Queuing software: DQS, PBS
Application libraries: BLACS, ATLAS, ScaLAPACK, PBLAS
Linux HPC Cluster
AVALON: Los Alamos National Lab.
Configuration of Hardware Systems
Node Configuration (140 nodes)
533 MHz Alpha 21164A microprocessor
DEC AlphaPC 164LX motherboard
ECC SDRAM DIMMs (256 MB total per node)
Quantum Fireball ST3.2A 3079 MB EIDE U-ATA drive
Kingston Ethernet card with a DEC Tulip chipset
Network Configuration
3Com SuperStack II 3900 36-port Fast Ethernet switches
Cyclades multiport serial switches
Cost: about $300 per port
3Com SuperStack II 9300 12-port Gigabit Ethernet switch
Switched network of 144 Fast Ethernet ports (4 × 36-port switches)
Linux HPC Cluster
[Figure: Avalon network topology. The nodes, grouped by the labels Node 0, 35, 70, 105 and 140, attach to four 3Com 3900 switches, which connect to a 3Com 9300 Gigabit Ethernet switch.]
Linux HPC Cluster
Software Configuration
OS: Red Hat Linux 5.0, kernel 2.1.125
MPICH and own basic set of MPI routines
Compiler: egcs 1.1b
Application Programs
SPaSM
Gravitational tree code
Linux HPC Cluster
Performance (113/500)
Benchmark                 70 nodes       140 nodes
Linpack                   19.7 GFlops    47.7 GFlops
SPaSM                     12.8 GFlops    29.6 GFlops
Gravitational treecode    10.0 GFlops    -
Price vs. Performance
Price of Avalon: $313,000
Avalon's performance = 64-CPU 195 MHz SGI Origin 2000 (SPaSM, tree code, and Linpack)
Price of a 64-CPU SGI Origin 2000: over 100 M$
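The price/performance claim can be made concrete with a little arithmetic on the figures above. This sketch assumes the $313,000 price covers the full 140-node system and divides it by the 140-node benchmark results; the helper name is illustrative.

```python
# Figures from the Avalon slides; assumes $313,000 is the price of the 140-node system.
PRICE_USD = 313_000
GFLOPS_140_NODES = {"Linpack": 47.7, "SPaSM": 29.6}


def dollars_per_gflop(price: float, gflops: float) -> float:
    """Price divided by sustained performance."""
    return price / gflops


for benchmark, gflops in GFLOPS_140_NODES.items():
    print(f"{benchmark}: ${dollars_per_gflop(PRICE_USD, gflops):,.0f} per GFlop")
# Linpack: $6,562 per GFlop
# SPaSM: $10,574 per GFlop
```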
Network Configuration
Simple Network Connection: nodes have Internet IP addresses
[Figure: Internet, DSU/Router, LAN/WAN (Intranet), and a cluster server farm of Server 1 to Server n]
Network Configuration
Double Network Connection: nodes have Internet IP addresses and local IP addresses
[Figure: Internet, DSU/Router, LAN/WAN (Intranet), and a cluster server farm of Server 1 to Server n, with the servers also joined by a second-layer network]
Network Configuration
Double Network Connection + Master-Slave (NAT) configuration: nodes have local IP addresses
[Figure: Internet, DSU/Router, LAN/WAN (Intranet), and a Master Server in front of a cluster server farm of Slave Server 1 to Slave Server n on a second-layer network]
Crossbar Interconnection
Second-layer network connection with a crossbar connection on a 32-node cluster
[Figure: two groups of 16 nodes joined through a crossbar; 32 host bus adapters, 12 switches, 64 cables]
I/O Connection
[Figure: Keyboard-Video-Mouse and disk I/O connection. A master and Nodes 0 to 6, each with an I/O controller, share one monitor, keyboard and mouse through a console splitter and switcher, and connect to RAID Controller (0) and RAID Controller (1) over SCSI / FC.]
Internet Cluster
Virtual Internet Cluster Server
A scalable, highly available server built on a cluster of real servers. The architecture of the cluster is transparent to end users, who see only a single virtual server.
Methods to Build a Virtual Internet Cluster Server
Virtual server via NAT
Virtual server via IP tunneling
Virtual server via IP filtering
Virtual server via direct routing
Ref: www.linux-vs.org
Internet Cluster
Virtual Internet Cluster Server via NAT
This is done by network address and port translation. The implementation builds on the Linux IP Masquerading code, and the port-forwarding code is reused. Refer to the ipfwadm command. The whole process is illustrated in the figures that follow.
Internet Cluster
Virtual Cluster Server via NAT
[Figure: User, Internet, DSU/Router, Load Balancer (a Linux box acting as an L4 switch), LAN/WAN (Intranet), and Real Server 1 to Real Server n]
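Internet Cluster
Virtual Cluster Server via NAT: How Does This Cluster Work?
[Figure: User, DSU/Router, Load Balancer (Linux box), LAN/WAN, and Real Server 1 to Real Server n]
(1) The user sends requests.
(2) The load balancer schedules each request and rewrites the packets.
(3) A real server processes the requests.
(4) The load balancer rewrites the replies.
(5) The replies go back to the user.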
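Steps (2) and (4) are the heart of the NAT method: the load balancer rewrites the destination of each incoming request to a chosen real server and rewrites the source of each reply back to the virtual IP, so the user only ever talks to one address. The Python model below is a sketch of that bookkeeping under simple assumptions (round-robin scheduling, one table entry per client address and port, documentation-range IP addresses); `NatDirector` and its methods are hypothetical and this is not the Linux Virtual Server kernel code.

```python
import itertools
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

Addr = Tuple[str, int]  # (IP address, port)


@dataclass(frozen=True)
class Packet:
    src: Addr
    dst: Addr
    payload: str


class NatDirector:
    """Toy model of a virtual server via NAT (hypothetical, not LVS code)."""

    def __init__(self, vip: Addr, real_servers: List[Addr]):
        self.vip = vip
        self._scheduler = itertools.cycle(real_servers)  # round-robin
        self.connections: Dict[Addr, Addr] = {}          # client -> real server

    def inbound(self, pkt: Packet) -> Packet:
        """Step (2): schedule the request and rewrite its destination."""
        if pkt.dst != self.vip:
            raise ValueError("packet is not addressed to the virtual server")
        server = self.connections.get(pkt.src)
        if server is None:  # new connection: ask the scheduler
            server = next(self._scheduler)
            self.connections[pkt.src] = server
        return replace(pkt, dst=server)

    def outbound(self, pkt: Packet) -> Packet:
        """Step (4): rewrite the reply so it appears to come from the VIP."""
        if self.connections.get(pkt.dst) != pkt.src:
            raise ValueError("reply does not belong to a known connection")
        return replace(pkt, src=self.vip)


vip = ("203.0.113.10", 80)
director = NatDirector(vip, [("10.0.0.1", 80), ("10.0.0.2", 80)])

req_a = Packet(("198.51.100.5", 40001), vip, "GET /")
req_b = Packet(("198.51.100.6", 40002), vip, "GET /")
fwd_a = director.inbound(req_a)  # scheduled to 10.0.0.1
fwd_b = director.inbound(req_b)  # scheduled to 10.0.0.2

# Step (3): the real server answers the client it saw in the rewritten packet.
reply = Packet(fwd_a.dst, fwd_a.src, "200 OK")
print(director.outbound(reply).src)  # ('203.0.113.10', 80)
```

Because replies must pass back through the load balancer for step (4), the real servers in this method use the load balancer as their gateway; that return path is also what limits how far a NAT-based cluster can scale.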
Internet Cluster
Virtual Internet Cluster Server via IP Tunneling
IP tunneling (IP encapsulation) is a technique for encapsulating IP datagrams within IP datagrams, which allows datagrams destined for one IP address to be wrapped and redirected to another IP address.
IP encapsulation is now commonly used for extranets, Mobile-IP, IP multicast, and tunneled hosts or networks.
The load balancer encapsulates the packet and forwards it to a real server.
When the server receives the encapsulated packet, it decapsulates the packet, processes the request, and returns the result directly to the user.
Refer to the NET-3-HOWTO.
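The encapsulation step can be illustrated by building the outer header by hand. This sketch assumes IPv4-in-IPv4 encapsulation (IP protocol number 4, as in RFC 2003) with a plain 20-byte outer header; the function names are hypothetical, and on a real Linux system this work is done by the kernel's tunnel driver rather than by user code.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """Internet checksum (RFC 1071): one's-complement sum of 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipip_encapsulate(inner: bytes, outer_src: str, outer_dst: str, ttl: int = 64) -> bytes:
    """Load balancer side: prepend a 20-byte outer IPv4 header (protocol 4)."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,       # version 4, header length of 5 x 32-bit words
        0,                  # type of service
        20 + len(inner),    # total length of the outer datagram
        0, 0,               # identification, flags/fragment offset
        ttl,
        4,                  # protocol 4 = IPv4 encapsulated in IPv4
        0,                  # checksum, filled in below
        socket.inet_aton(outer_src),
        socket.inet_aton(outer_dst),
    )
    checksum = struct.pack("!H", ipv4_checksum(header))
    return header[:10] + checksum + header[12:] + inner


def ipip_decapsulate(packet: bytes) -> bytes:
    """Real server side: check and strip the outer header, return the original datagram."""
    if packet[0] >> 4 != 4 or packet[9] != 4:
        raise ValueError("not an IPv4-in-IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4
    return packet[header_len:]


# Stand-in bytes for the user's original datagram addressed to the virtual IP.
original = b"client datagram addressed to the virtual IP"
tunneled = ipip_encapsulate(original, "10.0.0.254", "10.0.0.1")
assert ipip_decapsulate(tunneled) == original
```

Because the outer header is simply stripped, the real server sees the user's original request unchanged, which is why it can send the reply to the user directly instead of back through the load balancer.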
Internet Cluster
Virtual Cluster Server via IP Tunneling
[Figure: User, Internet, DSU/Router, Load Balancer (Linux box), LAN/WAN, and Real Server 1 to Real Server n. (1) Requests reach the load balancer and travel to the real servers through IP tunnels; replies go to the user directly. A virtual IP address is assigned.]
Internet Cluster
Virtual Cluster Server via IP Tunneling
[Figure: User, Internet, DSU/Router, Load Balancer (Linux box), LAN/WAN, and Real Server 1 to Real Server n, with the steps (1) requests, (2) encapsulation, (3) de-encapsulation and reply to the user. A virtual IP address is assigned.]
Storage Cluster
Linux Storage Cluster with GFS or SAN
The network is configured with one of the virtual cluster server techniques. The disk storage is connected through Fibre Channel, including SAN file systems.
[Figure: Internet, DSU/Router, LAN/WAN, and cluster nodes attached through an FC switch to Fibre RAID storage; Fibre Channel half-duplex 100 MB/s, full-duplex 200 MB/s]
Storage Cluster
SAN (Storage Area Network)
– Scalability: 125 disks with one controller
– Ease of management
– Fast disk I/O: 100 MB/s (half-duplex), 200 MB/s (full-duplex)
– Long distance: over 10 km (fiber-optic cable)
[Figure: Linux storage cluster with GFS or SAN; nodes attached through a Fibre Channel switch to Fibre RAID storage, Fibre Channel half-duplex 100 MB/s, full-duplex 200 MB/s]
Storage Cluster
Linux Supported File Systems
– ext2/ext3 file systems
– ISO 9660 (CD-FS)
– VFAT / FAT
– SMB (CIFS)
– UFS
– NTFS
– UDF (DVD-FS)
– NFS / Coda
– LVM (Logical Volume Manager)
– GFS (Global File System)
– ReiserFS (journaling file system), SGI XFS, IBM JFS
– RIO (Raw I/O)
– RAMFS
– ROMFS
GFS Storage Cluster
Feature Overview of GFS
The Global File System (GFS) allows multiple Linux machines to share storage devices over a network. Each machine sees the network disks as local, and GFS itself appears as a local file system. Writes to a file by one Linux machine are seen by another machine that later reads that file.
GFS Cluster Configuration
[Figure: Normal configuration]
[Figure: Complex configuration with a cross-bar FC connection]
[Figure: Hybrid configuration combining a GFS configuration and an NFS configuration]
GFS Cluster Performance
High Availability Cluster
Enterprise Server Requirements
[Figure: server classes Non-Stop, Fault-Tolerant, Cluster, HA, Server and Stand-alone, compared by reliability, availability and serviceability]
High Availability Cluster
Server Downtime
Unplanned downtime: hardware fault, software fault
Planned downtime: hardware exchange, hardware upgrade, O/S upgrade, software upgrade
Cost Due to Downtime
Business                  Cost per hour
Stock exchange            $5.6M ~ $7.3M
Credit card               $2.2M ~ $3.1M
TV shopping               $87K ~ $140K
Product sales             $60K ~ $120K
Air-ticket reservation    $67K ~ $112K
ATM fees                  $12K ~ $17K
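The hourly figures translate directly into the cost of a given outage. This sketch assumes cost scales linearly with outage length and simply multiplies the table's low and high hourly estimates by the duration; the dictionary keys are the table's row labels, the helper name is illustrative, and the 30-minute outage is an arbitrary example.

```python
from typing import Tuple

# Hourly downtime cost ranges in US$ (low, high), from the table above.
COST_PER_HOUR = {
    "Stock exchange": (5.6e6, 7.3e6),
    "Credit card": (2.2e6, 3.1e6),
    "TV shopping": (87e3, 140e3),
    "Product sales": (60e3, 120e3),
    "Air-ticket reservation": (67e3, 112e3),
    "ATM fees": (12e3, 17e3),
}


def outage_cost(business: str, minutes: float) -> Tuple[float, float]:
    """Scale the hourly range linearly to an outage of the given length."""
    low, high = COST_PER_HOUR[business]
    hours = minutes / 60
    return low * hours, high * hours


low, high = outage_cost("Credit card", 30)
print(f"30-minute credit card outage: ${low:,.0f} to ${high:,.0f}")
# 30-minute credit card outage: $1,100,000 to $1,550,000
```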
High Availability Cluster
Comparison of Availability
Architecture             Max. availability   Downtime/failure     Downtime/year (minutes)
Continuous processing    100.00%             None                 0
Fault tolerant           99.999%             Cycles               0.5 ~ 5
Clusters                 99.9 ~ 99.999%      Seconds to minutes   5 ~ 500
High availability        99.9%               Minutes              500 ~ 10,000 (disk mirroring)
Server                   99.5%               Hours                1,000 ~ 10,000 (disk mirroring)
Stand-alone              99%                 Hours                2,600 ~ 10,000 (without disk mirroring)
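An availability percentage A corresponds to an ideal yearly downtime of (1 − A) × 525,600 minutes in a 365-day year. The sketch below computes that value for a few levels as a sanity check on the percentages; it is not a derivation of the table's downtime ranges, which are quoted from the source. The helper name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (leap years ignored)


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Ideal yearly downtime implied by an availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for level in (99.0, 99.5, 99.9, 99.999):
    print(f"{level}%: {downtime_minutes_per_year(level):,.1f} minutes/year")
# 99.0%: 5,256.0 minutes/year
# 99.5%: 2,628.0 minutes/year
# 99.9%: 525.6 minutes/year
# 99.999%: 5.3 minutes/year
```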
High Availability Cluster
Concept of an HA System
[Figure: active or standby systems linked by a heartbeat, with a dual network for response and dual I/O connections to shared storage]
High Availability Cluster
Lines of Heartbeat
[Figure: active or standby systems with a dual network for response, shared storage with dual I/O connections, and heartbeat lines over a serial connection, TCP/IP over LAN, and shared SCSI]
Components of HA
Redundant systems
All connectable lines
Shared disks
File system
Management software, including a heartbeat-checking daemon
Ref: www.linux-ha.org
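A heartbeat-checking daemon boils down to recording when each heartbeat line was last heard from and declaring the peer dead once every line has been silent longer than a timeout. The sketch below models that decision for a standby node using the three line types listed above; the class name, the `deadtime` parameter and the injectable clock are illustrative assumptions, not the interface of the Linux-HA heartbeat software.

```python
import time
from typing import Callable, Dict, Iterable


class HeartbeatMonitor:
    """Standby node's view of its active peer (hypothetical sketch).

    The peer is declared dead only when every heartbeat line has been silent
    for more than `deadtime` seconds, so losing one line (for example an
    unplugged serial cable) does not by itself trigger a failover.
    """

    def __init__(self, lines: Iterable[str], deadtime: float,
                 clock: Callable[[], float] = time.monotonic):
        self.deadtime = deadtime
        self.clock = clock
        start = clock()
        self.last_seen: Dict[str, float] = {line: start for line in lines}

    def beat(self, line: str) -> None:
        """Record a heartbeat received on one line."""
        if line not in self.last_seen:
            raise KeyError(f"unknown heartbeat line: {line}")
        self.last_seen[line] = self.clock()

    def peer_alive(self) -> bool:
        now = self.clock()
        return any(now - seen <= self.deadtime for seen in self.last_seen.values())

    def should_take_over(self) -> bool:
        """True when the standby should take over the service and shared disks."""
        return not self.peer_alive()


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
monitor = HeartbeatMonitor(["serial", "lan", "shared-scsi"], deadtime=10, clock=clock)

clock.now = 8
monitor.beat("lan")                # only the LAN line is still heard from
clock.now = 15
print(monitor.should_take_over())  # False: LAN heartbeat 7 s ago
clock.now = 25
print(monitor.should_take_over())  # True: all lines silent for over 10 s
```

Using several independent lines is what lets the standby tell a broken cable from a dead peer: a single silent line only means that line is down.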
Concluding Remarks
Linux Internet Cluster Products
Wyz Cluster
DR Cluster
Mission Critical Linux
Red Hat: Piranha, High Availability Server
TurboLinux: TurboCluster Server
VA Linux: VACM
Legato Cluster
Veritas
...etc.
Linux clustering is a starting point from which Linux can enter the enterprise market.
Until now, however, clustering technology has mainly been a concern of technical development groups such as research institutes and academia.
Why Linux Cluster?
Cost-effective and easy to configure
Fast technical development through open source
Many references in various fields
Future Needs
New network configurability and TCP/IP stack performance
High availability for enterprise markets
Cluster file system and disk I/O performance
High-performance peripheral drivers
Stable management and scheduler
Don't Be Fooled by Myths!
Is clustering technology mature enough?
Have ease of use and stability been achieved?
Is clustering a big market? If so, in which fields?
Is Linux in the enterprise market? If not, as a backend system?
Can Linux vendors maintain their advantages?
Thank You !!!
KESPER Inc.
Rm. 803, DongA Officetel, Bongmyeong-dong, Yuseong-gu, Taejeon 305-709, Republic of Korea (South Korea)
Tel. 82-42-828-7458 Fax 82-42-828-7455
Sungho Kim, President/CEO