1
Performance Evaluation of Load Sharing Policies with PANTS on a Beowulf Cluster
James Nichols, Mark Claypool
Worcester Polytechnic Institute, Department of Computer Science
Worcester, MA
http://www.cs.wpi.edu/~jnick
http://perform.wpi.edu
2
Introduction
What is a Beowulf cluster?
- A cluster of inexpensive personal computers networked together via Ethernet
- Typically runs the Linux operating system

Load Sharing
- Shares load, decreasing response times and increasing overall throughput
- Requires expertise in a particular load distribution mechanism such as PVM or MPI
3
Introduction
Load Measurement
- Typically uses CPU as the load metric. What about disk and memory load? Or system events like interrupts and context switches?

PANTS: Application Node Transparency System
- Removes the need for knowledge about a particular implementation required by some load distribution mechanisms
4
Contributions
Propose new load metrics
Design benchmarks
Evaluate performance
There is some benefit to incorporating new types of load metrics into load distribution systems, like PANTS
5
Outline
Introduction PANTS Methodology Results Conclusions
6
PANTS
PANTS: Application Node Transparency System
- Intercepts exec() system calls
- By default uses the /proc file system to calculate CPU load and classify a node as “busy” or “free”
- Any workload which does not generate CPU load will not be distributed
- New load metrics and policies!
- Early results showed near-linear speedup for computationally intensive applications
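The default busy/free test described above can be sketched as follows. This is an illustrative reconstruction, not the actual PANTS source: the 95% threshold comes from a later slide, and all function names here are hypothetical.

```python
# Hypothetical sketch of PANTS's default CPU-based busy/free classification,
# computed by differencing two snapshots of the "cpu" line in /proc/stat.

def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into (busy, total) jiffies."""
    fields = [int(x) for x in stat_line.split()[1:]]
    idle = fields[3]          # fourth field is idle time
    total = sum(fields)
    return total - idle, total

def is_busy(sample_a, sample_b, threshold=0.95):
    """Busy if CPU utilization between two /proc/stat samples exceeds the threshold."""
    busy_a, total_a = cpu_times(sample_a)
    busy_b, total_b = cpu_times(sample_b)
    delta_total = total_b - total_a
    if delta_total == 0:
        return False
    return (busy_b - busy_a) / delta_total > threshold

# Two synthetic samples: 99 of the 100 elapsed jiffies were non-idle.
a = "cpu 100 0 100 800 0 0 0"
b = "cpu 199 0 100 801 0 0 0"
print(is_busy(a, b))  # True
```

Because the counters are cumulative, a workload that never accumulates CPU jiffies (e.g. one that is purely I/O-bound) never trips this test, which is exactly the limitation the new metrics address.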
7
PANTS Algorithm
We have implemented a variation of the multi-leader load-balancing algorithm proposed in [FW95]
- A node is elected to be the leader
- The leader keeps track of which machines in the cluster are free
8
PANTS Algorithm
- A random free node is chosen by the leader and returned to a node upon request
- Some algorithms use broadcast messages to find free nodes and communicate; busy nodes must receive and process all of these messages
- PANTS avoids such “busy-machine messages” by sending messages only to the leader’s multicast address
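The leader's role in the scheme above can be sketched as a small state machine. This is a hypothetical illustration of the behavior the slides describe (state reports arriving only at the leader, random selection among free nodes); class and method names are invented.

```python
# Illustrative sketch of the leader in the multi-leader algorithm: nodes
# multicast free/busy transitions to the leader only, and the leader answers
# placement requests with a random free node.
import random

class Leader:
    def __init__(self):
        self.free_nodes = set()

    def report(self, node, free):
        """Record a node's free/busy transition (sent to the leader's multicast address)."""
        if free:
            self.free_nodes.add(node)
        else:
            self.free_nodes.discard(node)

    def pick_free_node(self):
        """Return a random free node for a requester, or None if none is free."""
        if not self.free_nodes:
            return None
        return random.choice(sorted(self.free_nodes))

leader = Leader()
leader.report("node1", free=True)
leader.report("node2", free=True)
leader.report("node2", free=False)
print(leader.pick_free_node())  # node1
```

The design point is that busy nodes do nothing here: only the leader receives and processes state messages, which is how PANTS avoids loading already-busy machines with coordination traffic.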
9
PANTS Multicast Communication
10
PANTS Implementation
Two major software components: the PANTS daemon and prex (PANTS remote execute)
- A C-library object intercepts execve() for processing by prex
- prex queries the PANTS daemon for a node to execute the process on; the daemon handles load measurement and leader communication
- prex uses RSH to execute processes on remote nodes
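The dispatch path above (intercept the exec, ask the daemon for a target, ship the process via rsh) might look like this. A minimal sketch only: the real prex is C code hooked into execve(), and the function names and daemon interface here are assumptions.

```python
# Hypothetical sketch of the prex dispatch decision: run locally when no free
# node is available (or we are the target), otherwise hand the command to rsh.
import subprocess

def prex_exec(command, query_daemon, local_host="localhost"):
    """Ask the PANTS daemon for a target node; fall back to local execution."""
    target = query_daemon()            # daemon consults the leader for a free node
    if target is None or target == local_host:
        return subprocess.run(command, shell=True)
    # rsh runs the command on the chosen remote node
    return subprocess.run(["rsh", target, command])
```

A usage example: `prex_exec("gcc -c foo.c", query_daemon)` would compile locally when the cluster is saturated and remotely otherwise, which is why the compile benchmark needed absolute paths, since the remote node has no notion of the caller's working directory.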
11
PREX
12
Extensions to PANTS
- PANTS is compatible with the distributed interprocess communication package DIPC; DIPC requires some code modifications, but provides a wide range of IPC primitives
- Modify the PANTS daemon configuration by altering /etc/pantsd.conf; send Unix signals to make the daemon use the new configuration
- Easily modify thresholds, exponential weighted averaging settings, and multicast addresses
- Wide range of logging options
- New load metrics and policies
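A configuration file along these lines would cover the knobs the slide lists. This is a guessed sketch: the slides name only the file path and the categories of settings, so every key name and value below is illustrative.

```
# Hypothetical /etc/pantsd.conf sketch. The real key names are unknown;
# these entries only mirror the tunables the slides mention.
cpu_threshold      95          # percent CPU to mark a node busy
io_threshold       1000        # disk blocks/sec
ewma_alpha         0.5         # exponential weighted averaging factor
multicast_address  224.0.0.50  # leader multicast group
log_level          info
```

Re-reading the file on a Unix signal (commonly SIGHUP in daemons of this era) lets thresholds be tuned without restarting the daemon.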
13
WPI’s Beowulf Cluster
Made possible through equipment grants from Compaq and Alpha Processor, Inc
- Seven 600 MHz Alpha machines (EV56)
- Physical memory from 64-512 MB; 128 MB swap space
- PCI Ultra-Wide SCSI hard drives
- 100BaseT Ethernet
- RedHat 7.1, Linux kernel 2.4.18
- Shares files over NFS
14
Performance Evaluation of Load Sharing Policies with PANTS on a Beowulf Cluster
Introduction PANTS Methodology Results Conclusions
15
Methodology
- Identified load parameters
- Implemented ways to measure parameters
- Built micro-benchmarks which stressed each load metric for testing and verification
- Selected a real-world benchmark to evaluate performance
16
Load Metrics
- CPU usage (%)
- I/O (blocks/sec)
- Context switches (switches/sec)
- Memory (page operations/sec)
- Interrupts (interrupts/sec)
Read from /proc/stat
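Pulling these counters out of /proc/stat is mechanical; the sketch below, run against a small synthetic snapshot, shows the idea for the interrupt and context-switch totals (the sample values are made up).

```python
# Sketch of extracting two of the slide's load metrics from /proc/stat.
# The counters are cumulative since boot; per-second rates come from
# differencing two snapshots taken over a known interval.
SAMPLE = """cpu 4705 150 1120 16250 520 0 0
intr 114500
ctxt 5950
btime 1092831
processes 2915
"""

def parse_stat(text):
    """Return cumulative interrupt and context-switch counters by name."""
    metrics = {}
    for line in text.splitlines():
        fields = line.split()
        if fields[0] == "intr":
            metrics["interrupts"] = int(fields[1])
        elif fields[0] == "ctxt":
            metrics["context_switches"] = int(fields[1])
    return metrics

print(parse_stat(SAMPLE))  # {'interrupts': 114500, 'context_switches': 5950}
```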
17
Micro-Benchmarks
Set of simple benchmarks designed to generate a certain type of workload
- Verification of our load metrics
- Determination of thresholds

- CPU: perform many FLOPS
- I/O: copy large directory and files
- Memory: malloc() a block of memory, copy data structures using mmap()
18
Application benchmark: Linux kernel compile
Distributed compilation of the Linux kernel
Executed by the standard GNU program make
Loads I/O and memory resources
19
Application Benchmark: Details
- Linux kernel version 2.4.18
- 432 files compiled
- Mean source file size: 19 KB
- Marked gcc compiler binaries migratable
- Needed to expand relative paths into absolute paths
20
Application Benchmark
CPU Usage - PANTS Default
[Chart: CPU usage (%) vs. time (sec) over the roughly 1530-second compile, default PANTS policy]
21
Application Benchmark
Disk Usage - PANTS Default
[Chart: disk blocks/sec vs. time (sec) over the roughly 1560-second compile, default PANTS policy]
22
Thresholds
- Obtained idle measurements
- Iteratively established thresholds
Metric                             Idle    Threshold
CPU (%)                            0%      95%
I/O (blocks/sec)                   250     1,000
Context switches (switches/sec)    950     6,000
Memory (pages/sec)                 0       4,000
Interrupts (interrupts/sec)        103K    115K
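Combining this table with the exponential weighted averaging mentioned earlier, a node's classification might be computed as below. This is a sketch under assumptions: the slides do not give the smoothing factor, the key names, or whether the daemon ORs the metrics together, though "busy if any metric exceeds its threshold" is the natural reading.

```python
# Hypothetical busy/free classification over the slide's threshold table,
# applied to exponentially smoothed per-second metric samples.
THRESHOLDS = {
    "cpu_pct": 95,
    "io_blocks_per_sec": 1000,
    "ctxt_switches_per_sec": 6000,
    "memory_pages_per_sec": 4000,
    "interrupts_per_sec": 115000,
}

def ewma(prev, sample, alpha=0.5):
    """Exponentially weighted moving average of one metric (alpha assumed)."""
    return alpha * sample + (1 - alpha) * prev

def classify(smoothed):
    """Busy if any smoothed metric exceeds its threshold, otherwise free."""
    over = any(smoothed[m] > t for m, t in THRESHOLDS.items())
    return "busy" if over else "free"

node = {m: 0.0 for m in THRESHOLDS}
node["io_blocks_per_sec"] = ewma(node["io_blocks_per_sec"], 2400)
print(classify(node))  # busy: smoothed I/O of 1200 blocks/sec exceeds 1000
```

The smoothing keeps a single noisy sample, such as one burst of interrupts, from flipping a node between busy and free on every measurement.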
23
Performance Evaluation of Load Sharing Policies with PANTS on a Beowulf Cluster
Introduction PANTS Methodology Results Conclusions
24
Micro-benchmarks Results
Default Load Metrics
I/O micro benchmark - average load
[Chart: average load during the I/O micro-benchmark, per metric: CPU (%), I/O (100s of disk blocks/sec), Memory (100s of page faults/sec), Interrupts (1000s of interrupts/sec), Context Switches (100s of switches/sec)]
25
Micro-benchmarks Results
New Load Metrics
I/O micro benchmark - average load
[Chart: average load during the I/O micro-benchmark under the new load metrics, per metric: CPU (%), I/O (100s of disk blocks/sec), Memory (100s of page faults/sec), Interrupts (1000s of interrupts/sec), Context Switches (100s of switches/sec)]
26
Application Benchmark Results
[Chart: CPU usage (%) average, standard deviation, max, and min, comparing all metrics vs. the CPU-only metric]
27
I/O Load
[Chart: disk blocks/sec average, standard deviation, max, and min, comparing all metrics vs. the CPU-only metric]
28
Results: Compile Time
Summary Results - Distributed Compilation
[Chart: compilation time (sec) by method: Local, NFS, PANTS with no migration, CPU metric only, and all load metrics]
29
Conclusions
PANTS has several attractive features:
- Transparency
- Reduced busy-node communication
- Fault tolerance
- Intelligent load distribution decisions
Achieve better throughput and more balanced load distribution when metrics include I/O, memory, interrupts, and context switches.
30
Future Work
- Use preemptive migration?
- Include a network usage load metric
- Adaptive thresholds
- Heuristic-based load distribution
- Migrate certain types of jobs to nodes that perform well when processing certain types of workloads
31
Questions?
Acknowledgements
Jeffrey Moyer, Kevin Dickson, Chuck Homic, Bryan Villamin, Michael Szelag, David Terry, Jennifer Waite, Seth Chandler, David Finkel, Alpha Processor, Inc. and Compaq Computer Corporation
32
Performance Evaluation of Load Sharing Policies with PANTS on a Beowulf Cluster
James Nichols, Mark Claypool
Worcester Polytechnic Institute, Department of Computer Science
Worcester, MA
http://www.cs.wpi.edu/~jnick
http://perform.wpi.edu