parallel job deployment and monitoring in a hierarchy of mobile agents
Post on 14-Jan-2016
43 Views
Preview:
DESCRIPTION
TRANSCRIPT
5/25/2006 CSS Speaker Series 1
Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents
Munehiro FukudaComputing & Software Systems, University of Washington, Bothell
Funded by
5/25/2006 CSS Speaker Series 2
Outline
1. Introduction
2. Execution Model
3. System Design
4. Performance Evaluation
5. Related Work
6. Conclusions
5/25/2006 CSS Speaker Series 3
1. Introduction
Problems in Grid Computing Background of Mobile Agents Objective Project Overview
5/25/2006 CSS Speaker Series 4
Quiet Laboratories UW1-320 and UW1-302 at 3pm on a weekday
No more computing resources needed?
5/25/2006 CSS Speaker Series 5
Demands for Computing ResourcesIn Teaching I'm in the 320 lab testing my program, and for some reason,
whenever I attempt to use 15 hosts, it asks me for passwords for hosts 31 - 18 and then freezes and does nothing.
I noticed that uw1-320-20 is being bogged down by zombie processes that someone left going on it:
I went around looking on some of the other computers, and its all over the place: user A has almost 40 processes running on uw1-320-30 since April 23rd, user B has about 10 on host 29, and there’s a ton more on almost every host.
I got tired of manually running a bunch of ssh commands to run rmi on many different machines.
I have narrowed the problem down to three machines: 16, 20, and 30. First of all uw1-320-20 is dead. It drops all incoming ssh connections. The other two, uw1-320-16 and uw1-320-30, both have a mysterious problem that I don't know how to solve.
5/25/2006 CSS Speaker Series 6
Demands for Computing ResourcesIn Research http://setiathome.berkeley.edu/
http://boinc.bakerlab.org/rosetta/rah_about.php
These are an effective way to collect numerous computing resources from all over the world.
But, here is a question:
Why don’t they use idle machines on their campuses first?
5/25/2006 CSS Speaker Series 7
Grid-Computing Brokers Desktops
Buyers: a desktop user Sellers: hardware components Brokers: Windows, Linux
Clusters Buyers: multiple users (e.g., CSS434 students) Sellers: cluster computing nodes Brokers: PBS, LSF
Grid computing Buyers: Seti@home, Rosseta@home, etc. Brokers: Globus, Condor, Legion (Avaki), NetSolve, Ninf,
Entropia, etc. Okay, no need to implement any more?
5/25/2006 CSS Speaker Series 8
Problems in Grid Computing
Targeting large business models A central entry point A lot of installation work
http://www.globus.org/toolkit/docs/4.0/
Little system faults Too gigantic
5/25/2006 CSS Speaker Series 9
Our Target
Network
Targeting a group of computer users No central entry point
No central managers No programming model restrictions
Easy installation work Easy participation but necessity of fault tolerance
5/25/2006 CSS Speaker Series 10
Background of Mobile Agents
Internet
Centralmanger Server Server Server
FTPHTTP
RPC
Cycle CycleCycle
User
An execution model previously highlighted as a prospective infrastructure of distributed systems.
Static job deployment and result collection: No more than an alternative approach to centralized grid middleware implementation
Our goal: Let mobile agents do unique tasks in grid computing
5/25/2006 CSS Speaker Series 11
Objective Focus on a group of
independent computers Turned on and off independently
Not controlled by a scheduler such as PBS and LSF
Not managed by a central server
Let mobile agents do unique tasks in grid computing Runtime job migration:
Moving a program from a faulty/busy site to an active/idle site
Seeking for fault tolerance and better load balancing
Negotiation: Negotiating with other agents
about computing resources Seeking for better load balancing
Inherent parallelism: Deploying and monitoring jobs in
parallel Decentralized job management
5/25/2006 CSS Speaker Series 12
Project Overview
Funded by: NSF Middleware Initiative Sponsored by:University of Washington In Collaboration of: Ehime University In a Team of: UWB Undergraduates
5/25/2006 CSS Speaker Series 13
2. Execution Model
System Overview Execution Layer Programming Environment
5/25/2006 CSS Speaker Series 14
System Overview
FTPServer
UserA
UserB
UserB
snapshotsnapshot
snapshots snapshots
User program wrapper
SnapshotMethods
GridTCP
User program wrapper
SnapshotMethods
GridTCP
User program wrapper
SnapshotMethods
GridTCP
snapshot
User A’sProcess
User A’sProcess
User B’sProcess
TCPCommunication
Commander Agent
Commander Agent
Sentinel Agent
Sentinel Agent
Resource Agent
Sentinel Agent
Resource Agent
Bookkeeper Agent
BookkeeperAgent
ResultsResults
5/25/2006 CSS Speaker Series 15
Execution Layer
Operating systems
UWAgents mobile agent execution platform
Commander, resource, sentinel, and bookkeeper agents
User program wrapper
GridTcpJava socket
mpiJava-AmpiJava-S
mpiJava API
Java user applications
5/25/2006 CSS Speaker Series 16
MPI Java Programmingpublic class MyApplication { public GridIpEntry ipEntry[]; // used by the GridTcp socket library public int funcId; // used by the user program wrapper public GridTcp tcp; // the GridTcp error-recoverable socket public int nprocess; // #processors public int myRank; // processor id ( or mpi rank) public int func_0( String args[] ) { // constructor MPJ.Init( args, ipEntry, tcp ); // invoke mpiJava-A .....; // more statements to be inserted return 1; // calls func_1( ) } public int func_1( ) { // called from func_0 if ( MPJ.COMM_WORLD.Rank( ) == 0 ) MPJ.COMM_WORLD.Send( ... ); else MPJ.COMM_WORLD.Recv( ... ); .....; // more statements to be inserted return 2; // calls func_2( ) } public int func_2( ) { // called from func_2, the last function .....; // more statements to be inserted MPJ.finalize( ); // stops mpiJava-A return -2; // application terminated }}
5/25/2006 CSS Speaker Series 17
3. System Design Mobile Agents Job Coordination
Distribution Resource allocation and monitoring Resumption and migration
Programming Support Language preprocessing Communication check-pointing
Inter-Cluster Job Deployment (Current Research Topic) Over-gateway agent migration Over-gateway communication Job distribution
5/25/2006 CSS Speaker Series 18
id 0
Agent domain (time=3:31pm, 8/25/05 ip = perseus.uwb.edu name = fukuda)
id 0
UWInject: submits a new agent from shell.
Agent domain (time=3:30pm, 8/25/05 ip = medusa.uwb.edu name = fukuda)
UWAgents – Concept of Agent Domain
User
id 1 id 2 id 3
id 7id 6id 5id 4 id 11id 10id 9id 8 id 12
-m 4
id 1 id 2
-m 3
UWPlace
A user job
5/25/2006 CSS Speaker Series 19
Job DistributionUser
Commanderid 0
Sentinelid 2
rank 0
Bookkeeperid 3
rank 0
Resourceid 1eXist
Sentinelid 8
rank 1
Sentinelid 11
rank 4
Sentinelid 10
rank 3
Sentinelid 9
rank 2
Bookkeeper
id 12rank 1
Bookkeeper
id 15rank 4
Bookkeeper
id 14rank 3
Bookkeeper
id 13rank 2
Sentinelid 32
rank 5
Sentinelid 34
rank 7
Sentinelid 33
rank 6
Bookkeeper
id 48rank 5
Bookkeeper
id 50rank 7
Bookkeeper
id 49rank 6
Job Submission
XML QuerySpawn
id: agent idrank: MPI Rank
snapshot
snapshot
Sensorid 4
Sensorid 5
5/25/2006 CSS Speaker Series 20
Resource Allocation and Monitoring
Node 1Node 0 Node 2
User
Commanderid 0
Resourceid 1eXist
Job submission
An XML query
CPU ArchitectureOSMemoryDiskTotal nodesMultiplier
total nodes x multiplier
A list of available nodes
Spawn
Sentinelid 2
rank 0
Bookkeeperid 2
rank 0
Node5Node 4Node 3
Sentinelid 8
rank 1
Bookkeeperid 12
rank 5
Sentinelid 2
rank 0
Sentinelid 8
rank 1
Bookkeeperid 2
rank 0
Bookkeeperid 12
rank 5
Case 1:Total nodes = 2Multiplier = 1.5
Case 2:Total nodes = 2Multiplier = 3
Future use Future use Future use
Sensorid 4
Sensorid 5
Sensorid 16
Sensorid 18
Sensorid 17
Sensorid 19
Sensorid 20
Sensorid 22
Sensorid 21
Sensorid 23
ttcp
Performance data
ttcp
ttcp
Our ownXML DB
5/25/2006 CSS Speaker Series 21
Job Resumption by a Parent SentinelSentinel
id 2rank 0
Sentinelid 8
rank 1
Sentinelid 11
rank 4
Sentinelid 10
rank 3
Sentinelid 9
rank 2
Bookkeeperid 15
rank 4
(0) Send a new snapshot periodically
MPI connections
(2) Search for the latest snapshot
(1) Detect a ping error
Sentinelid 11
rank 4
New
(4) Send a new agent
(5) Restart a user program
(3) Retrieve the snapshot
5/25/2006 CSS Speaker Series 22
Job Resumption by a Child Sentinel
Commanderid 0
Sentinelid 2
rank 0
Bookkeeperid 3
rank 0
Sentinelid 8
rank 1
Bookkeeper
id 12rank 1
Resourceid 1
(1) No pings for 8 * 5 (= 40sec)
No pings for 12 * 5 (= 60sec)
(2) Search for the latest snapshot
(3) Search for the latest snapshot
(4) Retrieve the snapshot
NewSentinel
id 2rank 0
(5) Send a new agent
(7) Search for the latest snapshot
(8) Search for the latest snapshot
(9) Retrieve the snapshot
(11) Detect a ping error (13) Detect a ping error and follow the samechild resumption procedure as in p9.
Commanderid 0
(10) Send a new agent
(6) No pings for 2 * 5 (= 10sec)
(12) Restart a new resource agent from its beginning
Resourceid 1
New
5/25/2006 CSS Speaker Series 23
User Program Wrapper
statement_1;statement_2;statement_3;check_point( );statement_4;statement_5;statement_6;check_point( );statement_7;statement_8;statement_9;
check_point( )
;
int fid = 1;while( fid == -2) { switch( func_id ) { case 0: fid = func_0( ); case 1: fid = func_1( ); case 2: fid = func_2( ); }}check_point( ) { // save this object // including func_id // into a file}
func_0( ) { statement_1; statement_2; statement_3; return 1;}func_1( ) { statement_4; statement_5; statement_6; return 2;}func_2( ) { statement_7; statement_8; statement_9; return -2;}
User Program WrapperSource Code
Preprocessed
Cryptography
5/25/2006 CSS Speaker Series 24
Pre-proccesser and Drawback
No recursions Useless source line numbers indicated upon errors Still need of explicit snapshot points.
statement_1;statement_2;statement_3;check_point( );while (…) { statement_4; if (…) { statement_5; check_point( ); statement_6; } else statement_7; statement_8;}check_point( );
int func_0( ) { statement_1; statement_2; statement_3; return 1;}int func_1( ) { while(…) { statement_4; if (…) { statement_5; return 2; } else statement_7; statement_8; }}
int func_2( ) { statement_6; statement_8; while(…) { statement_4; if (…) { statement_5; return 2; } else statement_7; statement8; }}
Source Code Preprocessed Code
Before check_point( ) in if-clause
After check_point( ) in if-clause
Preprocessed
5/25/2006 CSS Speaker Series 25
GridTcp – Check-Pointed Connection
n1.uwb.edu
n3.uwb.edu
n2.uwb.edu
TCPuser
program
rank ip
1 n1.uwb.edu
2 n2.uwb.edu
outgoing
backup
incoming
User ProgramWrapper
Snapshotmaintenance
TCP
userprogram
n2.uwb.edu2
n1.uwb.edu1
iprank
incoming
ougoing
backup
User ProgramWrapper
n3.uwb.edu
userprogram
n3.uwb.edu2
n1.uwb.edu1
iprank
incoming
ougoing
backup
User ProgramWrapper
TCP
Outgoing packets saved in a backup queue All packets serialized in a backup file every check
pointing Upon a migration
Packets de-serialized from a backup file Backup packets restored in outgoing queue IP table updated
5/25/2006 CSS Speaker Series 26
Inter-Cluster Job DeploymentCurrent Research Topic
Over-gateway agent deployment Over-gateway TCP communication Over-gateway agent tree creatioin
10.0.0.3
medusa.uwb.edu
uw1-320-00.uwb.edu uw1-320-01.uwb.edu
10.0.0.4 10.0.0.7
Internet
Private domain
Commanderid 0
Sentinelid 2
Sentinelid 8
Sentinelid 9
How?
5/25/2006 CSS Speaker Series 27
mnode0
medusa.uwb.edu
uw1-320-00.uwb.edu uw1-320-01.uwb.edu
mnode1 mnode4
Internet
Private domain
id 0id 1
UWAgents – Over Gateway Migration
id 1 id 1
id 1
spawnChild( )hop( )
hop( )hop( )
talk( )
Parent and children keep track of a route to each other’s current position.
A daemon maintains where a gateway is.
5/25/2006 CSS Speaker Series 28
mnode0
medusa.uwb.edu
uw1-320-00.uwb.edu uw1-320-01.uwb.edu
mnode1 mnode4
Internet
Private domain
GridTcp – Over-Gateway Connection
Commanderid 0
Sentinelid 2
rank 0
Sentinelid 8
rank 1
Sentinelid 9
rank 2
userprogram
User Program Wrapper
-medusa2
-mnode01
medusauw1-320-000
gatewaydestrank
userprogram
User Program Wrapper
-medusa2
medusamnode01
-uw1-320-000
gatewaydestrank
userprogram
User Program Wrapper
-medusa2
-mnode01
-uw1-320-000
gatewaydestrank
5/25/2006 CSS Speaker Series 29
Partition 2
Over-Gateway Agent Tree CreationPossible Solutions
User
Commanderid 0
Sentinelid 2
rank 0
Sentinelid 8
rank 1
Sentinelid 11
rank 4
Sentinelid 10
rank 3
Sentinelid 9
rank 2
Sentinelid 32
rank 5
Sentinelid 34
rank 7
Sentinelid 33
rank 6
Sentinelid 35
rank 8
Sentinelid 46
rank 19
Sentinelid 47
rank 20
Bookkeeperid 3
rank 0
Resourceid 1
Cluster 0
Cluster 1
Cluster 2
Partition 1
5/25/2006 CSS Speaker Series 30
Sentinelid 531
rank 10
Sentinelid 131rank 4
Over-Gateway Agent Tree CreationFinal Solution
User
Commanderid 0
Sentinelid 2
Sentinelid 8
rank -8
Sentinelid 33
rank -33
Sentinelid 32
rank 0
Sentinelid 9
rank X
Sentinelid 130rank 3
Sentinelid 129rank 2
Sentinelid 132rank 6
Sentinelid 34
rank -34
Bookkeeperid 3
rank 0
Resourceid 1
Sentinelid 512rank 5
Sentinelid 530rank 9
Sentinelid 529rank 8
Sentinelid 35
rank -35
Sentinelid 39
rank X+4
Sentinelid 128rank 1
Sentinelid 38
rank X+3
Sentinelid 37
rank X+2
Sentinelid 36
rank X+1
Sentinelid 528rank 7
Cluster 0
Cluster 1
Cluster 2
Cluster 3
Cluster gateway 0
Cluster gateways 1, 2, and 3
Desktop computers
5/25/2006 CSS Speaker Series 31
4. Performance Evaluation
Evaluation Environment: A 8-node Myrinet-2000 cluster: 2.8GHz pentium4-Xeon w/ 512MB A 24-node Giga-Ethernet cluster: 3.4GHz Pentium4-Xeon
w/512MB
Computation Granularity Java Grande MPJ Benchmark Process Resumption Overhead File Transfer
5/25/2006 CSS Speaker Series 32
Computational Granularity 1Master-slave computation
0
1
10
100
1000
10,0
00/1
,000
10,0
00/1
0,00
0
10,0
00/1
00,0
00
20,0
00/1
,000
20,0
00/1
0,00
0
20,0
00/1
00,0
00
40,0
00/1
,000
40,0
00/1
0,00
0
40,0
00/1
00,0
00
Size (doubles) / # floating-point divides
Tim
e (sec)
1 CPU
8 CPUs
16 CPUs
24 CPUsMaster
Slave Slave Slave Slave Slave
Communication
Master-slave computation
5/25/2006 CSS Speaker Series 33
Computational Granularity 2Heartbeat
0
1
10
100
1000
10,0
00/1
,000
10,0
00/1
0,00
0
10,0
00/1
00,0
00
20,0
00/1
,000
20,0
00/1
0,00
0
20,0
00/1
00,0
00
40,0
00/1
,000
40,0
00/1
0,00
0
40,0
00/1
00,0
00
Size (doubles) / # floating-point divisions
Tim
e (sec)
1 CPU
8 CPUs
16 CPUs
24 CPUs
Process Process Process Process Process
Communication
Heartbeat communication
5/25/2006 CSS Speaker Series 34
Computational Granularity 3Broadcast
0
1
10
100
1000
10,0
00/1
,000
10,0
00/1
0,00
0
10,0
00/1
00,0
00
20,0
00/1
,000
20,0
00/1
0,00
0
20,0
00/1
00,0
00
40,0
00/1
,000
40,0
00/1
0,00
0
40,0
00/1
00,0
00
Size (doubles) / # floating-point divides
Tim
e (sec)
1 CPU
8 CPUs
16 CPUs
24 CPUs
Process Process Process Process Process
Communication
All to all broadcast
5/25/2006 CSS Speaker Series 35
Performance Evaluation - Series
0
50
100
150
200
250
300
350
1 4 8 12 16 24
# CPUs
Tim
e (sec)
Agent deployment
Disk operations
Snapshot
ApplicationMaster-slave computation
5/25/2006 CSS Speaker Series 36
Performance Evaluation - RayTracer
0
50
100
150
200
250
300
350
1 4 8 12 16 24
# CPUs
Tim
e (s
ec)
Agent deployment
Disk operations
Snapshot
Application
All reduce communication but few data to send
5/25/2006 CSS Speaker Series 37
Performance Evaluation – MolDyn
0
50
100
150
200
250
300
350
1 2 4 8
# CPUs
Tim
e (s
ec)
Agent deployment
Snapshot
Disk operations
GridTcp overhead
Java application
All to all broadcast
5/25/2006 CSS Speaker Series 38
Overhead of Job Resumption
5/25/2006 CSS Speaker Series 39
User
Commanderid 0
Sentinelid 2
rank 0
Sentinelid 8
rank 1
Sentinelid 11
rank 4
Sentinelid 10
rank 3
Sentinelid 9
rank 2
Sentinelid 32
rank 5
Sentinelid 34
rank 7
Sentinelid 33
rank 6
Sentinelid 35
rank 8
Sentinelid 46
rank 19
Sentinelid 47
rank 20
Bookkeeperid 3
rank 0
Resourceid 1
AgentTeamwork vs NFS Pipelined Transfer in AgentTeamwork
File Transfer
5/25/2006 CSS Speaker Series 40
5. Related Work
From the viewpoints of: System Architecture Fault Tolerance Job Deployment and Monitoring
5/25/2006 CSS Speaker Series 41
System Architecture
Systems Architectural basis
Globus A toolkit
Condor Process migration
Ninf, NetSolve RPC
Legion (Avaki) OO
Catalina, J-SEAL2, AgentTeamwork Mobile agents
Difference from Catalina/J-SEAL2 They are not fully implemented. They are based on a master-slave model
5/25/2006 CSS Speaker Series 42
Fault Tolerance
Systems Libraries Data recovery Communication recovery
Legion (Avaki) FT-MPI Variables passed to MPI_FT_save( )
Links recovered
Condor MW Library All master data Master-worker communication
Dome Dome_env Objects declared as dXXX <type>
N/A
AgentTeamwork GridTcp All serializable class data
All in-transit messages
5/25/2006 CSS Speaker Series 43
Job Deployment and Monitoring
Systems Co-Allocation Module
Deployment Scheme
Globus DUROC Master slave
Condor Grid Manager Mater slave
Legion Scheduler and Enactor
Master slave
AgenTeamwork Sentinel agents Hierarchical
5/25/2006 CSS Speaker Series 44
6. Conclusions
Project Summary Next Two Years
5/25/2006 CSS Speaker Series 45
Project summary
Applications Computation granularity: 40,000 doubles x 10,000 floating-point
operations Message transfer: Any types except all-to-all communication Entire application size: 3+ times larger than computation
granularity
Current status UWAgent: completed Agent behavioral design: basic job deployment/resumption
implemented User program wrapper: completed including security features GridTcp/mpiJava: in testing Preprocessor: almost completed
5/25/2006 CSS Speaker Series 46
Next Two Years
Application support Fault tolerance in file transfer GUI improvement
Agent algorithms Over-gateway application deployment Dynamic resource allocation and monitoring Priority-based agent migration
Performance evaluation Dissemination
5/25/2006 CSS Speaker Series 47
Can AgentTeamwork Become Their Competitor?
AgentTeamwork
Nimrod
5/25/2006 CSS Speaker Series 48
Questions?
5/25/2006 CSS Speaker Series 49
MPJ.Send and Recv Performance
5/25/2006 CSS Speaker Series 50
Mobile Agents
Mobile agents
Naming Cascading termination
Job scheduling
Security
IBM Aglets AgeltFinder traces all agents
Needs to retract one by one
Schedules jobs with Baglets.
Java byte-code verification
Voyager RPC-based system-unique agent IDs
Needs to be implemented at a user level
Launches an independent user process.
CORBA security service
D’Agent Unpredictable agent IDs
Needs to be implemented at a user level
Launches an independent user process.
A currency-based model
Ara
(Obsolete)
Unpredictable agent IDs
Calls ara_kill to kill all agents
Launches an independent user process.
An allowance model
UWAgent Agent domain Waits for all descendants’ termination
Schedules jobs with Java thread functions.
Agent-to-agent security w/ Agent domain
top related