CUG 2006
This Presentation May Contain Preliminary Information That Is Subject To Change
ALPS: Application Level Placement Scheduler
Michael [email protected]
5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 2
ALPS Design Goals (1)
• Scalability
  • Thousands of OS instances and applications
• Efficiency
  • Maximize resource utilization
  • Minimize overhead
• Predictability
  • Consistent performance of applications
  • Guaranteed resource availability
• Adaptability
  • Mask architecture-specific details
  • Exploit architecture-specific capabilities
ALPS Design Goals (2)
• Extensibility
  • Adaptable to future architectures
  • Simplified integration with workload management systems
• Maintainability
  • Reduce complexity
  • Separate policy and mechanism
• Availability
  • Recover quickly with minimal impact
  • Minimize single points of failure
ALPS Operating Environment
• Hardware
  • Multiple node types
  • Multiple processor types
  • Processor and memory variations
  • Distributed shared memory
• Software
  • Multiple parallel programming paradigms
  • Multiple OS instances
  • Supported on Compute Node Linux only
  • Multiple workload managers
  • Administration and configuration tools
  • Resource and event monitoring
ALPS Core Services
• Launch and cleanup applications
• Binary executable distribution
• Monitor and report application status
• Application ID assignment
• Resource reservation management
• Signal propagation
• Standard input, output, and error management
• Resource availability monitoring
• Provide external access to application processes for debugging and performance analysis
ALPS Features: Gang Scheduling
• ALPS manages context switching
  • Consistent across entire application
  • Configurable interval
• Allows short and long running jobs to coexist
• Supports configurable CPU oversubscription factor
• No support for memory oversubscription
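The gang-scheduling bullets can be illustrated with a minimal sketch. It assumes a simple round-robin policy, which the slide does not specify, and the function names are hypothetical rather than part of ALPS. Because the active application is derived from a shared clock and a configured interval, every node computing it reaches the same answer, which is the "consistent across entire application" property.

```python
def admit(apps, new_app, oversubscription_factor):
    """Admit new_app only while the CPU set stays within the oversubscription factor."""
    if len(apps) >= oversubscription_factor:
        return False
    apps.append(new_app)
    return True


def active_application(apps, interval_s, now_s):
    """Return the application that owns the CPUs in the time slice containing now_s.

    Round-robin order is an assumption made for this sketch.
    """
    if not apps:
        return None
    slice_index = int(now_s // interval_s)
    return apps[slice_index % len(apps)]


if __name__ == "__main__":
    sharing = []
    for name in ("short_job", "long_job", "extra_job"):
        print(name, "admitted:", admit(sharing, name, oversubscription_factor=2))
    for t in (0, 10, 25):
        print("t =", t, "->", active_application(sharing, interval_s=10, now_s=t))
```

Memory is deliberately left out of the sketch, matching the slide's note that memory oversubscription is not supported.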
ALPS Features: Reservations
• Maintain resource availability for batch jobs
• Support interactive users
• Reservation states:
  • FILED - Request registered
  • CONFIRMED - Resources locked
  • CLAIMED - Resources in use
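The three reservation states can be modeled as a small state machine. This is a sketch only: the slide names the states but not the allowed transitions, so the forward-only order FILED, CONFIRMED, CLAIMED is an assumption, and the `advance` helper is hypothetical.

```python
from enum import Enum


class ReservationState(Enum):
    FILED = "request registered"
    CONFIRMED = "resources locked"
    CLAIMED = "resources in use"


# Assumed forward-only ordering; the slide lists the states, not the transitions.
NEXT_STATE = {
    ReservationState.FILED: ReservationState.CONFIRMED,
    ReservationState.CONFIRMED: ReservationState.CLAIMED,
}


def advance(state: ReservationState) -> ReservationState:
    """Move a reservation to the next state, or raise if none follows."""
    if state not in NEXT_STATE:
        raise ValueError(f"{state.name} has no following state in this sketch")
    return NEXT_STATE[state]


if __name__ == "__main__":
    state = ReservationState.FILED
    while state in NEXT_STATE:
        print(state.name, "-", state.value)
        state = advance(state)
    print(state.name, "-", state.value)
```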
ALPS Features: BASIL
• Batch & Application Scheduler Interface Layer
• Extensible XML-RPC implementation
• Open interface specification
• No proprietary APIs or libraries
• Third party vendors manage integration
• Three primary functions:
  • Inventory
  • Reservation creation
  • Reservation cancellation
• BASIL programmer's guide
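To show what an XML-RPC exchange looks like in general, the sketch below encodes and decodes a method call with Python's standard `xmlrpc.client` module. The method name `basil.reserve` and the parameter fields are hypothetical placeholders chosen for illustration; they are not taken from the BASIL specification, which should be consulted for the real request format.

```python
import xmlrpc.client

# Hypothetical method name and parameter fields, for illustration only.
request = {"user": "alice", "width": 3}
payload = xmlrpc.client.dumps((request,), methodname="basil.reserve")

# Decoding the payload recovers the same parameters and method name.
params, method = xmlrpc.client.loads(payload)

if __name__ == "__main__":
    print(payload)
    print(method, params[0])
```

Because the payload is plain XML, a workload manager can produce and parse it with any XML library, which fits the slide's point about needing no proprietary APIs or libraries.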
ALPS Features: Fanout Tree
• Provides scalability
• Supports parallel operation
• Simulated broadcast on unicast network
• Configurable radix:

                       Tree Radix
Tree Depth        2      4      8      16       32
    1             1      1      1       1        1
    2             3      5      9      17       33
    3             7     21     73     273     1057
    4            15     85    585    4369    33825
    5            31    341   4681   69905  1082401
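The radix and depth figures on this slide are consistent with the size of a complete tree: a tree of radix r with d levels holds 1 + r + ... + r^(d-1) nodes. The following is a minimal sketch under that interpretation; the function names are illustrative and not part of ALPS.

```python
def tree_nodes(radix: int, depth: int) -> int:
    """Nodes in a complete tree with `depth` levels and `radix` children per node."""
    return sum(radix ** level for level in range(depth))


def depth_needed(radix: int, node_count: int) -> int:
    """Smallest number of levels whose complete tree covers `node_count` nodes."""
    depth = 0
    while tree_nodes(radix, depth) < node_count:
        depth += 1
    return depth


if __name__ == "__main__":
    radices = (2, 4, 8, 16, 32)
    for depth in range(1, 6):
        print(depth, [tree_nodes(r, depth) for r in radices])
```

A larger radix reaches the same number of nodes in fewer levels, at the cost of each parent contacting more children; that trade-off is one plausible reason to make the radix configurable.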
ALPS Components
• Clients
  • aprun – Application submission
  • apstat – Application status
  • apkill – Signal delivery
  • apbasil – Workload manager interface
• Servers
  • apsys – Client interaction on login nodes
  • apinit – Process management on compute nodes
  • apsched – Reservations and placement
  • apbridge – System data collection
  • apwatch – Event monitoring
[Figure: ALPS processes and their connections, shown on slides 11 through 55. Labels recoverable from the diagram:]
• apsched (Service or Login Node)
• Login Node A: aprun (PEs 0,1,2), local apsys, app agent, stdin handler
• Login Node B: apkill, apstat, local apsys, app agent
• Login Node C: login shell, qsub, WLM, apbasil, aprun, local apsys, app agent, stdin handler
• Compute Nodes: apinit, apsheperd, and PE 0, PE 1, PE 2
• Service Node: apbridge, apwatch
• Event router (L1, L0 - SMW)
• System Database (SDB Node)
• Shared Files
• Connection labels: fork; fork, exec; signal; pipe; private port; stdin; control socket connection (includes stdout & stderr)
Q & A
• Questions?
• Thanks to the ALPS team:
  • Richard Lagerstrom - development
  • Marlys Kohnke - development
  • Carl Albing - development
  • Bob Gross - testing
  • Jan Gustafson - our current manager
  • Wayne Margotto - our former manager
• Thank You!