alps - cray user group · alps components •clients •aprun – application submission •apstat...

Post on 02-Oct-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

CUG 2006

This Presentation May Contain Preliminary Information That Is Subject To Change

ALPSApplication Level Placement Scheduler

Michael Karomek@cray.com

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 2

ALPS Design Goals (1)• Scalability• Thousands of OS instances and applications

• Efficiency• Maximize resource utilization• Minimize overhead

• Predictibility• Consistent performance of applications• Guaranteed resource availability

• Adaptability• Mask architecture specific details• Exploit architecture specific capabilities

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 3

ALPS Design Goals (2)• Extensibility• Adaptable to future architectures• Simplified integration with workload management

systems• Maintainability• Reduce complexity• Separate policy and mechanism

• Availability• Recover quickly with minimal impact• Minimize single points of failure

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 4

ALPS Operating Environment• Hardware• Multiple node types• Multiple processor types• Processor and memory variations• Distributed shared memory

• Software• Multiple parallel programming paradigms• Multiple OS instances• Supported on Compute Node Linux only• Multiple workload managers• Administration and configuration tools• Resource and event monitoring

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 5

ALPS Core Services• Launch and cleanup applications• Binary executable distribution• Monitor and report application status• Application ID assignment• Resource reservation management• Signal propagation• Standard input, output, and error management• Resource availability monitoring• Provide external access to application processes

for debugging and performance analysis

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 6

ALPS Features: Gang Scheduling• ALPS manages context switching• Consistent across entire application• Configurable interval

• Allows short and long running jobs to coexist• Supports configurable CPU oversubscription factor• No support for memory oversubscription

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 7

ALPS Features: Reservations• Maintain resource availability for batch jobs• Support interactive users• Reservation states:• FILED - Request registered• CONFIRMED - Resources locked• CLAIMED - Resources in use

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 8

ALPS Features: BASIL• Batch & Application Scheduler Interface Layer• Extensible XML-RPC implementation• Open interface specification• No proprietary APIs or libraries• Third party vendors manage integration• Three primary functions:• Inventory• Reservation creation• Reservation cancellation

• BASIL programmer’s guide

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 9

ALPS Features: Fanout Tree

1082401699054681341315338254369585851541057273732173

331795321111113216842

Tree Radix

Tree

Dep

th

• Provides scalability• Supports parallel operation• Simulated broadcast on unicast network• Configurable radix:

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 10

ALPS Components• Clients• aprun – Application submission• apstat – Application status• apkill – Signal delivery• apbasil – Workload manager interface

• Servers• apsys – Client interaction on login nodes• apinit – Process management on compute nodes• apsched – Reservations and placement• apbridge – System data collection• apwatch – Event monitoring

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 11

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 12

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 13

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 14

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 15

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 16

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 17

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 18

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 19

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 20

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 21

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 22

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 23

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 24

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 25

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 26

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 27

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 28

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 29

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 30

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 31

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 32

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 33

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 34

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 35

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 36

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 37

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 38

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 39

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 40

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 41

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 42

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 43

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 44

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 45

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 46

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 47

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 48

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 49

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 50

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 51

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 52

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 53

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 54

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 55

apsched(Service or

Login Node)

aprun(PEs 0,1,2)

Login Node A

apinit

apsheperd

PE 1

apinit

apsheperd

PE 0

apinit

apsheperd

PE 2

Compute Node

fork

fork

forkLocalapsys

appagent

stdin handler

apkill

Login Node B Localapsys

appagent fork

apstataprun

signal

Shared Files

fork

fork

aprun

Login Node C

Localapsys

appagent

stdin handler

fork

fork

apbasil

LoginShell

WLM fork,exec

fork,exec

apbridgeapwatchevent router(L1,L0 - SMW)

SystemDatabase

(SDB Node)

privateport

Service Nodepipe

fork, exec

fork, exec

fork, exec

To a ComputeNode

Compute Node

Compute Node

stdin

control socket connection – includes stdout & stderr

qsub

5/5/06 This Presentation May Contain Preliminary Information That Is Subject To Change 56

Q & A• Questions?• Thanks to the ALPS team:• Richard Lagerstrom - development• Marlys Kohnke - development• Carl Albing – development• Bob Gross - testing• Jan Gustafson - our current manager• Wayne Margotto - our former manager

• Thank You!

top related