Condor Project
Computer Sciences Department
University of Wisconsin-Madison
http://www.cs.wisc.edu/condor

Stork: An Introduction
Condor Week 2006, Milan
Two Main Ideas
•Make data transfers a “first class citizen” in Condor
•Reuse items in the Condor toolbox
The tools
•ClassAds
•Matchmaking
•DAGMan
The data transfer problem

• Process large data sets at sites on the grid. For each data set:
o stage in data from remote server
o run CPU data processing job
o stage out data to remote server
Simple Data Transfer Job

#!/bin/sh
globus-url-copy source dest

Often works fine for short, simple data transfers, but…
What can go wrong?
•Too many transfers at one time
•Service down; need to try later
•Service down; need to try alternate data source
•Partial transfers
•Time out; not worth waiting anymore
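The failure modes above are exactly what the naive one-line script cannot handle. As a rough illustration (this is not Stork code — the `transfer` function is a stub standing in for globus-url-copy, and the server URLs are made up), this is the kind of retry-and-alternate-source policy Stork applies on a job's behalf:

```shell
#!/bin/sh
# Sketch of retry-plus-fallback handling. "transfer" is a stub
# standing in for globus-url-copy; here it pretends the primary
# server is down and only the mirror is reachable.
transfer() {
    [ "$1" = "gsiftp://mirror/path" ]
}

MAX_RETRY=3
status=failed
for src in gsiftp://primary/path gsiftp://mirror/path; do
    tries=0
    while [ "$tries" -lt "$MAX_RETRY" ]; do
        if transfer "$src"; then
            status="ok:$src"
            break 2           # success: stop retrying, stop trying sources
        fi
        tries=$((tries + 1))  # service down: try again, up to MAX_RETRY
    done
done
echo "$status"
```

Stork expresses the same ideas as server-side policy (retry limits, alternate protocols in the submit file) rather than per-job shell loops.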
Stork

• What Schedd is to CPU jobs, Stork is to data placement jobs:
o Job queue
o Flow control
o Failure-handling policies
o Event log
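To make "flow control" concrete: the point is that the server, not each job, decides how many transfers run at once. A toy sketch, not Stork internals (`transfer_one` is a hypothetical stub and the cap of 2 is illustrative):

```shell
#!/bin/sh
# Toy flow control: never start more than MAX_JOBS transfers at once.
# transfer_one is a stub; each "transfer" just records that it ran.
MAX_JOBS=2
log=$(mktemp)
transfer_one() {
    echo "done $1" >> "$log"
}

active=0
for url in a b c d e; do
    transfer_one "$url" &       # launch transfer in the background
    active=$((active + 1))
    if [ "$active" -ge "$MAX_JOBS" ]; then
        wait                    # crude throttle: drain the batch first
        active=0
    fi
done
wait                            # wait for any stragglers
grep -c done "$log"             # prints 5 (one line per transfer)
```

In real Stork the cap is a server configuration knob rather than logic in the job itself.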
Supported Data Transfers

• local file system
• GridFTP
• FTP
• HTTP
• SRB
• NeST
• SRM
• other protocols via simple plugin
Stork Commands

stork_submit - submit a job
stork_q - list the job queue
stork_status - show completion status
stork_rm - cancel a job
Creating a Submit Description File
• A plain ASCII text file
• Tells Stork about your job:
o source/destination
o alternate protocols
o proxy location
o debugging logs
o command-line arguments
Simple Submit File

// c++ style comment lines
[
  dap_type = "transfer";
  src_url = "gsiftp://server/path";
  dest_url = "file:///dir/file";
  x509proxy = "default";
  log = "stage-in.out.log";
  output = "stage-in.out.out";
  err = "stage-in.out.err";
]
Note: different format from Condor submit files
Sample stork_submit

# stork_submit stage-in.stork
using default proxy: /tmp/x509up_u19100
================
Sending request:
[
  dest_url = "file:///dir/file";
  src_url = "gsiftp://server/path";
  err = "path/stage-in.out.err";
  output = "path/stage-in.out.out";
  dap_type = "transfer";
  log = "path/stage-in.out.log";
  x509proxy = "default"
]
================
Request assigned id: 1    ← returned job id
#
Sample Stork User Log
000 (001.-01.-01) 04/17 19:30:00 Job submitted from host: <128.105.121.53:54027>
...
001 (001.-01.-01) 04/17 19:30:01 Job executing on host: <128.105.121.53:9621>
...
008 (001.-01.-01) 04/17 19:30:01 job type: transfer
...
008 (001.-01.-01) 04/17 19:30:01 src_url: gsiftp://server/path
...
008 (001.-01.-01) 04/17 19:30:01 dest_url: file:///dir/file
...
005 (001.-01.-01) 04/17 19:30:02 Job terminated.
    (1) Normal termination (return value 0)
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
    0 - Run Bytes Sent By Job
    0 - Run Bytes Received By Job
    0 - Total Bytes Sent By Job
    0 - Total Bytes Received By Job
...
Who needs Stork?
SRM exists. It provides a job queue, logging, etc.
Why not use that?
Use whatever makes sense!

• Another way to view Stork: glue between DAGMan and data transport or a transport scheduler.
•So one DAG can describe a workflow, including both data movement and computation steps.
Stork jobs in a DAG

• A DAG is defined by a text file, listing each job and its dependents:

  # data-process.dag
  Data   IN      in.stork
  Job    CRUNCH  crunch.condor
  Data   OUT     out.stork
  Parent IN     Child CRUNCH
  Parent CRUNCH Child OUT

• Each node will run the Condor or Stork job specified by its accompanying submit file

(Diagram: IN → CRUNCH → OUT)
Important Stork Parameters

• STORK_MAX_NUM_JOBS limits the number of active jobs
• STORK_MAX_RETRY limits job attempts before the job is marked as failed
• STORK_MAXDELAY_INMINUTES specifies the “hung job” threshold
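In Condor-style configuration these appear as knobs for the Stork server. A fragment of the sort you might write — the values below are illustrative only, not documented defaults:

```
# Illustrative values only -- see the Condor Manual (Stork sections)
# for actual defaults and semantics.

# At most 10 data placement jobs active at once
STORK_MAX_NUM_JOBS = 10

# Mark a job failed after 5 attempts
STORK_MAX_RETRY = 5

# Treat a job as hung after 10 minutes
STORK_MAXDELAY_INMINUTES = 10
```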
Features in Development

• Matchmaking:
o Job ClassAd with site ClassAd
o Global limits on max transfers per site
o Load balancing across sites
o Dynamic reconfiguration of sites
o Coordination of multiple instances of Stork
• Working prototype developed with the Globus GridFTP team
Further Ahead

• Automatic startup of a personal Stork server on demand
• Fair sharing between users
• Fit into the new pluggable scheduling framework, à la schedd-on-the-side
Summary
•Stork manages a job queue for data transfers
•A DAG may describe a workflow containing both data movement and processing steps.
Additional Resources
•http://www.cs.wisc.edu/condor/stork/
•Condor Manual, Stork sections
•stork-announce@cs.wisc.edu list
•stork-discuss@cs.wisc.edu list