www.consorzio-cometa.it
FESR
Consorzio COMETA - Progetto PI2S2
WMS - Scripting techniques
Fabio Scibilia, INFN – Catania, Italy
Tutorial for users and application development
Catania, 16-20 July 2007
Catania, Tutorial at DIIT, July 2007
Preliminaries
• LCG middleware
  – The workload is managed by the Resource Broker
  – Supports neither parametric jobs nor DAGs
  – Works fine
• gLite
  – Supports both parametric jobs and DAG jobs
  – Under development
  – Uses WMProxy to manage the workload
  – Will be available in a few months
• Tips and tricks
  – Some ideas for using the LCG middleware to support parametric jobs and DAGs while waiting for the stable WMProxy release
Exercise 1: Parametric jobs
Exercise 1: The bash script (1/2)
• A set of jobs differing only in their input files
  – The bash script looks like this
#!/bin/bash
if [ "$2" = "" ]; then
  echo "Usage: $0 begin end [step]"
  echo "  begin  The first value of the sequence"
  echo "  end    The last value of the sequence"
  echo "  step   The step between two submissions"
  exit 0
fi
joblist="jobs.list"
begin_index=$1   # the first parameter of the script
end_index=$2     # the second parameter of the script

if [ "$3" = "" ]; then
  step=1
else
  step=$3        # the third parameter of the script
fi
. . .
Exercise 1: The bash script (2/2)
# start the iterations
for ((index=$begin_index; index<=$end_index; index=$index+$step)); do
  # we generate the input file automatically; obviously it can also be made by hand
  inputfile="input$index.txt"
  echo "creating input file $inputfile"
  echo "The name of this input file is $inputfile" > "$inputfile"

  # create the corresponding JDL file, depending on the index
  jdlfile="job$index.jdl"   # name of the JDL file
  echo "creating JDL file $jdlfile"
  (
    echo 'Type="Job";'
    echo 'JobType="Normal";'
    echo 'Executable="/bin/cat";'
    echo "Arguments=\"$inputfile\";"
    echo "StdOutput=\"stdout$index.txt\";"
    echo "StdError=\"stderr$index.txt\";"
    echo "InputSandbox={\"$inputfile\"};"
    echo "OutputSandbox={\"stdout$index.txt\", \"stderr$index.txt\"};"
  ) > "$jdlfile"
  edg-job-submit -o jobs.id "$jdlfile"   # actual job submission
done   # end of iterations
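The JDL generation step can also be written with a here-document, which avoids escaping every double quote. The following is only a minimal sketch, not the script from the slides: the function name `write_jdl` is hypothetical, and the attributes are copied from the JDL built by the script above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): print to stdout
# the JDL of the job with the given index, using a here-document.
write_jdl() {
  index=$1
  cat <<EOF
Type="Job";
JobType="Normal";
Executable="/bin/cat";
Arguments="input$index.txt";
StdOutput="stdout$index.txt";
StdError="stderr$index.txt";
InputSandbox={"input$index.txt"};
OutputSandbox={"stdout$index.txt", "stderr$index.txt"};
EOF
}

# Example: create job3.jdl in the current directory.
# write_jdl 3 > job3.jdl
```

Because the here-document delimiter is unquoted, `$index` is expanded while the double quotes and braces are kept literally.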
Exercise 2: DAGs
Exercise 2: DAG modelling
• DAGs can be emulated with a simplified Petri net
  – A job is submitted only when its activating jobs have terminated
  – Each transition bar corresponds to a bash script that
      • Waits for termination of the activating job(s) by polling every minute
      • Collects the output
      • Submits the next job(s)
[Figure: simplified Petri net of a DAG with jobs numbered 1 to 6]
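The firing rule on this slide (a job is submitted only once every job that activates it has terminated) can be expressed as a small dependency check. This is a minimal sketch under stated assumptions: the dependency-file format (`job: dep1 dep2`), the function name `ready_jobs` and the four-job diamond used in the usage note are all hypothetical and are not taken from the slides.

```shell
#!/bin/sh
# Hypothetical helper: list the jobs that may be submitted now.
#   $1 = dependency file, one line per job: "job: dep1 dep2 ..."
#   $2 = space-separated names of the jobs that have terminated
#   $3 = space-separated names of the jobs already submitted
ready_jobs() {
  while read -r job deps; do
    job=${job%:}                    # strip the trailing colon
    [ -n "$job" ] || continue       # skip empty lines
    case " $3 " in
      *" $job "*) continue ;;       # already submitted: skip it
    esac
    ready=1
    for dep in $deps; do
      case " $2 " in
        *" $dep "*) ;;              # this activating job has terminated
        *) ready=0 ;;               # still waiting for this one
      esac
    done
    if [ "$ready" -eq 1 ]; then
      echo "$job"
    fi
  done < "$1"
}
```

With a hypothetical file `deps.txt` containing the lines `1:`, `2: 1`, `3: 1` and `4: 2 3`, the call `ready_jobs deps.txt "" ""` lists job 1, `ready_jobs deps.txt "1" "1"` lists jobs 2 and 3, and `ready_jobs deps.txt "1 2 3" "1 2 3"` lists job 4.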
Exercise 2: An example
• We emulate a simple split-and-merge DAG
  – One transition bar
  – This example can nevertheless be extended to any possible DAG
[Figure: split-and-merge DAG. Jobs 1, 2, ..., n read input1.txt, input2.txt, ..., input(n).txt; their stdout files are merged into final_input, from which the last job produces final_output]
• ./submitter.sh: generates input[1..n].txt and submits the jobs
• ./polling.sh: waits for the completion of jobs [1..n], collects the output and creates the final input file
• ./last_job.sh: submits the last job, waits for its completion and downloads the output
• ./polling.sh && ./last_job.sh: implement the transition bar
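The three scripts named on this slide can be chained with `&&`, so that each stage starts only if the previous one exited successfully. A minimal sketch, assuming the scripts live in the current directory and return a non-zero exit status on failure (the slides do not show their exit codes); the `run_dag` function and the `SUBMITTER`, `POLLING` and `LAST_JOB` variables are hypothetical names introduced here.

```shell
#!/bin/sh
# Hypothetical driver for the split-and-merge example.
# The three variables default to the script names used in the slides and
# can be overridden, for instance to try the driver without a grid.
SUBMITTER=${SUBMITTER:-./submitter.sh}
POLLING=${POLLING:-./polling.sh}
LAST_JOB=${LAST_JOB:-./last_job.sh}

# $1 = number of split jobs to submit
run_dag() {
  "$SUBMITTER" "$1" && "$POLLING" && "$LAST_JOB"
}

# Example: run_dag 2
```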
Exercise 2: ./submitter.sh
#!/bin/bash

if [ "$1" = "" ]; then
  echo "Usage: $0 num-splits"
  exit 0
fi

for ((index=1; index<=$1; index++)); do   # for each job
  echo "this is the content of input$index.txt" >> input$index.txt

  (   ## creates the JDL for this job
    echo "Type=\"Job\";"
    echo "JobType=\"Normal\";"
    echo "Executable=\"/bin/cat\";"
    echo "Arguments=\"input$index.txt\";"
    echo "InputSandbox={\"input$index.txt\"};"
    echo "StdOutput=\"stdout.txt\";"
    echo "StdError=\"stderr.txt\";"
    echo "OutputSandbox={\"stdout.txt\", \"stderr.txt\"};"
  ) > job$index.jdl

  edg-job-submit -o jobs.id job$index.jdl
done
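`submitter.sh` uses its first argument as a loop bound, so a non-numeric value may either raise an arithmetic error or silently run zero iterations. A minimal sketch of a check that could run before the loop; the function name `is_positive_int` is hypothetical and not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is a positive decimal integer.
is_positive_int() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
  esac
  [ "$1" -gt 0 ]
}

# Possible use at the top of submitter.sh:
# is_positive_int "$1" || { echo "Usage: $0 num-splits"; exit 1; }
```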
Exercise 2: ./submitter.sh output
[fscibi@glite-tutor dag]$ ./submitter.sh 2
. . . .
The job has been successfully submitted to the Network Server. Use edg-job-status command to check job current status. Your job identifier is:
- https://glite-rb2.ct.infn.it:9000/Od68j9IBOuJHGlUq-EfWTg
The job identifier has been saved in the following file: /home/fscibi/tips_and_tricks/dag/jobs.id
. . . .
The job has been successfully submitted to the Network Server. Use edg-job-status command to check job current status. Your job identifier is:
- https://glite-rb2.ct.infn.it:9000/-suh1wmmo1VvYJd_4AiLhA
The job identifier has been saved in the following file: /home/fscibi/tips_and_tricks/dag/jobs.id
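In the output above, each job identifier appears on a line of the form `- https://...`. The `-o jobs.id` option already saves the identifiers to a file, but they can also be extracted from captured output. A minimal sketch of such a filter; the function name `extract_job_ids` is hypothetical, and the line layout is assumed from the transcript on this slide.

```shell
#!/bin/sh
# Hypothetical filter: read submission output on stdin and print one job
# identifier per line. Assumes each identifier sits on its own line,
# prefixed by "- " (possibly indented), as in the transcript.
extract_job_ids() {
  sed -n 's|^[[:space:]]*-[[:space:]]*\(https://[^[:space:]]*\).*$|\1|p'
}

# Example: edg-job-submit job1.jdl | extract_job_ids
```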
Exercise 2: ./polling.sh (1/4)
#!/bin/sh
while read line; do
  if [ "$line" != "###Submitted Job Ids###" ]; then
    joblist="$joblist $line"
  fi
done < jobs.id

for job in $joblist; do
  status="unknown"
  finished="false"

  while [ "$finished" = "false" ]; do   # loops waiting for job completion

    ## gets the status of the job
    echo
    echo "getting status of job $job"
    output=`edg-job-status $job`

    status=`echo "$output" | grep "Current Status" | awk '{print $3 }'`
    echo "status = $status"
Exercise 2: ./polling.sh (2/4)
    ## depending on the status, decides what to do
    case $status in
      "Aborted" )
        echo "The job has been aborted on the CE"
        finished="true"
        ;;

      "Cleared" )
        echo "The job output sandbox has been already retrieved. I don't know where!"
        finished="true"
        ;;

      "Done" )
        echo "Job $job Done!!! Downloading the output"

        ## executes and parses the output of edg-job-get-output
        ## to understand where the output has been stored
Exercise 2: ./polling.sh (3/4)
        ## pipes the output of edg-job-get-output to look for the retrieval status
        edg-job-get-output --dir . $job | (
          found="false"
          while read line; do
            if [ "$found" = "true" ]; then   ## this line contains the dir path
              dirpath=$line
              echo "output sandbox stored at $dirpath"
              break
            fi
            if echo "$line" | grep -q "have been successfully retrieved and stored"; then
              found="true"   ## next line contains the dir path
            fi
          done

          if [ "$found" = "true" ]; then
            filename=$dirpath/stdout.txt
            echo "appending $filename to final_input"
            cat $filename >> final_input
          fi
        )
        finished="true"
        ;;
Exercise 2: ./polling.sh (4/4)
      *)
        echo "sleeping 1 minute"
        sleep 1m
        ;;
    esac

  done   # while
done   # for
[fscibi@glite-tutor dag]$ ./polling.sh
. . . (after a while)
getting status of job https://glite-rb2.ct.infn . . . Od68j9IBOuJHGlUq-EfWTg
status = Done
Job https://glite-rb2.ct.infn . . . Done!!! Downloading the output
output sandbox stored at . . . /dag/fscibi_Od68j9IBOuJHGlUq-EfWTg
appending . . .dag/fscibi_Od68j9IBOuJHGlUq-EfWTg/stdout.txt to final_input

getting status of job https://glite-rb2.ct.infn . . . _-suh1wmmo1VvYJd_4AiLhA
status = Done
Job https://glite-rb2.ct.infn . . . Done!!! Downloading the output
output sandbox stored at . . . /dag/fscibi_-suh1wmmo1VvYJd_4AiLhA
appending . . . dag/fscibi_-suh1wmmo1VvYJd_4AiLhA/stdout.txt to final_input
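The three parsing steps of `polling.sh` (skipping the header line of `jobs.id`, taking the third field of the `Current Status` line, and reading the line that follows the retrieval message) can be isolated in small filters and tried on sample text without contacting the Resource Broker. A minimal sketch: all three function names are hypothetical, and the line layouts are assumed from the way the script parses them.

```shell
#!/bin/sh
# Hypothetical helper: print the job identifiers stored in the file $1,
# skipping the "###Submitted Job Ids###" header and blank lines.
read_job_ids() {
  grep -v -e '^###Submitted Job Ids###$' -e '^[[:space:]]*$' "$1"
}

# Hypothetical helper: read edg-job-status output on stdin and print the
# third field of the first "Current Status" line (the status word).
parse_status() {
  awk '/Current Status/ { print $3; exit }'
}

# Hypothetical helper: read edg-job-get-output output on stdin and print
# the line that follows the "successfully retrieved" message, which the
# script treats as the output directory.
extract_output_dir() {
  awk '
    found { sub(/^[ \t]+/, ""); print; exit }
    /have been successfully retrieved and stored/ { found = 1 }
  '
}

# Example: status=$(edg-job-status "$job" | parse_status)
```

Keeping the parsing in separate filters makes it possible to check them against a saved transcript before running the polling loop against real jobs.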
Exercise 2: Submitting last job
[fscibi@. . . ] cat last_job.sh
#!/bin/sh
## submits the last job
edg-job-submit -o last_job.id last_job.jdl

status=unknown
while [ "$status" != "Done" ]; do
  echo "sleeping 30 seconds"
  sleep 30s

  output=`edg-job-status -i last_job.id`
  status=`echo "$output" | grep "Current Status" | awk '{print $3 }'`
  echo "status = $status"
done

edg-job-get-output -i last_job.id --dir .

echo "Everything is Done !!! "
[fscibi@. . . ] cat last_job.jdl
Type="Job";
JobType="Normal";
Executable="/bin/cat";
Arguments="-n final_input";
StdOutput="final_output";
StdError="stderr.txt";
InputSandBox={"final_input"};
OutputSandbox={"stderr.txt", "final_output"};
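The loop in `last_job.sh` exits only when the status becomes `Done`, so it would wait forever if the job were aborted. A minimal sketch of a bounded wait that also stops on the `Aborted` and `Cleared` states handled in `polling.sh`. The function name `wait_for_done`, the `STATUS_CMD` hook and the attempt limit are hypothetical; `STATUS_CMD` stands for any command or function that prints the current status word.

```shell
#!/bin/sh
# Hypothetical helper: poll until the job reaches a final state.
#   $1 = maximum number of polls
#   $2 = pause between polls, in seconds (passed to sleep)
# STATUS_CMD must name a command or function that prints the status word.
# Returns 0 on Done, 1 on Aborted or Cleared, 2 if the limit is reached.
wait_for_done() {
  attempt=0
  while [ "$attempt" -lt "$1" ]; do
    status=$($STATUS_CMD)
    echo "status = $status"
    case $status in
      Done) return 0 ;;
      Aborted|Cleared) return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$2"
  done
  return 2
}

# In last_job.sh, STATUS_CMD could wrap the pipeline from the slides:
# last_status() {
#   edg-job-status -i last_job.id | grep "Current Status" | awk '{print $3 }'
# }
# STATUS_CMD=last_status
# wait_for_done 120 30
```

Distinct return codes let the caller decide whether to download the output, report a failure, or keep waiting.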
Exercise 2: ./last_job.sh output
[fscibi@glite-tutor dag]$ ./last_job.sh
. . .
The job has been successfully submitted to the Network Server.
. . .
- https://glite-rb2.ct.infn.it:9000/q_rqVQpNFt0GshDn5MZHEw
. . .
sleeping 30 seconds      (many times)
status = Scheduled       (waiting for status Done)
. . .
sleeping 30 seconds
status = Done

Retrieving files from host: . . .
Output sandbox files for the job:
- https://glite-rb2.ct.infn.it:9000/q_rqVQpNFt0GshDn5MZHEw
have been successfully retrieved and stored in the directory:
/home/fscibi/tips_and_tricks/dag/fscibi_q_rqVQpNFt0GshDn5MZHEw

Everything is Done !!!
[fscibi@glite-tutor dag]$ cat fscibi_q_rqVQpNFt0GshDn5MZHEw/final_output
     1  this is the content of input1.txt
     2  this is the content of input2.txt
References
• JDL (WMS Network Server)
  https://edms.cern.ch/file/555796/1/EGEE-JRA1-TEC-555796-JDL-Attributes-v0-7.doc
• JDL (WMS WMProxy)
  https://edms.cern.ch/file/590869/1/EGEE-JRA1-TEC-590869-JDL-Attributes-v0-5.doc
• Advanced BASH scripting
  http://tldp.org/LDP/abs/html/
• GILDA twiki pages
  https://grid.ct.infn.it/twiki/bin/view/GILDA/WebHome
Questions . . .