pbs job management and taskfarming

PBS Job Management and Taskfarming

Joachim Wagner 2008-07-24

Outline

Why do we need a cluster?

ArchitectureMachines

Software

Job Management

Running jobsCommands

PBS Job discriptions

Taskfarming

Plans

Why do we need a cluster?

Resource conflictsWaiting for colleague’s job to finish

Trouble, e.g. disk full

Medium-size jobsToo big for desktop PC

Too small for ICHEC

Preparation of ICHEC runs

Learning

Cluster Architecture

SchoolNetwork

maia.computing.dcu.ie Separate

NetworkLoginsHomeSoftware

Job Queue

Nodes

Installed Software

OpenMPI

SRILM

MaTrEx, Moses, GIZA++

XLE, Sicstus

Johnson & Charniak’s reranking parser

In progress:LFG AA, incl. function labeller

PBS Job Management

Job Queue

Job SubmissionUser: JobDescription

Job Execution

Job SchedulerNodes are allocatedjob-exclusive for theduration of the job

PBS Job Management Commands

qsub myjob.pbssubmits a job

PBS description: shell script with #PBS commands(ignored by shell, see next slide)

qstat, qstat –f jobnumber

qdel jobnumber

pbsnodes –alist all nodes with status and properties

PBS Job Description

Number of nodesCPUs/node

Notification: end,begin and abort

Maximum runtime

Example: Memory-Intensive Job

Node Properties

min4GB, min8GB: at least this much

mem4GB, mem8GB: exactly this much

long: please use this property for long jobswill leave nodes that do not have this property available for other jobs

no limit enforced, but 24 h seems reasonable

Future:16 and 32 GB

CPU

local disk space (/tmp and swap)

CPU-Intensive Jobs

Parallelisable, for example Sentence by sentence processing

Cross-validation runs

Parameter search

Split into parts Run each part on a different CPU

Taskfarming

PBS JobDescription

TaskfarmingExecutable

(n instances)

1 Master n-1 Worker

Task file(.tfm):one taskper line

reading

MPICommunication

Taskexecution

childprocess

Taskfarming Executable

If Instance ID == 0Run master code loop:

Read .tfm file (arg 1)

Send lines to worker

Exit if no more task and all worker finished

ElseRun worker loop:

Ask master for a task

Execute task

Exit if master has no more tasks

Example: Taskfarming PBS File

Example: Taskfarming TFM File

Example: Taskfarming Helper Script

run-package.sh

Example: Taskfarming in Action

000CPU 1

001

002

Master: reads .tfm and distributes tasks

CPU 2

CPU 3

CPU 4

003

005

004

006

time

008

007

009

010

011

012

idle

Example: Non-Terminating Task

000CPU 1

001

002

Master: reads .tfm and distributes tasks

CPU 2

CPU 3

CPU 4

003

005

004

006 (does not terminate)

Killed at

Walltime

Limit

008

007

009

010

011

012

idle

idle

Estimating the PBS Walltime Parameter

Collect durations from test run

Usually high variance of execution timeLong sentences

Parameters

Don’t use #packages x avg. time per packageHigh risk (~50 %) that more time is needed

Instead: Random sampling with observed package durations: /home/jwagner/tools/walltime.py

Effect of Task Size

Job will wait for last task to finish (or be killed when walltime limit is reached)

What if a task crashes?Results are incomplete

Next tasks is executed

What if a task does not terminate?Results are incomplete

Fewer CPUs available for remaining tasks

Overhead of starting tasks

Considering Multiple CPUs per Node

4 CPU cores per node

8 GB node -> 2 GB per core

CPUs compete for RAMSwapping of one task effects the 3 other tasks

Relatively slow CPUs in nodesCompared to new desktop PCs

Optimise throughput of cluster / EURNot: throughput of node or CPU

Depends on application

Plans

Fix sporadic errors of taskfarm.pyre-implentation in C

XML-RPC-based taskfarminghttp-based

Run master on maia

Run workers also outside the cluster

Set parameters at runtime

Add more nodes (CNGL)

Install additional software

Questions

?Contact:Joachim WagnerCNGL System [email protected](01) 700 6915

pbs job management and taskfarming

Documents

colleagues job

task crashes

task effects

allocated jobexclusive

taskfarming tfm fileexample

taskfarming pbs fileexample

pyeffect of task sizejob

tfm file arg