![Page 1: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/1.jpg)
Administrating HTCondor
“Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license.
http://www.flickr.com/photos/7428244@N06/427485954/ http://www.webcitation.org/5g6wqrJPx
Alan De Smet
Center for High Throughput Computing
[email protected]
http://research.cs.wisc.edu/htcondor
![Page 2: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/2.jpg)
The next 70 minutes…
› HTCondor Daemons & Job Startup
› Configuration Files
› Security, briefly
› Policy Expressions
  • Startd (Machine)
  • Negotiator
› Priorities
› Useful Tools
› Log Files
› Debugging Jobs
![Page 3: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/3.jpg)
Daemons & Job Startup
“LUNAR Launch” by Steve Jurvetson (“jurvetson”) © 2006
Licensed under the Creative Commons Attribution 2.0 license.
http://www.flickr.com/photos/jurvetson/114406979/
http://www.webcitation.org/5XIfTl6tX
![Page 4: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/4.jpg)
Job Startup
[Diagram: three machines, each running a master. Submit Machine: submit, schedd, shadow. Execute Machine: startd, starter, Job. Central Manager: collector, negotiator. The diagram also carries J and S markers.]
![Page 5: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/5.jpg)
Configuration Files
“amp wiring” by “fbz_” © 2005
Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/fbz/114422787/
![Page 6: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/6.jpg)
Configuration File
› CONDOR_CONFIG environment variable, /etc/condor/condor_config, ~condor/condor_config
› All settings can be in this one file
› Might want to share between all machines (NFS, automated copies, Wallaby, etc.)
![Page 7: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/7.jpg)
Other Configuration Files
› LOCAL_CONFIG_FILE
  • Comma separated, processed in order
    LOCAL_CONFIG_FILE = \
        /var/condor/config.local,\
        /var/condor/policy.local,\
        /shared/condor/config.$(HOSTNAME),\
        /shared/condor/config.$(OPSYS)
› LOCAL_CONFIG_DIR
    LOCAL_CONFIG_DIR = \
        /var/condor/config.d/,\
        /var/condor/$(OPSYS).d/
![Page 8: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/8.jpg)
Configuration File Syntax
    # I’m a comment!
    CREATE_CORE_FILES=TRUE
    MAX_JOBS_RUNNING = 50
    # HTCondor ignores case:
    log=/var/log/condor
    # Long entries:
    collector_host=condor.cs.wisc.edu,\
        secondary.cs.wisc.edu
![Page 9: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/9.jpg)
Configuration File Macros
› You reference other macros (settings) with:
  • A = $(B)
  • SCHEDD = $(SBIN)/condor_schedd
› Can create additional macros for organizational purposes
![Page 10: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/10.jpg)
Configuration File Macros
› Can append to macros:
    A=abc
    A=$(A),def
› Don’t let macros recursively define each other!
    A=$(B)
    B=$(A)
![Page 11: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/11.jpg)
Configuration File Macros
› Later macros in a file overwrite earlier ones
  • B will evaluate to 2:
    A=1
    B=$(A)
    A=2
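The "B will evaluate to 2" behavior follows from two rules: the last definition of a name wins, and references are expanded only when a value is looked up. The Python sketch below is a toy model of those two rules, not HTCondor's parser; `parse` and `expand` are hypothetical helper names, and the sketch does not handle self-appending (`A=$(A),def`) or recursive definitions.

```python
import re

def parse(text):
    """Read NAME=VALUE lines; a later definition replaces an earlier one."""
    macros = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        # Names are stored upper-cased because lookups ignore case.
        macros[name.strip().upper()] = value.strip()
    return macros

def expand(name, macros):
    """Expand $(NAME) references at lookup time, using the final definitions."""
    return re.sub(r"\$\((\w+)\)",
                  lambda match: expand(match.group(1), macros),
                  macros[name.upper()])

config = parse("A=1\nB=$(A)\nA=2")
print(expand("B", config))  # prints 2
```

Because `B` stores the text `$(A)` and not the value `A` had at that point in the file, the later `A=2` is what `B` sees.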
![Page 12: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/12.jpg)
Macros and Expressions Gotcha
› These are simple replacement macros
› Put parentheses around expressions
    TEN=5+5
    HUNDRED=$(TEN)*$(TEN)
  • HUNDRED becomes 5+5*5+5 or 35!
    TEN=(5+5)
    HUNDRED=($(TEN)*$(TEN))
  • ((5+5)*(5+5)) = 100
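The gotcha can be reproduced with plain text substitution followed by ordinary arithmetic. This is a minimal Python sketch; `substitute` is a hypothetical helper that only mimics the "simple replacement" behavior, and Python's `eval` stands in for the expression evaluator.

```python
def substitute(expr, macros):
    """Replace each $(NAME) with its raw text, exactly as written."""
    for name, value in macros.items():
        expr = expr.replace("$(" + name + ")", value)
    return expr

unparenthesized = substitute("$(TEN)*$(TEN)", {"TEN": "5+5"})
parenthesized = substitute("($(TEN)*$(TEN))", {"TEN": "(5+5)"})

print(unparenthesized, "=", eval(unparenthesized))  # 5+5*5+5 = 35
print(parenthesized, "=", eval(parenthesized))      # ((5+5)*(5+5)) = 100
```

The multiplication binds tighter than the additions only because the substituted text carries no grouping of its own, which is why the parentheses belong in the macro definition.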
![Page 13: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/13.jpg)
Security, briefly
“Padlock” by Peter Ford © 2005
Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/peterf/72583027/
http://www.webcitation.org/5XIiBcsUg
![Page 14: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/14.jpg)
HTCondor Security
› Strong authentication of users and daemons
› Encryption over the network
› Integrity checking over the network
“locks-masterlocks.jpg” by Brian De Smet, © 2005 Used with permission.
http://www.fief.org/sysadmin/blosxom.cgi/2005/07/21#locks
![Page 15: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/15.jpg)
Minimal Security Settings
› You must set ALLOW_WRITE, or nothing works
› Simplest setting:
    ALLOW_WRITE=*
  • Extremely insecure!
› A bit better:
    ALLOW_WRITE= \
        *.cs.wisc.edu
“Bank Security Guard” by “Brad & Sabrina” © 2006
Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/madaboutshanghai/184665954/ http://www.webcitation.org/5XIhUAfuY
![Page 16: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/16.jpg)
More on Security
› Chapter 3.6, “Security,” in the HTCondor Manual
“Zach Miller” by Alan De Smet
![Page 17: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/17.jpg)
Policy
“Don't even think about it” by Kat “tyger_lyllie” © 2005
Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/tyger_lyllie/59207292/
http://www.webcitation.org/5XIh5mYGS
![Page 18: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/18.jpg)
Policy
› Who gets to run jobs, when?
![Page 19: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/19.jpg)
Policy Expressions
› Specified in condor_config
  • Ends up in the slot ClassAd
› Policy evaluates both a slot ClassAd and a job ClassAd together
  • Policy can reference items in either ClassAd (see manual for list)
› Can reference condor_config macros: $(MACRONAME)
![Page 20: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/20.jpg)
Slots vs Machines
› Machine – An individual computer, managed by one startd
› Slot – A place to run a job, managed by one starter
  • A machine may have many slots
  • Partitionable slots create more slots on the fly
› The startd advertises each slot
  • The ClassAd is a “Machine” ad for historical reasons
![Page 21: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/21.jpg)
Slot Policy Expressions
› START
› RANK
› SUSPEND
› CONTINUE
› PREEMPT
› KILL
![Page 22: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/22.jpg)
START
› START is the primary policy
› When FALSE the slot enters the Owner state and will not run jobs
› Acts as the Requirements expression for the slot; the job must satisfy START
  • Can reference job ClassAd values including Owner and ImageSize
![Page 23: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/23.jpg)
RANK
› Indicates which jobs a slot prefers
  • Jobs can also specify a rank
› Floating point number
  • Larger numbers are higher ranked
  • Typically evaluate attributes in the Job ClassAd
  • Typically use + instead of &&
![Page 24: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/24.jpg)
RANK
› Often used to give priority to the owner of a particular group of machines
› Claimed slots still advertise, looking for a higher ranked job to preempt the current job
  • RANK causes preemption!
![Page 25: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/25.jpg)
SUSPEND and CONTINUE
› When SUSPEND becomes true, the job is suspended
› When CONTINUE becomes true, a suspended job is released
“DSC03753” by Eva Schiffer © 2008 Used with permission
http://www.digitalchangeling.com/pictures/ourCats2008/january2008/DSC03753.html
![Page 26: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/26.jpg)
PREEMPT and KILL
› When PREEMPT becomes true, the job will be politely shut down
  • Vanilla universe jobs get SIGTERM
    • Or user requested signal
  • Standard universe jobs checkpoint
› When KILL becomes true, the job is SIGKILLed
  • Checkpointing is aborted if started
![Page 27: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/27.jpg)
Minimal / Default Settings
› Always runs jobs
    START = True
    RANK = 0
    SUSPEND = False
    CONTINUE = True
    PREEMPT = False
    KILL = False
“Lonely at the top” by Guyon Moree (“gumuz”) © 2005 Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/gumuz/7340411/ http://www.webcitation.org/5XIh8s0kI
![Page 28: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/28.jpg)
Policy Configuration
› I am adding nodes to the Cluster… but the Chemistry Department has priority on these nodes
“I R BIZNESS CAT” by “VMOS” © 2007 Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/vmos/2078227291/ http://www.webcitation.org/5XIff1deZ
![Page 29: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/29.jpg)
New Settings for the Chemistry nodes
› Prefer Chemistry jobs
    START = True
    RANK = Department == "Chemistry"
    SUSPEND = False
    CONTINUE = True
    PREEMPT = False
    KILL = False
![Page 30: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/30.jpg)
Submit file with Custom Attribute
› Prefix an entry with “+” to add to job ClassAd
    Executable = charm-run
    Universe = standard
    +Department = "Chemistry"
    queue
![Page 31: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/31.jpg)
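What if “Department” is not specified?
› With ==, a job that lacks Department makes the comparison evaluate to Undefined; =?= evaluates to False instead
    START = True
    RANK = Department =?= "Chemistry"
    SUSPEND = False
    CONTINUE = True
    PREEMPT = False
    KILL = False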
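The difference between `==` and `=?=` when the job has no Department attribute can be modeled in a few lines. This Python sketch is a simplified stand-in, not the ClassAd library: `UNDEFINED`, `eq`, and `meta_eq` are hypothetical names, and details such as case sensitivity of string comparison are ignored.

```python
UNDEFINED = object()  # stand-in for the ClassAd UNDEFINED value

def eq(left, right):
    """Model of ==: an undefined operand makes the whole result undefined."""
    if left is UNDEFINED or right is UNDEFINED:
        return UNDEFINED
    return left == right

def meta_eq(left, right):
    """Model of =?=: always answers True or False, even for undefined operands."""
    if left is UNDEFINED or right is UNDEFINED:
        return left is right
    return left == right

job_with = {"Department": "Chemistry"}
job_without = {}  # the submit file had no +Department line

for job in (job_with, job_without):
    department = job.get("Department", UNDEFINED)
    print(eq(department, "Chemistry") is UNDEFINED,
          meta_eq(department, "Chemistry"))
```

For the job without Department, `eq` yields the undefined marker while `meta_eq` yields a plain False, which is what lets the result be used safely in arithmetic.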
![Page 32: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/32.jpg)
More Complex RANK
› Give the machine’s owners (adesmet and roy) highest priority, followed by the Chemistry department, followed by the Physics department, followed by everyone else.
  • Can use the automatic Owner attribute in the job ClassAd to identify adesmet and roy
![Page 33: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/33.jpg)
More Complex RANK
    IsOwner = (Owner == "adesmet" \
        || Owner == "roy")
    IsChem = (Department =?= "Chemistry")
    IsPhys = (Department =?= "Physics")
    RANK = $(IsOwner)*20 + $(IsChem)*10 \
        + $(IsPhys)
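To see what numbers this RANK produces, the expression can be mirrored in Python, where booleans also count as 1 and 0 in arithmetic. This is an illustrative sketch only; the job dictionaries and the users "alice" and "bob" are made up for the example.

```python
def rank(job):
    """Mirror of RANK = IsOwner*20 + IsChem*10 + IsPhys for one job."""
    is_owner = job.get("Owner") in ("adesmet", "roy")
    is_chem = job.get("Department") == "Chemistry"
    is_phys = job.get("Department") == "Physics"
    return is_owner * 20 + is_chem * 10 + is_phys

jobs = [
    {"Owner": "bob"},                               # no Department at all
    {"Owner": "alice", "Department": "Physics"},
    {"Owner": "alice", "Department": "Chemistry"},
    {"Owner": "roy"},
    {"Owner": "adesmet", "Department": "Chemistry"},
]

# Highest rank first: the order in which the slot prefers these jobs.
for job in sorted(jobs, key=rank, reverse=True):
    print(rank(job), job)
```

The weights 20, 10, and 1 are spaced so that a higher tier always outranks any combination of lower tiers, which is the reason for adding terms instead of combining them with `&&`.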
![Page 34: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/34.jpg)
Policy Configuration
› I have an unhealthy fixation with PBS so… kill jobs after 12 hours, except Physics jobs get 24 hours.
“I R BIZNESS CAT” by “VMOS” © 2007 Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/vmos/2078227291/ http://www.webcitation.org/5XIff1deZ
![Page 35: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/35.jpg)
Useful Attributes
› CurrentTime
  • Current time, in Unix epoch time (seconds since midnight Jan 1, 1970)
› EnteredCurrentActivity
  • When did HTCondor enter the current activity, in Unix epoch time
![Page 36: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/36.jpg)
Configuration
    ActivityTimer = \
        (CurrentTime - EnteredCurrentActivity)
    HOUR = (60*60)
    HALFDAY = ($(HOUR)*12)
    FULLDAY = ($(HOUR)*24)
    PREEMPT = \
        ($(IsPhys) && ($(ActivityTimer) > $(FULLDAY))) \
        || \
        (!$(IsPhys) && ($(ActivityTimer) > $(HALFDAY)))
    KILL = $(PREEMPT)
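The time-limit rule reads more easily as a small function. The following Python sketch restates the PREEMPT logic under the assumption that times are Unix epoch seconds, as described for CurrentTime and EnteredCurrentActivity; `should_preempt` is a hypothetical name used only here.

```python
HOUR = 60 * 60
HALFDAY = HOUR * 12
FULLDAY = HOUR * 24

def should_preempt(current_time, entered_current_activity, department=None):
    """Physics jobs get 24 hours, every other job gets 12."""
    activity_timer = current_time - entered_current_activity
    is_phys = department == "Physics"
    limit = FULLDAY if is_phys else HALFDAY
    return activity_timer > limit

start = 1_000_000  # arbitrary epoch second when the job began running
print(should_preempt(start + 13 * HOUR, start))             # True
print(should_preempt(start + 13 * HOUR, start, "Physics"))  # False
print(should_preempt(start + 25 * HOUR, start, "Physics"))  # True
```

Choosing the limit first and comparing once is equivalent to the two-branch expression on the slide, because exactly one of the branches can apply to a given job.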
![Page 37: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/37.jpg)
Policy Configuration
› The cluster is okay, but... HTCondor can only use the desktops when they would otherwise be idle
“I R BIZNESS CAT” by “VMOS” © 2007 Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/vmos/2078227291/ http://www.webcitation.org/5XIff1deZ
![Page 38: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/38.jpg)
Defining Idle
› One possible definition:
  • No keyboard or mouse activity for 5 minutes
  • Load average below 0.3
![Page 39: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/39.jpg)
Desktops should
› START jobs when the machine becomes idle
› SUSPEND jobs as soon as activity is detected
› PREEMPT jobs if the activity continues for 5 minutes or more
› KILL jobs if they take more than 5 minutes to preempt
![Page 40: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/40.jpg)
Useful Attributes
› LoadAvg
  • Current load average
› CondorLoadAvg
  • Current load average generated by HTCondor
› KeyboardIdle
  • Seconds since last keyboard or mouse activity
![Page 41: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/41.jpg)
Macros in Configuration Files
    NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
    BgndLoad = 0.3
    CPU_Busy = ($(NonCondorLoadAvg) >= $(BgndLoad))
    CPU_Idle = (!$(CPU_Busy))
    KeyboardBusy = (KeyboardIdle < 10)
    KeyboardIsIdle = (KeyboardIdle > 300)
    MachineBusy = ($(CPU_Busy) || $(KeyboardBusy))
![Page 42: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/42.jpg)
Desktop Machine Policy
    START = $(CPU_Idle) && $(KeyboardIsIdle)
    SUSPEND = $(MachineBusy)
    CONTINUE = $(CPU_Idle) && KeyboardIdle > 120
    PREEMPT = (Activity == "Suspended") && \
        $(ActivityTimer) > 300
    KILL = $(ActivityTimer) > 300
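One way to sanity-check the desktop policy is to evaluate all five expressions against sample attribute values. The Python sketch below restates the macros and policy from the two slides above; `desktop_policy` is a hypothetical helper, and it evaluates every expression at once, whereas a real startd only consults the expressions relevant to the slot's current state.

```python
BGND_LOAD = 0.3

def desktop_policy(load_avg, condor_load_avg, keyboard_idle,
                   activity, activity_timer):
    """Evaluate the desktop policy expressions for one set of slot attributes."""
    cpu_busy = (load_avg - condor_load_avg) >= BGND_LOAD
    cpu_idle = not cpu_busy
    keyboard_busy = keyboard_idle < 10
    keyboard_is_idle = keyboard_idle > 300
    machine_busy = cpu_busy or keyboard_busy
    return {
        "START": cpu_idle and keyboard_is_idle,
        "SUSPEND": machine_busy,
        "CONTINUE": cpu_idle and keyboard_idle > 120,
        "PREEMPT": activity == "Suspended" and activity_timer > 300,
        "KILL": activity_timer > 300,
    }

# Nobody has touched the desktop for ten minutes and the load is low.
print(desktop_policy(0.1, 0.0, 600, "Idle", 0))
# The user just came back while a job is running.
print(desktop_policy(1.1, 1.0, 2, "Busy", 50))
# The job has been suspended for more than five minutes.
print(desktop_policy(1.5, 0.0, 2, "Suspended", 301))
```

Note the gap between the thresholds: a job is suspended after activity within the last 10 seconds but only continues after 120 quiet seconds, which keeps the slot from flapping between states.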
![Page 43: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/43.jpg)
Mission Accomplished.
“Autumn and Blue Eyes” by Paul Lewis (“PJLewis”) © 2005 Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/pjlewis/46134047/ http://www.webcitation.org/5XIhBzDR2
![Page 44: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/44.jpg)
Slot States
![Page 45: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/45.jpg)
Slot Activities
(Section 3.5: Policy Configuration for the condor_startd)
![Page 46: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/46.jpg)
Custom Slot Attributes
› Can add attributes to a slot’s ClassAd, typically done in the local configuration file
    INSTRUCTIONAL=TRUE
    NETWORK_SPEED=1000
    STARTD_EXPRS=INSTRUCTIONAL, NETWORK_SPEED
![Page 47: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/47.jpg)
Custom Slot Attributes
› Jobs can now specify Rank and Requirements using new attributes:
    Requirements = INSTRUCTIONAL=!=TRUE
    Rank = NETWORK_SPEED
› Dynamic attributes are available; see STARTD_CRON_* in the manual
![Page 48: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/48.jpg)
Further Machine Policy Information
› For further information, see section 3.5 “Policy Configuration for the condor_startd” in the HTCondor manual
› htcondor-users mailing list
    http://research.cs.wisc.edu/htcondor/mail-lists/
![Page 49: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/49.jpg)
Priorities
“IMG_2476” by “Joanne and Matt” © 2006 Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/joanne_matt/97737986/ http://www.webcitation.org/5XIieCxq4
![Page 50: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/50.jpg)
Job Priority
› Set with condor_prio
› Users can set priority of their own jobs
› Integers, larger numbers are higher priority
› Only impacts order between jobs for a single user on a single schedd
› A tool for users to sort their own jobs
![Page 51: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/51.jpg)
User Priority
› Determines allocation of machines to waiting users
› View with condor_userprio
› Inversely related to machines allocated (lower is better priority)
  • A user with priority of 10 will be able to claim twice as many machines as a user with priority 20
![Page 52: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/52.jpg)
User Priority
› Effective User Priority is determined by multiplying two components
  • Real Priority
  • Priority Factor
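The arithmetic behind effective priority and the "twice as many machines" rule can be sketched as follows. This Python example assumes, for illustration, that machines are divided in exact inverse proportion to effective priority; the function names, the users, and the pool size of 300 are all hypothetical.

```python
def effective_priority(real_priority, priority_factor):
    """Effective user priority is the product of the two components."""
    return real_priority * priority_factor

def fair_shares(effective, total_machines):
    """Split machines in inverse proportion to effective priority (lower is better)."""
    weights = {user: 1.0 / prio for user, prio in effective.items()}
    total = sum(weights.values())
    return {user: total_machines * weight / total
            for user, weight in weights.items()}

effective = {
    "alice": effective_priority(10.0, 1.0),
    "bob": effective_priority(10.0, 2.0),  # same usage, worse priority factor
}
for user, machines in fair_shares(effective, 300).items():
    print(user, round(machines))
```

With priorities of 10 and 20, alice's share comes out at twice bob's, matching the example on the earlier User Priority slide.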
![Page 53: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/53.jpg)
Real Priority
› Based on actual usage
› Defaults to 0.5
› Approaches actual number of machines used over time
  • Configuration setting PRIORITY_HALFLIFE
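"Approaches actual number of machines used over time" describes a half-life decay toward current usage. The Python sketch below models that as exponential smoothing; the exact update rule, the one-day default for the half-life, and the name `update_real_priority` are assumptions made for illustration, so check the manual before relying on the numbers.

```python
def update_real_priority(previous, machines_in_use, interval, halflife=86400.0):
    """Move the real priority toward current usage.

    Assumed model: after one half-life, half of the gap between the old
    priority and the number of machines in use has been closed.
    """
    beta = 0.5 ** (interval / halflife)
    return beta * previous + (1.0 - beta) * machines_in_use

priority = 0.5  # starting value from the slide
for day in range(1, 5):
    # The user holds 100 machines continuously; one update per day.
    priority = update_real_priority(priority, 100, interval=86400.0)
    print(day, round(priority, 2))
```

A shorter half-life makes priorities react faster to recent usage; a longer one remembers heavy users for more time.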
![Page 54: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/54.jpg)
Priority Factor
› Assigned by administrator
  • Set with condor_userprio
› Defaults to 1 (DEFAULT_PRIO_FACTOR)
![Page 55: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/55.jpg)
Negotiator Policy Expressions
› PREEMPTION_REQUIREMENTS and PREEMPTION_RANK
› Evaluated when condor_negotiator considers replacing a lower priority job with a higher priority job
› Completely unrelated to the PREEMPT expression
![Page 56: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/56.jpg)
PREEMPTION_REQUIREMENTS
› If false will not preempt machine
  • Typically used to avoid pool thrashing
  • Typically use:
    • RemoteUserPrio – Priority of user of currently running job (higher is worse)
    • SubmittorPrio – Priority of user of higher priority idle job (higher is worse)
› PREEMPTION_REQUIREMENTS=FALSE
![Page 57: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/57.jpg)
PREEMPTION_REQUIREMENTS
› Only replace jobs running for at least one hour and 20% lower priority
    StateTimer = \
        (CurrentTime - EnteredCurrentState)
    HOUR = (60*60)
    PREEMPTION_REQUIREMENTS = \
        $(StateTimer) > (1 * $(HOUR)) \
        && RemoteUserPrio > SubmittorPrio * 1.2
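The same test can be written out in Python to check a few cases by hand. This is a restatement of the expression above under the assumption that times are epoch seconds; `may_preempt` is a hypothetical name.

```python
HOUR = 60 * 60

def may_preempt(current_time, entered_current_state,
                remote_user_prio, submittor_prio):
    """True when the running job is old enough and its user's priority is bad enough."""
    state_timer = current_time - entered_current_state
    return (state_timer > 1 * HOUR
            and remote_user_prio > submittor_prio * 1.2)

# Running two hours, running user's priority is 30% worse: preemption allowed.
print(may_preempt(2 * HOUR, 0, 13.0, 10.0))   # True
# Running only thirty minutes: protected regardless of priority.
print(may_preempt(HOUR // 2, 0, 13.0, 10.0))  # False
# Priority is only 10% worse: not enough to justify the churn.
print(may_preempt(2 * HOUR, 0, 11.0, 10.0))   # False
```

Both conditions guard against thrashing: the age check protects freshly started jobs, and the 1.2 margin stops users with nearly equal priorities from preempting each other back and forth.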
![Page 58: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/58.jpg)
PREEMPTION_RANK
› Picks which already claimed machine to reclaim
› Strongly prefer preempting jobs with a large (bad) priority and a small image size
    PREEMPTION_RANK = \
        (RemoteUserPrio * 1000000)\
        - ImageSize
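How this expression orders candidate slots can be seen by scoring a few made-up slot ads. In this Python sketch the slot names and attribute values are invented for illustration, and picking the maximum score is a simplified stand-in for the negotiator's choice.

```python
def preemption_rank(slot):
    """Mirror of PREEMPTION_RANK = (RemoteUserPrio * 1000000) - ImageSize."""
    return slot["RemoteUserPrio"] * 1000000 - slot["ImageSize"]

def pick_slot_to_reclaim(slots):
    """Choose the claimed slot with the highest preemption rank."""
    return max(slots, key=preemption_rank)

claimed_slots = [
    {"Name": "slot1@a", "RemoteUserPrio": 5.0, "ImageSize": 100},
    {"Name": "slot1@b", "RemoteUserPrio": 40.0, "ImageSize": 900000},
    {"Name": "slot1@c", "RemoteUserPrio": 40.0, "ImageSize": 2000},
]
print(pick_slot_to_reclaim(claimed_slots)["Name"])  # slot1@c
```

The large multiplier makes user priority dominate, so image size only breaks ties between users with similar priority, favoring the job that is cheapest to evict.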
![Page 59: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/59.jpg)
Accounting Groups
› Manage priorities across groups of users and jobs
› Can guarantee minimum numbers of computers for groups (quotas)
› Supports hierarchies
› Anyone can join any group
![Page 60: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/60.jpg)
Tools
“Tools” by “batega” © 2007 Licensed under Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/batega/1596898776/ http://www.webcitation.org/5XIj1E1Y1
![Page 61: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/61.jpg)
condor_config_val
› Find current configuration values
    % condor_config_val MASTER_LOG
    /var/condor/logs/MasterLog
    % cd `condor_config_val LOG`
![Page 62: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/62.jpg)
condor_config_val -v
› Can identify source
    % condor_config_val -v CONDOR_HOST
    CONDOR_HOST: condor.cs.wisc.edu
      Defined in ‘/etc/condor_config.hosts’, line 6
![Page 63: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/63.jpg)
condor_config_val -config
› What configuration files are being used?
    % condor_config_val -config
    Config source:
        /var/home/condor/condor_config
    Local config sources:
        /unsup/condor/etc/condor_config.hosts
        /unsup/condor/etc/condor_config.global
        /unsup/condor/etc/condor_config.policy
        /unsup/condor-test/etc/hosts/puffin.local
![Page 64: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/64.jpg)
condor_fetchlog
› Retrieve logs remotely
    condor_fetchlog beak.cs.wisc.edu Master
![Page 65: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/65.jpg)
Checking the current status
› condor_status
› condor_q
› Greg's “How High Throughput was My Cluster?” this afternoon
![Page 66: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/66.jpg)
Querying daemons: condor_status
› Queries the collector for information about daemons in your pool
› Defaults to finding condor_startds
› condor_status -schedd summarizes all job queues
› condor_status -master returns list of all condor_masters
![Page 67: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/67.jpg)
condor_status
› -long displays the full ClassAd
› Optionally specify a machine name to limit results to a single host
    condor_status -l node4.cs.wisc.edu
› Do not use in scripts/programs
![Page 68: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/68.jpg)
› Only return ClassAds that match an expression you specify
› Show me idle slots with 1GB or more memory
h condor_status -constraint 'Memory >= 1024 && Activity == "Idle"'
condor_status -constraint
68
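To make the constraint concrete, here is a minimal Python sketch of what `Memory >= 1024 && Activity == "Idle"` selects. The slot ads are hypothetical dictionaries, not real ClassAds, and treating a missing attribute as a non-match is a simplification of how ClassAd expressions handle undefined values.

```python
# Hypothetical slot ads; real ClassAds carry many more attributes.
SLOTS = [
    {"Name": "slot1@node1", "Memory": 2048, "Activity": "Idle"},
    {"Name": "slot1@node2", "Memory": 512, "Activity": "Idle"},
    {"Name": "slot1@node3", "Memory": 4096, "Activity": "Busy"},
    {"Name": "slot1@node4", "Activity": "Idle"},  # no Memory attribute
]


def matches(ad):
    """Python stand-in for: Memory >= 1024 && Activity == "Idle"."""
    memory = ad.get("Memory")
    if memory is None:
        # Simplification: an ad missing the attribute never matches.
        return False
    return memory >= 1024 and ad.get("Activity") == "Idle"


def select(ads):
    """Return the names of the ads that satisfy the constraint."""
    return [ad["Name"] for ad in ads if matches(ad)]


if __name__ == "__main__":
    print(select(SLOTS))
```

Memory is expressed in megabytes here, which is why 1GB appears as 1024 in the expression.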
![Page 69: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/69.jpg)
› Report only fields you request
› Census of systems in your pool:
> condor_status -af Activity OpSys Arch | sort | uniq -c
56 Busy LINUX X86_64
35 Idle LINUX INTEL
1515 Idle LINUX X86_64
369 Idle WINDOWS X86_64
31 Retiring LINUX X86_64
condor_status -autoformat
69
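If the census needs to feed a script rather than a terminal, the `sort | uniq -c` step can be done in Python. This is a minimal sketch that assumes the `-af` output has already been captured as text; the sample rows are hypothetical.

```python
from collections import Counter

# Sample rows in the shape printed by:
#   condor_status -af Activity OpSys Arch
# (hypothetical data; one line per slot, fields separated by spaces)
SAMPLE = """\
Busy LINUX X86_64
Idle LINUX X86_64
Idle LINUX INTEL
Idle LINUX X86_64
Idle WINDOWS X86_64
"""


def census(text):
    """Count identical (Activity, OpSys, Arch) rows, like sort | uniq -c."""
    rows = (tuple(line.split()) for line in text.splitlines() if line.strip())
    return Counter(rows)


if __name__ == "__main__":
    for row, count in sorted(census(SAMPLE).items()):
        print(count, *row)
```

Splitting on whitespace only works while the requested attributes contain no spaces themselves; for anything else, one of the explicit separator options is the safer choice.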
![Page 70: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/70.jpg)
› Separate by tabs, commas, spaces, newlines
› Label each field by name
› Escape as a ClassAd value
› Add headers
› Several easy to parse options
condor_status -autoformat
70
![Page 71: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/71.jpg)
condor_status -format
› Like autoformat, but with manual formatting
› Useful for writing simple reports
› Uses C printf style formats
h One field per argument
71
“slanting” by Stefano Mortellaro (“fazen”) © 2005 Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/fazen/17200735/ http://www.webcitation.org/5XIhNWC7Y
![Page 72: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/72.jpg)
% condor_status -format '%-10s ' Activity -format '%-7s ' OpSys -format '%s\n' Arch | sort | uniq -c
54 Busy LINUX X86_64
35 Idle LINUX INTEL
1513 Idle LINUX X86_64
369 Idle WINDOWS X86_64
31 Retiring LINUX X86_64
condor_status -format
72
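The effect of the three `-format` arguments can be previewed with Python's `%` operator, which handles `%s` and `%-10s` the same way C printf does. The sketch below is only an illustration of the formatting; the slot ad is a hypothetical dictionary.

```python
# The three -format arguments from the slide, as (format, attribute) pairs.
FIELDS = [("%-10s ", "Activity"), ("%-7s ", "OpSys"), ("%s\n", "Arch")]


def render(ad, fields=FIELDS):
    """Apply each printf-style format to its attribute and concatenate."""
    return "".join(fmt % ad[attr] for fmt, attr in fields)


if __name__ == "__main__":
    # Hypothetical slot ad with only the attributes used above.
    ad = {"Activity": "Busy", "OpSys": "LINUX", "Arch": "X86_64"}
    print(render(ad), end="")
```

Each format string consumes exactly one attribute, which is why the command repeats `-format` once per field and puts the newline in the last one.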
![Page 73: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/73.jpg)
› View the job queue
› The -long option is useful to see the entire ClassAd for a given job
› Supports -constraint, -autoformat, and -format
› Can view job queues on remote machines with the -name option
Examining Queues condor_q
73
![Page 74: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/74.jpg)
› Why isn't this job running? default
› On this machine? -machine
› Why does this machine hate my job? -better-analyze:reverse
› General reports -analyze:sum -analyze:sum,rev
condor_q -analyze and -better-analyze
74
![Page 75: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/75.jpg)
Log Files
“Ready for the Winter” by Anna “bcmom” © 2005 Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/bcmom/59207805/ http://www.webcitation.org/5XIhRO8L8
![Page 76: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/76.jpg)
› HTCondor maintains one log file per daemon
› Can increase verbosity of logs on a per daemon basis
h SHADOW_DEBUG, SCHEDD_DEBUG, and others
h Space separated list
HTCondor’s Log Files
76
![Page 77: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/77.jpg)
› D_FULLDEBUG dramatically increases information logged
h Does not include other debug levels!
› D_COMMAND adds information about commands received
SHADOW_DEBUG = D_FULLDEBUG D_COMMAND
Useful Debug Levels
77
![Page 78: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/78.jpg)
› Log files are automatically rolled over when a size limit is reached
h Only one old version is kept
h Defaults to 10 megabytes
h Rolls over quickly with D_FULLDEBUG
h MAX_DEFAULT_LOG
h Also per daemon settings
• MAX_SHADOW_LOG, MAX_SCHEDD_LOG, and others
Log Rotation
78
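The consequence of keeping only one old version is easy to miss: once a log rolls over twice, the earliest entries are gone. The following Python sketch simulates that behavior on a temporary file. It is not HTCondor's implementation; the `.old` suffix, the function names, and the tiny size limit are assumptions made for illustration.

```python
import os
import tempfile


def rotate_if_needed(path, max_bytes):
    """Rename path to path + '.old' once it reaches max_bytes.

    Any previous '.old' file is overwritten, so only one old version
    survives. The '.old' suffix is an assumption of this sketch.
    """
    if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
        os.replace(path, path + ".old")  # overwrites an existing .old
        open(path, "w").close()          # start a fresh, empty log
        return True
    return False


def append(path, line, max_bytes):
    """Append one line, rotating first if the log is already too large."""
    rotate_if_needed(path, max_bytes)
    with open(path, "a", newline="\n") as log:
        log.write(line + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "ShadowLog")
        for i in range(10):
            append(log_path, "entry %d" % i, max_bytes=30)
        print(sorted(os.listdir(tmp)))
```

With verbose debug levels enabled, the same thing happens at the real size limit, which is the reason to raise the per daemon limit while chasing a problem.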
![Page 79: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/79.jpg)
› Many log file entries are primarily useful to HTCondor developers
h Especially if D_FULLDEBUG is on
h Minor errors are often logged but corrected
h Take them with a grain of salt
h [email protected]
HTCondor’s Log Files
79
![Page 80: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/80.jpg)
Debugging Jobs
“Wanna buy a Beetle?” by “Kevin” © 2006 Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/kevincollins/89538633/ http://www.webcitation.org/5XIiMyhpp
![Page 81: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/81.jpg)
› Examine the job with condor_q
h especially the very powerful -analyze and -better-analyze
Debugging Jobs: condor_q
81
![Page 82: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/82.jpg)
› Examine the job’s user log
h Can find with:
condor_q -af UserLog 17.0
h Set with “log” in the submit file
h You can set EVENT_LOG to get a unified log for all jobs under a schedd
› Contains the life history of the job
› Often contains details on problems
Debugging Jobs: User Log
82
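As a rough illustration of reading that life history, the Python sketch below pulls the event header lines for one job out of a user log. It assumes the classic text layout, in which each event starts with a three-digit code and a `(cluster.proc.subproc)` tag and ends with a line containing `...`; the sample log, host addresses, and timestamps are hypothetical, and pools configured for a different log or date format would need a different pattern.

```python
import re

# Hypothetical user log excerpt in the classic text format: each event
# starts with a 3-digit code and (cluster.proc.subproc), and ends with "...".
SAMPLE_LOG = """\
000 (017.000.000) 04/30 10:15:02 Job submitted from host: <192.0.2.10:9618>
...
001 (017.000.000) 04/30 10:17:45 Job executing on host: <192.0.2.20:9618>
...
005 (017.000.000) 04/30 10:42:11 Job terminated.
\t(1) Normal termination (return value 0)
...
"""

# code, cluster, proc, timestamp, description
EVENT_HEADER = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) (\d\d/\d\d \d\d:\d\d:\d\d) (.*)$"
)


def job_history(text, cluster, proc):
    """Return (timestamp, description) for each event of one job."""
    events = []
    for line in text.splitlines():
        m = EVENT_HEADER.match(line)
        if m and int(m.group(2)) == cluster and int(m.group(3)) == proc:
            events.append((m.group(4), m.group(5)))
    return events


if __name__ == "__main__":
    for when, what in job_history(SAMPLE_LOG, 17, 0):
        print(when, what)
```

The indented detail lines under an event, such as the termination status, are skipped by this sketch; they are often where the details on problems appear, so a real tool would keep them.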
![Page 83: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/83.jpg)
› Examine ShadowLog on the submit machine
h Note any machines the job tried to execute on
h There is often an “ERROR” entry that can give a good indication of what failed
Debugging Jobs: ShadowLog
83
![Page 84: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/84.jpg)
› No ShadowLog entries? Possible problem matching the job.
h Examine ScheddLog on the submit machine
h Examine NegotiatorLog on the central manager
Debugging Jobs: Matching Problems
84
![Page 85: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/85.jpg)
› ShadowLog entries suggest an error but aren’t specific?
h Examine StartLog and StarterLog on the execute machine
Debugging Jobs: Remote Problems
85
![Page 86: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/86.jpg)
› HTCondor logs will note the job ID each entry is for
h Useful if multiple jobs are being processed simultaneously
h grepping for the job ID will make it easy to find relevant entries
› Occasionally HTCondor doesn't know yet…
Debugging Jobs: Reading Log Files
86
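A minimal Python sketch of the "grep for the job ID" step follows. The sample lines are hypothetical and only loosely modeled on a ShadowLog; the one thing the sketch relies on is that the job ID appears in parentheses, as in `(17.0)`.

```python
# Hypothetical ShadowLog-style lines; the "(cluster.proc)" tag is the
# part this sketch relies on, the rest of the layout is illustrative.
SAMPLE = [
    "04/30/13 10:17:44 (17.0) (4021): Request to run on slot1@node4 was ACCEPTED",
    "04/30/13 10:17:45 (18.0) (4022): Request to run on slot2@node7 was ACCEPTED",
    "04/30/13 10:42:11 (17.0) (4021): ERROR: example failure message",
    "04/30/13 10:42:12 (170.0) (4030): Job 170.0 terminated",
]


def entries_for_job(lines, job_id):
    """Keep lines tagged with exactly this job ID, e.g. '17.0'."""
    tag = "(%s)" % job_id
    return [line for line in lines if tag in line]


if __name__ == "__main__":
    for line in entries_for_job(SAMPLE, "17.0"):
        print(line)
```

Including the parentheses in the search string keeps job 17.0 from also matching 170.0 or 117.0; the shell equivalent would be something like `grep -F '(17.0)' ShadowLog`.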
![Page 87: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/87.jpg)
› If necessary add “D_FULLDEBUG D_COMMAND” to the DAEMONNAME_DEBUG setting (for example SHADOW_DEBUG) for additional log information
› Increase MAX_DAEMONNAME_LOG if logs are rolling over too quickly
› If all else fails, email us
h [email protected]
Debugging Jobs: What Next?
87
![Page 88: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/88.jpg)
More Information
“IMG 0915” by Eva Schiffer © 2008 Used with permission http://www.digitalchangeling.com/pictures/ourCats2008/january2008/IMG_0915.html
![Page 89: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/89.jpg)
› Staff here at HTCondor Week
› HTCondor Manual
› htcondor-users mailing list
http://research.cs.wisc.edu/htcondor/mail-lists/
More Information
89
“Condor Manual” by Alan De Smet
(Actual first page of the 7.0.1 manual on about 700 pages of other output. The actual 7.0.1 manual is about 860 pages.)
![Page 90: Administrating HTCondor “Condor - Colca Canyon-” by “Raultimate” © 2006 Licensed under the Creative Commons Attribution 2.0 license. N06/427485954](https://reader036.vdocuments.us/reader036/viewer/2022081519/56649d8a5503460f94a704fc/html5/thumbnails/90.jpg)
Thank You!
“My mouse” by “MysterFaery” © 2006 Licensed under the Creative Commons Attribution 2.0 license
http://www.flickr.com/photos/mysteryfaery/294253525/ http://www.webcitation.org/5XIi6HRCM
Any questions?