Clusters, SubClusters and Queues
A Spotter's Guide
Chris Brew
HepSysMan 06/11/2008
Slide 2
Current Default Setup
• YAIM sets up by default:
  – One Cluster (batch system)
  – One SubCluster (set of WNs)
  – Multiple CEs (queues) pointing to the SubCluster
• Falls down with:
  – Non-identical worker nodes
  – Multiple CENodes attached to the same batch system
Slide 3
The way it’s supposed to be
[Diagram: a CE Node publishing one Cluster that contains three SubClusters, one each for the Type 1, Type 2 and Type 3 WNs, with queues Q1 to Q5 and Tags.]
Slide 4
The way it usually is
[Diagram: a CE Node publishing one Cluster with a single SubCluster that covers the Type 1, Type 2 and Type 3 WNs alike, with queues Q1 to Q5 and Tags.]
Slide 5
How bad it can be
[Diagram: the same Type 1, Type 2 and Type 3 WNs behind two CE Nodes, each publishing its own Cluster, SubCluster, queues Q1 to Q4 and Tags.]
Slide 6
Problem of Non-Identical Worker Nodes
• Default setup assumes that all worker nodes are identical
  – Obviously not the case at most sites
  – SubCluster has to publish the lowest-spec WN
• Leads to:
  – Small-memory jobs wasting large-memory nodes
  – Inability to publish the existence of large-memory nodes
  – Differing CPU specs lead to inaccurate timing and accounting (CPU scaling helps here)
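The lowest-spec rule can be illustrated with a small sketch. The node types, counts and memory sizes below are invented for illustration and do not describe any real site; the point is only that a single SubCluster can carry one memory value, so anything above the minimum goes unpublished.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WNType:
    """One class of worker node (hypothetical figures, not from the talk)."""
    name: str
    count: int
    ram_mb: int


def published_ram_mb(wn_types):
    """A single SubCluster publishes one value, so it has to be the minimum."""
    return min(t.ram_mb for t in wn_types)


def unpublished_ram_mb(wn_types):
    """Memory that exists on the farm but is invisible in the published value."""
    floor = published_ram_mb(wn_types)
    return sum((t.ram_mb - floor) * t.count for t in wn_types)


# Hypothetical farm with three generations of hardware.
FARM = [
    WNType("type1", count=40, ram_mb=1024),
    WNType("type2", count=30, ram_mb=2048),
    WNType("type3", count=10, ram_mb=4096),
]

if __name__ == "__main__":
    print("published RAM per node (MB):", published_ram_mb(FARM))
    print("unpublished RAM across farm (MB):", unpublished_ram_mb(FARM))
```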
Slide 7
Problem of Multiple CENodes
• Sites want to add multiple CENodes for scaling and redundancy
  – Should just add CEs (queue endpoints)
  – Currently duplicates Clusters and SubClusters
• Causes problems in CPU counting (gStat, GridMap, Accounting Reports, etc.)
• Various hacks to try to help with this
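A minimal sketch of the counting problem, assuming each CENode publishes its own copy of the Cluster with the full CPU count. The record layout, host names and the `batch_system` key are hypothetical; real monitoring tools read the published information and may have no such key to de-duplicate on, which is exactly why the totals go wrong.

```python
def naive_cpu_total(published):
    """Sum CPUs over every published Cluster, as a simple consumer might."""
    return sum(rec["cpus"] for rec in published)


def deduplicated_cpu_total(published):
    """Count each underlying batch system once, however many CENodes front it."""
    per_batch_system = {}
    for rec in published:
        per_batch_system[rec["batch_system"]] = rec["cpus"]
    return sum(per_batch_system.values())


# Two hypothetical CENodes in front of the same 400-CPU batch system.
PUBLISHED = [
    {"ce_node": "ce01.example.org", "batch_system": "farm-a", "cpus": 400},
    {"ce_node": "ce02.example.org", "batch_system": "farm-a", "cpus": 400},
]

if __name__ == "__main__":
    print("naive total:", naive_cpu_total(PUBLISHED))
    print("de-duplicated total:", deduplicated_cpu_total(PUBLISHED))
```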
Slide 8
Current Hacks
• Can already set up multiple Clusters and SubClusters to advertise different memory queues
  – See publishing for RAL-LCG2 and UKI-SOUTHGRID-RALPP
  – Involves hand-crafted ldif files to set up (Sub)Clusters and map queues to them
  – Cannot let YAIM near them
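To give a feel for what such hand-crafted ldif involves, here is a rough sketch that renders one Cluster and one SubCluster entry per memory class. It is not the ldif used at either site: the attribute names follow the GLUE 1.3 schema as best recalled, only a small subset of the attributes a real entry carries is shown, and the cluster ID and CE endpoint strings are hypothetical.

```python
def memory_class_ldif(cluster_id, ram_mb, ce_ids):
    """Render a minimal Cluster + SubCluster pair for one memory class.

    cluster_id and ce_ids are illustrative values; a real entry needs many
    more attributes (OS, processor, benchmarks, CPU counts, ...).
    """
    suffix = "mds-vo-name=resource,o=grid"
    lines = [
        f"dn: GlueClusterUniqueID={cluster_id},{suffix}",
        "objectClass: GlueClusterTop",
        "objectClass: GlueCluster",
        f"GlueClusterUniqueID: {cluster_id}",
        f"GlueClusterName: {cluster_id}",
    ]
    # Map the queues (CE endpoints) onto this Cluster.
    lines += [f"GlueForeignKey: GlueCEUniqueID={ce}" for ce in ce_ids]
    lines += [
        "",
        f"dn: GlueSubClusterUniqueID={cluster_id},"
        f"GlueClusterUniqueID={cluster_id},{suffix}",
        "objectClass: GlueClusterTop",
        "objectClass: GlueSubCluster",
        "objectClass: GlueHostMainMemory",
        f"GlueSubClusterUniqueID: {cluster_id}",
        f"GlueSubClusterName: {cluster_id}",
        f"GlueChunkKey: GlueClusterUniqueID={cluster_id}",
        f"GlueHostMainMemoryRAMSize: {ram_mb}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical endpoints: the same queue served by two CENodes.
    ces = [
        "ce01.example.org:2119/jobmanager-pbs-grid1000",
        "ce02.example.org:2119/jobmanager-pbs-grid1000",
    ]
    print(memory_class_ldif("grid1000.example.org", 1000, ces))
```

Generating the entries from one function at least keeps the per-memory-class files consistent, but it does not change the underlying caveat that YAIM will overwrite them if it is rerun.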
Slide 9
Traylen Proposal
• Move (Sub)Cluster publishing from the CENode to a new node type
  – Probably share a node with the site-bdii
• CENode gip will associate queues to SubClusters
• Software Tags are currently associated with the CENode, not the (Sub)Cluster; they’ll be fixed and published through the new node type
Slide 10
How it may be
[Diagram: a Glite-Cluster Node publishing one Cluster with three SubClusters (Type 1, Type 2 and Type 3 WNs), each with its own Tags; three CE Nodes each publishing queues Q1 to Q5.]
Slide 11
Our Experience
• We’ve put in hand crafted ldif files to define 500MB, 1000MB and 2000MB SubClusters
• grid[500|1000|2000] queues pointing at them on both CENodes
• Technically it works – jobs with higher memory requirements only match the high memory queues
• In practice it makes no difference – almost no jobs include memory requirements
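The matching behaviour described above can be sketched as follows. The queue names and memory sizes come from the slide; the matching function is a simplified stand-in for the real matchmaking, which evaluates a requirements expression in the job description (for example a comparison against `GlueHostMainMemoryRAMSize`) against the published values.

```python
# Queue name -> memory (MB) published by the SubCluster it points at.
QUEUES = {"grid500": 500, "grid1000": 1000, "grid2000": 2000}


def matching_queues(required_mb=None):
    """Return the queues a job can match, ordered by published memory.

    A job with no memory requirement (the common case) matches everything,
    so the extra SubClusters do not influence where it lands.
    """
    names = sorted(QUEUES, key=QUEUES.get)
    if required_mb is None:
        return names
    return [q for q in names if QUEUES[q] >= required_mb]


if __name__ == "__main__":
    print("job needing 1500 MB:", matching_queues(1500))
    print("job with no requirement:", matching_queues())
```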
Slide 12
Conclusion
• You’re probably not doing it right at the moment
  – But the fix is probably worse
• You can add hacks to provide more info to the batch system
  – But it probably won’t make any difference
• Things are likely to change (for the better) in the near future
  – Wait until then