glideinwms frontend internals - glideinwms training jan 2012

UCSD Jan 17th 2012 Frontend Internals 1 glideinWMS Training @ UCSD glideinWMS frontend Internals by Igor Sfiligoi (UCSD)

Upload: igor-sfiligoi

Post on 15-Jan-2015

DESCRIPTION

This presentation provides detailed insight into the internal workings of the glideinWMS Frontend. It was part of the glideinWMS Training session held in Jan 2012 at UCSD.

TRANSCRIPT

Page 1: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

UCSD Jan 17th 2012 Frontend Internals 1

glideinWMS Training @ UCSD

glideinWMS Frontend Internals

by Igor Sfiligoi (UCSD)

Page 2: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Refresher - Glideins

● A glidein is just a properly configured Condor execution node submitted as a Grid job
● glideinWMS provides automation

[Diagram: a Condor pool with a central manager (Collector, Negotiator), submit nodes (Schedd) and an execution node (Startd, Job); glideinWMS sends glideins through Globus and CREAM, and the glideins act as additional execution nodes.]

Page 3: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Refresher – Glidein Frontend

● The frontend monitors the user Condor pool, does the matchmaking and requests glideins
● The Factory acts as a slave

[Diagram: the Frontend (on the Frontend node) monitors Condor (submit nodes, central manager) and requests glideins from the Factory; the Factory node (Factory, Condor) submits glideins through CREAM and Globus to worker nodes; the glidein on the worker node configures Condor ("Configure Condor G.N."), and its Startd is then matched to a job.]

Page 4: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Refresher - Cardinality

● N-to-M relationship
  ● Each Frontend can talk to many Factories
  ● Each Factory may serve many Frontends

[Diagram: two VO Frontends, each with its own Schedd, Collector and Negotiator, and two Glidein Factories; Startds running user jobs connect the two sides.]

Page 5: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Frontend architecture

● The frontend is composed of:
  ● The Condor daemons
  ● The glideinWMS frontend proper
  ● Condor client – to talk to the factories
  ● Web server – deliver code and data to glideins + monitoring
● The glideinWMS frontend is itself composed of:
  ● Group processes – do the real work
  ● Master frontend – controls the others and aggregates monitoring

Page 6: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Frontend arch - Picture

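[Diagram: inside the Frontend domain, the Frontend node hosts the Frontend, which spawns the Group processes, and the Web server; also shown are the submit nodes and the central manager. Outside the domain are the Factories (with their Entries) and the glideins.]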

Page 7: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Condor processes

● Explained in enough detail in previous talk
  ● Will not repeat myself

[Diagram: central manager (Collector, Negotiator) and submit nodes (Schedd).]

Page 8: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Frontend processes

● Real work performed by Group process
  ● glideinFrontendElement.py
  ● One process per Group
● They are controlled by master Frontend
  ● glideinFrontend.py
  ● Starts the other processes
  ● Aggregates monitoring

Frontend == Frontend Group in the rest of the talk

Page 9: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Frontend role

● The VO frontend is the brain of a glideinWMS-based pool
  ● Like a site-level “negotiator”

[Diagram: within the VO domain, the Frontend monitors Condor (submit nodes, central manager); it finds idle jobs, finds entries, matches them, and requests glideins from the Factory nodes.]

Page 10: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Reminder - Two level matchmaking

● The frontend triggers glidein submission
● The “regular” negotiator matches jobs to glideins

[Diagram: a Condor pool with a central manager (Collector, Negotiator), a submit node (Schedd) and an execution node (Startd, Job); the Frontend drives the Factory, which sends glideins through Globus and CREAM, and the glideins act as execution nodes.]

Page 11: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Matchmaking logic

● The Frontend matchmaking policy is implemented centrally
  ● By the VO admin – not by the users
● It can use the attributes from both the job and Factory ClassAds
● Should be kept in sync with Negotiator policy
  ● Which is not centralized
  ● One way is to define it in the glidein START expression
  ● Unfortunately, one is a Python expression, the other a ClassAd expression

Page 12: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Example matchmaking logic

● Frontend

("DESIRED_Sites" in job) and (glidein["attrs"].get("GLIDEIN_Site") in job["DESIRED_Sites"].split(","))

● Negotiator (via glidein START)

GLIDECLIENT_Start = stringListMember(GLIDEIN_Site, DESIRED_Sites, ",") =?= True

More details at http://tinyurl.com/glideinWMS/doc.prd/factory/custom_vars.html
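
To make the Frontend-side expression concrete, here is a minimal sketch that evaluates the same logic against sample data. The `job` and `glidein` dictionaries and the `matches` helper are hypothetical stand-ins for illustration; only the attribute names `DESIRED_Sites` and `GLIDEIN_Site` come from the slide.

```python
# Hypothetical sample data; only the attribute names come from the slide.
job = {"DESIRED_Sites": "UCSD,FNAL"}
glidein = {"attrs": {"GLIDEIN_Site": "UCSD"}}


def matches(job, glidein):
    """Frontend-side match: the entry's site must be one of the job's desired sites."""
    return ("DESIRED_Sites" in job) and (
        glidein["attrs"].get("GLIDEIN_Site") in job["DESIRED_Sites"].split(",")
    )


print(matches(job, glidein))                        # True
print(matches({"DESIRED_Sites": "FNAL"}, glidein))  # False
print(matches({}, glidein))                         # False: job does not list any sites
```

The membership test on `job` comes first so that jobs without `DESIRED_Sites` are rejected instead of raising a `KeyError`.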

Page 13: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Communication Protocol

● No listen sockets
  ● All communication one way (Frontend->Factory)
● Each Factory provides a Collector
  ● Communication based on ClassAds
  ● All security implemented in the Collector
● Use standard cmdline tools for communication
  ● condor_status and condor_advertise
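
As a rough illustration of how the two command-line tools could be driven from Python, the sketch below only builds the argument lists and does not run anything. The helper names, the host name, the file name and the constraint string are assumptions made for this example, not taken from the Frontend code.

```python
def factory_query_cmd(collector_host, constraint='GlideinMyType =?= "glidefactory"'):
    """Build a condor_status call that reads ads from one Factory Collector.

    The default constraint is an assumption of this sketch.
    """
    return ["condor_status", "-pool", collector_host, "-any",
            "-constraint", constraint, "-long"]


def advertise_cmd(collector_host, classad_file):
    """Build a condor_advertise call that publishes a ClassAd file to the Collector."""
    return ["condor_advertise", "-pool", collector_host,
            "UPDATE_AD_GENERIC", classad_file]


# Hypothetical host and file names, for display only.
print(" ".join(factory_query_cmd("factory.example.org")))
print(" ".join(advertise_cmd("factory.example.org", "frontend_request.ad")))
```

On a node with the Condor client installed, each list could be handed to `subprocess.run`; both calls are outbound connections to the Factory Collector, which fits the "no listen sockets" design.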

Page 14: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Protocol sequence

● Polling loop
  ● Read Factory ClassAds from all factory Collectors
  ● Match against jobs
  ● Advertise own existence and requests
● Frontend sends 4 types of info
  ● Own identity
  ● Glidein submission regulation instructions
  ● Glidein parameters
  ● Pilot Proxy
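
One pass of the polling loop can be sketched as follows. This is an illustration under stated assumptions, not the Frontend implementation: the four callables, the `Name` key and the sample data are all hypothetical.

```python
def polling_iteration(read_factory_ads, read_idle_jobs, match, advertise):
    """Run one pass of the loop; all four callables are hypothetical stand-ins."""
    entries = read_factory_ads()   # Factory ClassAds from all factory Collectors
    jobs = read_idle_jobs()        # idle user jobs
    matched_idle = {
        entry["Name"]: sum(1 for job in jobs if match(job, entry))
        for entry in entries
    }
    advertise(matched_idle)        # publish own existence and requests
    return matched_idle


# Stub wiring with made-up data, to show the flow end to end.
entries = [{"Name": "entry_A", "GLIDEIN_Site": "UCSD"},
           {"Name": "entry_B", "GLIDEIN_Site": "FNAL"}]
jobs = [{"DESIRED_Sites": "UCSD"}, {"DESIRED_Sites": "UCSD,FNAL"}]
sent = []
result = polling_iteration(
    lambda: entries,
    lambda: jobs,
    lambda job, entry: entry["GLIDEIN_Site"] in job["DESIRED_Sites"].split(","),
    sent.append,
)
print(result)  # {'entry_A': 2, 'entry_B': 1}
```

Passing the I/O steps in as callables keeps the sketch runnable without a Condor installation; a real loop would repeat this pass at a fixed interval.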

Page 15: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Glidein submission regulation

● The glideinWMS glidein request logic is based on the principle of “constant pressure”
  ● Frontend Group requests a certain number of “idle glideins” in the factory queue at all times
  ● It does not request a specific number of glideins
● This is done due to the asynchronous nature of the system
  ● Both the factory entries and the frontend groups are in a polling loop and talk to each other indirectly

Page 16: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Glidein requests

● Frontend matches job attrs against entry attrs
  ● It then counts the matched idle jobs
  ● A fraction of this number becomes the “pressure requests” (up to 1/3)
  ● This number is then capped (~20)
  ● The attribute in the ClassAd is ReqIdleGlideins
● The Frontend also advertises ReqMaxRunningGlideins
  ● Emergency brake
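
The arithmetic behind the request can be sketched as below. The `build_request` helper, the round-up rule and the exact default values are assumptions of this sketch; the slide only gives "up to 1/3" and a cap of about 20.

```python
def build_request(matched_idle_jobs, max_running, divisor=3, idle_cap=20):
    """Turn a count of matched idle jobs into the two advertised limits.

    divisor and idle_cap mirror the slide's "up to 1/3" and "~20"; the exact
    values and the round-up rule are assumptions of this sketch.
    """
    pressure = (matched_idle_jobs + divisor - 1) // divisor  # integer ceil(jobs / divisor)
    return {
        "ReqIdleGlideins": min(pressure, idle_cap),
        "ReqMaxRunningGlideins": max_running,  # emergency brake, passed through unchanged
    }


for n_jobs in (0, 1, 10, 300):
    print(n_jobs, build_request(n_jobs, max_running=1000))
```

With these defaults, 10 matched idle jobs yield a request for 4 idle glideins, while 300 jobs hit the cap and yield 20, so a burst of user jobs raises the pressure without flooding the Factory queue.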

Page 17: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Scaling back

● The Frontend can also request that existing glideins in the Factory queues are removed: ReqRemoveExcess
  ● NO – Default, never remove
  ● WAIT – Remove any glidein not yet at a site
  ● IDLE – Remove any glidein that has not started yet
  ● ALL – Remove all glideins
● Frontend pretty conservative
  ● Only requests removal if no user jobs in the queues
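
The conservative rule can be sketched as a small function. The four level names come from the slide; the `req_remove_excess` helper and the idea of a separately configured level are assumptions made for illustration.

```python
REMOVE_LEVELS = ("NO", "WAIT", "IDLE", "ALL")  # values listed on the slide


def req_remove_excess(user_jobs_in_queues, configured_level="NO"):
    """Pick the ReqRemoveExcess value to advertise.

    configured_level is a hypothetical setting: the most aggressive removal
    the admin allows. Removal is only requested when no user jobs are queued.
    """
    if configured_level not in REMOVE_LEVELS:
        raise ValueError("unknown ReqRemoveExcess level: %r" % (configured_level,))
    if user_jobs_in_queues > 0:
        return "NO"
    return configured_level


print(req_remove_excess(5, "IDLE"))  # NO: user jobs are still queued
print(req_remove_excess(0, "IDLE"))  # IDLE: queues are empty, removal may be requested
print(req_remove_excess(0))          # NO: the default never removes
```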

Page 18: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Parameters

● Frontend can send attributes to glideins:
  ● Dynamically – as parameter in the ClassAd
  ● Statically – as entry in a config file
● Attributes typically static
  ● Current Frontend implementation does not really have much support for dynamicity

Page 19: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Pilot proxy delegation

● Pilot proxy is encrypted with factory pub key
  ● Then published in the ClassAd
  ● Only owner of priv. key can decrypt it
● However
  ● Must make sure we are talking to a trusted Factory!
    – not just anyone providing a pub key
  ● More details in a few slides

[Diagram: the Frontend (Frontend node) gets the key from the Collector on the Factory node and delivers the proxy (encrypted); the Factory Entry submits the glidein through Globus, and the glidein uses the proxy.]

Page 20: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Pilot proxy selection

● A Frontend must have at least one pilot proxy
  ● But can have more than one
● Many proxies can be used for priority reasons
  ● When competing with non-pilot submission
  ● Want to have as many proxies as users served
● Proxy selection is plugin based

Page 21: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Pilot proxy plugins

● Several standard plugins
  ● ProxyFirst – Only the first listed
  ● ProxyAll – All listed
  ● ProxyUserCardinality – First N, with N=#users
  ● ProxyUserMapWRecycling – N, with pilot-to-user mapping
● VO admin could implement their own, if desired

[Slide callout: “Most used”, pointing at one of the plugins above]
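
The selection rules of the first three plugins can be sketched as plain functions. These are simplified stand-ins written for this transcript, not the glideinWMS plugin classes; the function names, signatures and sample data are hypothetical, and the user-mapping plugin is left out.

```python
def proxy_first(proxies, users):
    """ProxyFirst: only the first listed proxy."""
    return proxies[:1]


def proxy_all(proxies, users):
    """ProxyAll: every listed proxy."""
    return list(proxies)


def proxy_user_cardinality(proxies, users):
    """ProxyUserCardinality: the first N proxies, with N = number of distinct users."""
    return proxies[:len(set(users))]


# Hypothetical proxy files and job owners.
proxies = ["pilot1.pem", "pilot2.pem", "pilot3.pem"]
users = ["alice", "bob", "alice"]

print(proxy_first(proxies, users))             # ['pilot1.pem']
print(proxy_all(proxies, users))               # ['pilot1.pem', 'pilot2.pem', 'pilot3.pem']
print(proxy_user_cardinality(proxies, users))  # ['pilot1.pem', 'pilot2.pem']
```

Giving every rule the same signature is what makes the selection pluggable: the caller can swap one function for another without changing anything else.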

Page 22: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Factory ClassAd

Page 23: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Frontend ClassAd

Page 24: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Security - Authorization

● Mutual authorization
  ● The frontend admin decides which Factories to talk to
  ● The factory admin decides which Frontends to serve
● Based on x509 DNs
  ● Both sides have whitelists

[Diagram: Frontend nodes (Frontend) connected to Factory nodes (Collector, Factory). Authentication based on GSI/x509. Frontend needs a service proxy.]

Page 25: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Trusting the factory key

● It is all just ClassAds!
  ● Anyone can publish a ClassAd and claim to be a factory
● However, Factory Collector knows who published it
  ● And advertises it as the attribute AuthenticatedIdentity
  ● Cannot be faked by the client
● Frontend has a whitelist of trusted factories

[Diagram: the Collector holds ClassAds from the Factory and the Frontends (a1 b1 c1, a2 b2 c2, a3 b3 c3), each tagged with the identity of its publisher (ID1, ID2, ID3).]
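
The whitelist check can be sketched as a filter over the ads read from the Factory Collector. The `trusted_factory_ads` helper, the `Name` key and the sample identities are hypothetical; only the attribute name `AuthenticatedIdentity` comes from the slide.

```python
def trusted_factory_ads(ads, trusted_identities):
    """Keep only the ads whose Collector-assigned identity is whitelisted."""
    trusted = set(trusted_identities)
    return [ad for ad in ads if ad.get("AuthenticatedIdentity") in trusted]


# Hypothetical ads as plain dictionaries.
ads = [
    {"Name": "entry_A", "AuthenticatedIdentity": "factory@gfactory.example.org"},
    {"Name": "entry_B", "AuthenticatedIdentity": "mallory@rogue.example.org"},
    {"Name": "entry_C"},  # no identity attribute at all
]
whitelist = ["factory@gfactory.example.org"]

print([ad["Name"] for ad in trusted_factory_ads(ads, whitelist)])  # ['entry_A']
```

The filter relies on the identity set by the Collector, never on anything the publisher wrote into the ad, so an ad that merely claims to be a factory is dropped.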

Page 26: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Security handles

● As we said, mutual authentication with Factory
● Frontend provides (and Factory whitelists)
  ● Service Proxy to talk to Factory Collector
  ● Frontend Security name
  ● Proxy Security Class
● Frontend whitelists (obtained from Factory admins)
  ● Factory Collector DN
  ● Own mapping @Factory
  ● Factory mapping @Factory

[Slide callouts: “One set per factory collector”; “One per pilot proxy”; “One set for whole Frontend (all Groups)”]

Page 27: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Security within the VO domain

● Frontend process, Collector and schedds often not on the same node
  ● Need network security
● All processes must whitelist each other
  ● Again, GSI based

[Diagram: the Frontend monitors Condor, that is, two Schedds and the Collector/Negotiator. Could be even over WAN: the CMS setup has nodes in CA, IL and Europe.]

Page 28: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

THE END

Page 29: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Pointers

● The official project Web page is http://tinyurl.com/glideinWMS
● glideinWMS development team is reachable at [email protected]
● OSG glidein factory at UCSD
  http://hepuser.ucsd.edu/twiki2/bin/view/UCSDTier2/OSGgfactory
  http://glidein-1.t2.ucsd.edu:8319/glidefactory/monitor/glidein_Production_v4_1/factoryStatus.html

Page 30: glideinWMS Frontend Internals - glideinWMS Training Jan 2012

Acknowledgments

● The glideinWMS is a CMS-led project developed mostly at FNAL, with contributions from UCSD and ISI

● The glideinWMS factory operations at UCSD are sponsored by OSG

● The funding comes from NSF, DOE and the UC system