
EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org www.glite.org

Slides based on material from Sergio Andreozzi INFN-CNAF and from Pedro Rausch Bello, 3rd EELA workshop

OMII-Europe All-Hands Meeting

Bologna, 12-13 February 2007

The middleware

gLite @ OMII-Europe All-Hands meeting, Bologna, 12-13 February 2007 2


Disclaimer

• This presentation is based on materials provided and authorized by the EGEE project and is freely available to download and use according to the terms of the following license:

http://creativecommons.org/licenses/by-nc-sa/2.5/


OUTLINE

• The gLite middleware
– Development process
– Middleware decomposition: foundation and high-level services
• The EGEE Project
– Objective
– Relationship to other projects


Part I: The gLite middleware

Programming the Grid with gLite: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf


gLite process

• Process controlled by the Technical Coordination Group (TCG)
• Task Forces with developers, applications, testers and deployment experts
• After gLite 3.0, adopt a continuous release process
– No more big-bang releases with fixed deadlines for all
– Develop components as requested by users and sites
– Deploy or upgrade as soon as testing is satisfactory
• Major releases synchronized with large-scale activities of VOs (SCs)
– Next major release foreseen for the autumn


Middleware structure

• Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
• Higher-Level Grid Services are meant to help users build their computing infrastructure but should not be mandatory
• Foundation Grid Middleware will be deployed on the EGEE infrastructure
– Must be complete and robust
– Should allow interoperation with other major grid infrastructures
– Should not assume the use of Higher-Level Grid Services

[Figure: layered middleware structure. Layers: Applications; Higher-Level Grid Services (Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ...); Foundation Grid Middleware (Security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring).]

Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf


gLite Services Decomposition

[Figure: decomposition of gLite into 6 high-level services plus CLI & API. Legend: available; foreseen in the architecture (only Job Provenance will be available by the end of EGEE-II).]


Job Workflow in gLite

[Figure: job workflow in gLite. Components: UI (JDL), Resource Broker, Job Submission Service, Computing Element, Storage Element, Information Service, LFC Catalog, Logging & Book-keeping, Author. & Authen. Labelled flows: voms-proxy-init, Job Submit Event, Job Query, Job Status, Input “sandbox”, Input “sandbox” + Broker Info + Globus RSL, Output “sandbox”, Expanded JDL, Publish, SE & CE info, DataSets info.]
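The workflow starts with a JDL file written at the UI. A minimal sketch, assuming the ClassAd-style `Attribute = value;` syntax of JDL: the helper `to_jdl` is hypothetical (it is not part of the gLite UI), and its quoting is simplified, so expressions such as `Requirements` or `Rank`, which must stay unquoted, are not handled.

```python
from typing import Dict, List, Union

JdlValue = Union[str, int, List[str]]


def to_jdl(attrs: Dict[str, JdlValue]) -> str:
    """Render attributes as 'Name = value;' lines (simplified quoting)."""

    def render(value: JdlValue) -> str:
        if isinstance(value, list):
            # Lists such as sandboxes become {"a", "b"}
            return "{" + ", ".join(f'"{item}"' for item in value) + "}"
        if isinstance(value, int):
            return str(value)
        return f'"{value}"'  # plain strings are always quoted in this sketch

    return "".join(f"{name} = {render(value)};\n" for name, value in attrs.items())


job = {
    "Executable": "/bin/hostname",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "OutputSandbox": ["std.out", "std.err"],
}
print(to_jdl(job), end="")
```

The output sandbox lists the files that travel back to the UI, matching the “Output sandbox” flow in the workflow.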


Grid Foundation: Security

• Authentication based on X.509 PKI infrastructure
– Certificate Authorities (CAs) issue (long-lived) certificates identifying individuals (much like a passport); these are commonly used in web browsers to authenticate to sites
– Trust between CAs and sites is established (offline)
– To reduce vulnerability, users are identified on the Grid by (short-lived) proxies of their certificates
• Proxies can
– Be delegated to a service so that it can act on the user’s behalf
– Include additional attributes (like VO information via the VO Membership Service, VOMS)
– Be stored in an external proxy store (MyProxy)
– Be renewed (in case they are about to expire)
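The point of short-lived proxies is that a proxy never outlives its parent credential and is renewed shortly before it expires. A minimal Python model of that rule, assuming a 12-hour default lifetime and a 1-hour renewal threshold (both hypothetical values). `Credential`, `derive_proxy` and `needs_renewal` are illustrative names, no cryptography is involved, and the `/CN=proxy` suffix only illustrates that the proxy subject is derived from the user’s DN.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Credential:
    subject: str          # distinguished name (DN)
    not_after: datetime   # expiry time, UTC


def derive_proxy(cert: Credential, hours: int = 12,
                 now: Optional[datetime] = None) -> Credential:
    """Derive a short-lived proxy whose lifetime is capped by its parent credential."""
    if now is None:
        now = datetime.now(timezone.utc)
    expiry = min(now + timedelta(hours=hours), cert.not_after)
    # Illustrative only: the proxy subject extends the user's DN
    return Credential(subject=cert.subject + "/CN=proxy", not_after=expiry)


def needs_renewal(proxy: Credential, now: datetime,
                  threshold: timedelta = timedelta(hours=1)) -> bool:
    """True once the remaining lifetime drops below the renewal threshold."""
    return proxy.not_after - now < threshold


now = datetime(2007, 2, 12, 9, 0, tzinfo=timezone.utc)
user_cert = Credential("/C=IT/O=Example/CN=Jane Doe", now + timedelta(days=365))
proxy = derive_proxy(user_cert, hours=12, now=now)
print(proxy.not_after, needs_renewal(proxy, now))
```

A stolen proxy is only useful until `not_after`, which is why the long-lived certificate itself stays on the user’s side.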


AuthN and AuthZ: pre-VOMS

• Authentication
– User receives a certificate signed by a CA
– Connects to the “UI” by ssh
– Downloads the certificate
– Single logon to the Grid: creates a proxy; the Grid Security Infrastructure (GSI) then identifies the user to other machines
• Authorisation
– User joins a Virtual Organisation
– VO negotiates access to Grid nodes and resources
– Authorisation tested by the CE
– grid-mapfile maps the user to a local account

[Figure: pre-VOMS flow, steps 1-3, linking the CA, the user at the UI, the VO manager (AUP, personal/once), the VO database and VO service, GSI, and grid-mapfiles on Grid services (daily update).]

• CA: Certification Authority
• AUP: Acceptable Use Policy
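The grid-mapfile step can be sketched as a lookup table from certificate DN to local account. This assumes the common one-entry-per-line layout `"<DN>" <account>` with `#` comments; entries listing several accounts are not handled, a leading dot (commonly used to denote a pool of accounts) is simply returned as part of the string, and the function names are hypothetical.

```python
import shlex
from typing import Dict, Optional


def parse_gridmap(text: str) -> Dict[str, str]:
    """Parse lines of the form: "<certificate DN>" <local account>."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)  # keeps the quoted DN (with spaces) in one field
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def local_account(mapping: Dict[str, str], dn: str) -> Optional[str]:
    """Return the local account for a DN, or None if the user is not authorised."""
    return mapping.get(dn)


GRIDMAP = '''
# example entries
"/C=IT/O=Example/CN=Jane Doe" .dteam
"/C=CH/O=Example/CN=John Smith" jsmith
'''
mapping = parse_gridmap(GRIDMAP)
print(local_account(mapping, "/C=CH/O=Example/CN=John Smith"))
```

Every member of the VO ends up with whatever rights the mapped account has, which is the “all VO members have the same rights” limitation that VOMS addresses.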


Evolution of VO management

Before VOMS

• User is authorised as a member of a single VO
• All VO members have the same rights
• grid-mapfiles are updated by VO management software: they map the user’s DN to a local account
• grid-proxy-init derives a proxy from the certificate: the “single sign-on to the grid”

VOMS

• User can be in multiple VOs
– Aggregate rights
• VO can have groups
– Different rights for each (e.g. different groups of experimentalists)
– Nested groups
• VO has roles
– Assigned to specific purposes, e.g. system admin; the role’s rights apply only when the user assumes this role
• Proxy certificate carries the additional attributes
• voms-proxy-init
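VOMS attributes are carried in the proxy as FQANs of the form `/vo/group/subgroup/Role=role/Capability=cap`. The sketch below, with hypothetical helper names, shows how nested groups and roles could be checked against a proxy’s FQAN list. Treating a subgroup as implying its parent groups is a simplifying assumption, and capabilities are ignored.

```python
from typing import Iterable, List, Optional, Tuple

Groups = Tuple[str, ...]


def parse_fqan(fqan: str) -> Tuple[Groups, Optional[str]]:
    """Split '/vo/group/sub/Role=r/Capability=c' into the group path and the role."""
    groups: List[str] = []
    role: Optional[str] = None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # capabilities are ignored in this sketch
        else:
            groups.append(part)
    return tuple(groups), role


def _path(group: str) -> Groups:
    return tuple(group.strip("/").split("/"))


def in_group(fqans: Iterable[str], group: str) -> bool:
    """Nested groups: membership of /vo/a/b also counts for /vo/a and /vo."""
    wanted = _path(group)
    return any(parse_fqan(f)[0][:len(wanted)] == wanted for f in fqans)


def has_role(fqans: Iterable[str], group: str, role: str) -> bool:
    """A role only applies if the proxy asserts it for exactly that group."""
    return any(parse_fqan(f) == (_path(group), role) for f in fqans)


# Hypothetical proxy carrying FQANs from two VOs (aggregated rights)
proxy_fqans = [
    "/cms/higgs/Role=NULL/Capability=NULL",
    "/cms/Role=production/Capability=NULL",
    "/dteam/Role=NULL/Capability=NULL",
]
print(in_group(proxy_fqans, "/cms"), has_role(proxy_fqans, "/cms", "production"))
```

Comparing tuples of path components, rather than string prefixes, keeps `/cms/hig` from matching `/cms/higgs`.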


Grid foundation: Information Systems

• Generic Information Provider (GIP)
– Provides LDIF information about a grid service in accordance with the GLUE Schema
• BDII: information system in gLite 3.0 (by LCG)
– LDAP database that is updated by a process
– More than one DB is used, to separate reads and writes
– A port forwarder is used internally to select the correct DB

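[Figure: BDII internals. A port forwarder on port 2170 selects one of three LDAP databases on ports 2171, 2172 and 2173; an update process modifies one DB and the DBs are then swapped. The GIP provider combines a config file, an LDIF file, plugins and a cache.]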

•LDIF: LDAP Data Interchange Format

•LDAP: Lightweight Directory Access Protocol
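The GIP-to-BDII path can be modelled in a few lines. This is a toy model, not BDII code: `gip_ldif` emits one GLUE-style CE entry using a small, illustrative subset of GLUE 1.x attribute names, and `ToyBDII` loads new data into a standby database and then repoints the forwarder at it, so readers on port 2170 never see a half-written DB. The ports 2171-2173 come from the slide; rotating through the three databases in order is an assumption.

```python
from typing import Dict, List, Sequence


def gip_ldif(ce_id: str, free_cpus: int, waiting: int) -> str:
    """Emit one CE entry as LDIF using a small subset of GLUE 1.x attribute names."""
    return (
        f"dn: GlueCEUniqueID={ce_id},mds-vo-name=resource,o=grid\n"
        "objectClass: GlueCE\n"
        f"GlueCEUniqueID: {ce_id}\n"
        f"GlueCEStateFreeCPUs: {free_cpus}\n"
        f"GlueCEStateWaitingJobs: {waiting}\n"
    )


class ToyBDII:
    """Toy model: reads go to the active DB, updates rebuild a standby DB, then swap."""

    def __init__(self, db_ports: Sequence[int] = (2171, 2172, 2173)):
        self.ports = list(db_ports)
        self.dbs: Dict[int, List[str]] = {port: [] for port in self.ports}
        self.active = 0  # index of the DB the port-2170 forwarder points at

    def query(self) -> List[str]:
        """What a client connecting to port 2170 would see."""
        return self.dbs[self.ports[self.active]]

    def update(self, entries: List[str]) -> None:
        standby = (self.active + 1) % len(self.ports)
        self.dbs[self.ports[standby]] = list(entries)  # rebuilt while reads continue
        self.active = standby                          # swap: forwarder now uses it


bdii = ToyBDII()
entry = gip_ldif("ce.example.org:2119/jobmanager-pbs-short", free_cpus=10, waiting=3)
bdii.update([entry])
print(bdii.ports[bdii.active])  # 2172
```

Because the swap is a single pointer change, a query sees either the old snapshot or the new one, never a mixture.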


Grid foundation: Information Systems

• R-GMA: provides a uniform method to access and publish distributed information and monitoring data
– Used for job and infrastructure monitoring in gLite 3.0
– Working to add authorization
• Service Discovery:
– Provides a standard set of methods for locating Grid services
– Currently supports R-GMA, BDII and XML files as backends
– Will add a local cache of information
– Used by some DM and WMS components in gLite 3.0
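Service Discovery hides which information backend answers a query. A minimal sketch of that idea, assuming a hypothetical backend signature (service type in, endpoint URLs out) and a simple first-answer-wins policy; the service type name and URL are illustrative, and this is not the gLite Service Discovery API.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical backend signature: service type -> list of endpoint URLs
Backend = Callable[[str], List[str]]


def discover(service_type: str, backends: Sequence[Backend]) -> List[str]:
    """Query backends in order; return the first non-empty answer."""
    for backend in backends:
        try:
            endpoints = backend(service_type)
        except OSError:
            continue  # an unreachable backend falls through to the next one
        if endpoints:
            return endpoints
    return []


def table_backend(table: Dict[str, List[str]]) -> Backend:
    """Stand-in for a file-based backend: a fixed service-type -> endpoints table."""
    return lambda service_type: table.get(service_type, [])


def unreachable(service_type: str) -> List[str]:
    """Simulates a remote backend that cannot be contacted."""
    raise ConnectionError(f"cannot look up {service_type}")


local = table_backend({"WMProxy": ["https://wms.example.org:7443/wmproxy"]})
print(discover("WMProxy", [unreachable, local]))
```

Clients only depend on `discover`, so adding a local cache later would mean prepending one more backend to the list.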


Grid foundation: Computing Element

• Three flavours available now:
– LCG-CE (GT2 GRAM): in production now but will be phased out next year
– gLite-CE (GSI-enabled Condor-C): already deployed but still needs thorough testing and tuning, which is being done now
– CREAM (WS-I based interface): deployed on the JRA1 preview test-bed; after a first testing phase it will be certified and deployed together with the gLite-CE; our contribution to the OGF-BES group for a standard WS-I based CE interface; CREAM and WMProxy demo at SC06!
• BLAH is the interface to the local resource manager (via plug-ins)
– Used by CREAM and gLite-CE
– Information pass-through: passes parameters to the LRMS to help job scheduling

[Figure: Computing Element in a Grid site. Boxes: WMS, Clients; Computing Element (glexec + LCAS/LCMAPS, BLAH); LRMS, WN; Information System (BDII, R-GMA, CEMon).]

• JRA: Joint Research Activity
• WS-I: Web Services Interoperability
• CREAM: Computing Resource Execution And Management
• BES: Basic Execution Service
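BLAH’s role as a plug-in layer over different batch systems can be sketched as a dispatch table. Assumptions: the plug-in signature and function names are hypothetical, only the queue is passed through, and the sketch builds the submit command line without running it. `qsub` (PBS/Torque) and `bsub` (LSF) are the real submit commands of those systems, but BLAH’s actual plug-ins do considerably more.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical plug-in signature: (job script, queue or None) -> command line
Plugin = Callable[[str, Optional[str]], List[str]]


def queue_flag_plugin(command: str) -> Plugin:
    """Build a plug-in for batch systems whose submit command takes '-q <queue>'."""
    def build(script: str, queue: Optional[str]) -> List[str]:
        argv = [command]
        if queue:  # information pass-through: forward the requested queue
            argv += ["-q", queue]
        return argv + [script]
    return build


PLUGINS: Dict[str, Plugin] = {
    "pbs": queue_flag_plugin("qsub"),  # PBS/Torque
    "lsf": queue_flag_plugin("bsub"),  # LSF
}


def lrms_submit_command(lrms: str, script: str, passthrough: Dict[str, str]) -> List[str]:
    """Pick the plug-in for the site's LRMS and build (not run) its submit command."""
    plugin = PLUGINS.get(lrms)
    if plugin is None:
        raise ValueError(f"no plug-in for LRMS {lrms!r}")
    return plugin(script, passthrough.get("queue"))


print(lrms_submit_command("pbs", "job.sh", {"queue": "short"}))
```

CREAM and the gLite-CE can then stay LRMS-agnostic: supporting a new batch system means registering one more entry in `PLUGINS`.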


Grid foundation: Accounting

• APEL (Accounting Processor for Event Logs): uses R-GMA (Relational Grid Monitoring Architecture) to propagate and display job accounting information for infrastructure monitoring
– Reads LRMS log files provided by LCG-CE and BLAH
– Preparing an update for gLite 3.0 to use the files from BLAH
• DGAS (Distributed Grid Accounting System): collects, stores and transfers accounting data; compliant with privacy requirements
– Reads LRMS log files provided by LCG-CE and BLAH
– Stores information in a site database (HLR, Home Location Register) and optionally in a central HLR; access granted to user, site and VO administrators
– Not yet certified in gLite 3.0. Deployment plan: DGAS is in certification at INFN; it will send records to the GOC via DGAS2APEL
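Both accounting tools start by turning LRMS log records into per-VO usage. A minimal sketch of that aggregation step, assuming a hypothetical `key=value` log format in which each record already carries the VO and the CPU time in seconds; real LRMS logs differ per batch system, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_record(line: str) -> Dict[str, str]:
    """Parse one hypothetical 'key=value key=value ...' accounting line."""
    return dict(field.split("=", 1) for field in line.split())


def cpu_hours_per_vo(lines: Iterable[str]) -> Dict[str, float]:
    """Sum CPU time (seconds in the log) per VO and report it in hours."""
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        if not line.strip():
            continue  # ignore empty lines
        record = parse_record(line)
        totals[record["vo"]] += int(record["cput"]) / 3600
    return dict(totals)


log = [
    "jobid=1 vo=cms cput=3600 walltime=4000",
    "jobid=2 vo=atlas cput=1800 walltime=2000",
    "jobid=3 vo=cms cput=7200 walltime=7300",
]
print(cpu_hours_per_vo(log))  # {'cms': 3.0, 'atlas': 0.5}
```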


Grid foundation: Storage Element

• Storage Element
– Common interface: SRM v1, migrating to SRM v2.2
– Various implementations from LCG and other external projects: disk-based: DPM, dCache; tape-based: Castor, dCache
– Support for ACLs in DPM (in future also in Castor and dCache); after the summer: synchronization of ACLs between SEs
– Common rfio library for Castor and DPM being added
• POSIX-like file access:
– Grid File Access Layer (GFAL) by LCG: support for ACLs in the SRM layer (currently in DPM only); support for SRM v2 being added now
– gLite I/O: support for ACLs from the file catalog, interfaced to Hydra for data encryption; not certified in gLite 3.0; to be retired once all its functionality is also available in GFAL.
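ACL checks of the kind DPM supports are modelled on POSIX ACLs. A simplified sketch, assuming entries keyed by certificate DN for users and by VOMS group for groups, with no owner-specific entries and no mask; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AclEntry:
    kind: str   # "user", "group" or "other"
    who: str    # certificate DN for users, VOMS group for groups, "" for other
    perms: str  # subset of "rwx"


def allowed(acl: List[AclEntry], dn: str, groups: Iterable[str], perm: str) -> bool:
    """Simplified POSIX-style check: a named-user entry decides first, then groups, then other."""
    for entry in acl:
        if entry.kind == "user" and entry.who == dn:
            return perm in entry.perms  # a matching user entry is final
    group_set = set(groups)
    matching = [e for e in acl if e.kind == "group" and e.who in group_set]
    if matching:
        return any(perm in e.perms for e in matching)  # any matching group may grant
    return any(e.kind == "other" and perm in e.perms for e in acl)


acl = [
    AclEntry("user", "/C=IT/O=Example/CN=Alice", "rw"),
    AclEntry("group", "/dteam", "r"),
    AclEntry("other", "", ""),
]
print(allowed(acl, "/C=CH/O=Example/CN=Bob", ["/dteam"], "r"))  # True
```

As in POSIX, a user who matches a group entry that does not grant the permission is refused, even if the “other” entry would have allowed it.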


High Level Services: Catalogues

• File Catalogs
– LFC from LCG: in June, interface to POOL; in the summer, LFC replication and backup
– Hydra: stores keys for data encryption; being interfaced to GFAL (done by July); currently only one instance, but in future there will be 3 instances, of which at least 2 need to be available for decryption; not yet certified in gLite 3.0, certification will start soon
– AMGA Metadata Catalog: generic metadata catalogue; joint JRA1-NA4 (ARDA) development; used mainly by Biomed; not yet certified in gLite 3.0, certification will start soon
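The planned Hydra setup (3 instances, any 2 sufficient to decrypt) is a 2-of-3 threshold arrangement. The slide does not say how Hydra splits keys; the sketch below uses Shamir secret sharing over a prime field purely to illustrate why any two of three shares suffice while a single share reveals nothing. Function names are hypothetical, and the 127-bit Mersenne prime limits the secret to values below 2^127 - 1.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # a Mersenne prime; secrets must be smaller than this
Share = Tuple[int, int]


def split_secret(secret: int, n: int = 3, k: int = 2) -> List[Share]:
    """Shamir split: evaluate a random degree-(k-1) polynomial with f(0)=secret at x=1..n."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine_shares(shares: List[Share]) -> int:
    """Recover f(0) by Lagrange interpolation; needs at least k distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbelow(PRIME)  # stands in for a file-encryption key
shares = split_secret(key)      # one share per (hypothetical) key-store instance
print(combine_shares([shares[0], shares[2]]) == key)  # True: any two suffice
```

The same property gives availability (one instance may be down) and confidentiality (one compromised instance holds only a random-looking number). `pow(den, -1, PRIME)` needs Python 3.8 or later.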


High Level Services: File transfer

• FTS: reliable, scalable and customizable file transfer
– Manages transfers through channels: mono-directional network pipes between two sites
– Web service interface
– Automatic discovery of services
– Support for different user and administrative roles
– Adding support for pre-staging and a new proxy renewal scheme
– Support for SRM v2.2, delegation and VOMS-aware proxy renewal in certification
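Channels can be modelled as one queue per ordered (source, destination) pair, each with its own limit on concurrent transfers. A toy sketch under those assumptions: the site names, limits and class names are illustrative, and real FTS channels carry many more settings.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple

Channel = Tuple[str, str]  # (source site, destination site): one direction only


@dataclass
class Transfer:
    source: str
    dest: str
    file: str


class ChannelScheduler:
    """Toy model: a FIFO queue per channel, each with its own concurrency limit."""

    def __init__(self, limits: Dict[Channel, int]):
        self.limits = limits
        self.queues: Dict[Channel, Deque[Transfer]] = defaultdict(deque)
        self.active: Dict[Channel, int] = defaultdict(int)

    def submit(self, t: Transfer) -> None:
        channel = (t.source, t.dest)
        if channel not in self.limits:
            raise ValueError(f"no channel configured for {channel}")
        self.queues[channel].append(t)

    def start_ready(self) -> List[Transfer]:
        """Start queued transfers until each channel reaches its limit."""
        started = []
        for channel, queue in self.queues.items():
            while queue and self.active[channel] < self.limits[channel]:
                started.append(queue.popleft())
                self.active[channel] += 1
        return started

    def finished(self, t: Transfer) -> None:
        self.active[(t.source, t.dest)] -= 1  # frees a slot on that channel


scheduler = ChannelScheduler({("CERN", "CNAF"): 2, ("CNAF", "CERN"): 1})
for name in ["f0", "f1", "f2"]:
    scheduler.submit(Transfer("CERN", "CNAF", name))
scheduler.submit(Transfer("CNAF", "CERN", "g0"))
first_wave = scheduler.start_ready()   # f0 and f1 (limit 2), plus g0
scheduler.finished(first_wave[0])      # one CERN->CNAF slot becomes free
second_wave = scheduler.start_ready()  # f2
```

Because CERN to CNAF and CNAF to CERN are separate channels, a backlog in one direction does not block the other.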


High Level Services: Workload mgmt.

• WMS helps the user access computing resources
– Resource brokering, management of job input/output, ...
• LCG-RB: GT2 + Condor-G
– To be replaced when the gLite WMS proves to be reliable
• gLite WMS: Web service (WMProxy) + Condor-G
– Management of complex workflows (DAGs) and compound jobs: bulk submission and shared input sandboxes; support for input files on different servers (scattered sandboxes)
– Support for shallow resubmission of jobs
– Job File Perusal: file peeking during job execution
– Supports collection of information from CEMon, BDII and R-GMA, and from the DLI and StorageIndex data management interfaces
– Support for parallel jobs (MPI) when the home directory is not shared
– Deployed for the first time in gLite 3.0
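Resource brokering boils down to filtering CEs by the job’s Requirements and choosing the best one by its Rank. In JDL both are expressions over information-system attributes; here Python callables and shortened, hypothetical attribute names stand in for them. Ranking by negative estimated response time reflects a common choice (prefer the CE expected to start the job soonest), not a fixed rule.

```python
from typing import Any, Callable, Dict, List, Optional

CE = Dict[str, Any]


def match(ces: List[CE], requirements: Callable[[CE], bool],
          rank: Callable[[CE], float]) -> Optional[CE]:
    """Keep the CEs that satisfy the requirements, then return the highest-ranked one."""
    candidates = [ce for ce in ces if requirements(ce)]
    return max(candidates, key=rank) if candidates else None


# Hypothetical snapshot of information-system data (attribute names shortened)
ces = [
    {"id": "ce1", "MaxCPUTime": 2880, "EstimatedResponseTime": 600},
    {"id": "ce2", "MaxCPUTime": 60, "EstimatedResponseTime": 10},
    {"id": "ce3", "MaxCPUTime": 2880, "EstimatedResponseTime": 120},
]
best = match(
    ces,
    requirements=lambda ce: ce["MaxCPUTime"] >= 720,  # job needs a long queue
    rank=lambda ce: -ce["EstimatedResponseTime"],     # prefer the shortest wait
)
print(best["id"] if best else "no match")  # ce3
```

`ce2` has the shortest wait but fails the requirement, so the choice falls to `ce3`; a job whose requirements no CE satisfies gets `None`.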


WMS/LB/UI and CE

• New WMS deployed and thoroughly debugged
– CMS: 100 collections × 200 jobs/collection, 3 UIs, 33 CEs
• ~2.5 h to submit jobs (0.5 seconds/job)
• ~17 hours to transfer jobs to a CE (3 seconds/job; 26K jobs/day)
• Negligible failure rate due to WMS
– Shallow resubmission: failure rate drops to less than 1% with 3 resubmissions
• Stability problems
– Also investigating other deployment scenarios to make it more robust
• gLite CE still to be tested and optimized

[Figure: Done (Success) jobs (%) after the i-th submission versus number of submissions (0-6), for ATLAS and CMS.]
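The slide reports that shallow resubmission brings the failure rate below 1% with 3 resubmissions. That result can be read against a simple model: if each attempt fails independently with probability p, a job still fails after r resubmissions with probability p^(r+1). Independence is an assumption (a misbehaving CE produces correlated failures), and the slide’s figures are measurements, not outputs of this model; the function names are illustrative.

```python
def residual_failure(p: float, resubmissions: int) -> float:
    """Probability that the first attempt and every resubmission all fail."""
    return p ** (resubmissions + 1)


def resubmissions_needed(p: float, target: float) -> int:
    """Smallest r such that p**(r + 1) < target."""
    if not (0 <= p < 1 and 0 < target <= 1):
        raise ValueError("need 0 <= p < 1 and 0 < target <= 1")
    r = 0
    while residual_failure(p, r) >= target:
        r += 1
    return r


def max_attempt_failure(resubmissions: int, target: float) -> float:
    """Per-attempt failure rate at which r resubmissions exactly reach the target."""
    return target ** (1 / (resubmissions + 1))


print(max_attempt_failure(3, 0.01))     # about 0.316
print(resubmissions_needed(0.3, 0.01))  # 3, since 0.3**4 = 0.0081
```

Under this model, “less than 1% after 3 resubmissions” is consistent with any per-attempt failure rate below about 31.6%.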


High Level Services: Workflows

• A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
• A Collection is a group of jobs with no dependencies
– basically a collection of JDLs
• A Parametric job is a job having one or more attributes in the JDL that vary their values according to parameters
• Using compound jobs it is possible to have one-shot submission of a (possibly very large, up to thousands) group of jobs
– Submission time reduction: single call to the WMProxy server, single Authentication and Authorization process, sharing of files between jobs
– Availability of both a single Job ID to manage the group as a whole and an ID for each single job in the group

[Figure: example DAG with nodes nodeA, nodeB, nodeC, nodeD and nodeE.]
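The two compound-job forms can be sketched in a few lines. `expand_parametric` turns one template into a collection by substituting each parameter value for a placeholder (here `_PARAM_`); `submission_order` orders DAG nodes so that every job follows its parents. Both names are hypothetical, the node names come from the slide but the edges are assumed, and the code uses `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, Iterable, List, Set


def expand_parametric(template: Dict[str, str], values: Iterable[int]) -> List[Dict[str, str]]:
    """Turn one parametric description into a collection: one job per value."""
    return [
        {name: text.replace("_PARAM_", str(v)) for name, text in template.items()}
        for v in values
    ]


def submission_order(parents: Dict[str, Set[str]]) -> List[str]:
    """Order DAG nodes so that each one comes after all of its parents."""
    return list(TopologicalSorter(parents).static_order())


jobs = expand_parametric(
    {"Arguments": "input_PARAM_.dat", "StdOutput": "out_PARAM_.txt"}, range(3)
)
# Hypothetical edges; only the node names come from the slide
dag = {
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
    "nodeE": {"nodeD"},
}
order = submission_order(dag)
print(jobs[2]["Arguments"], order[0], order[-1])  # input2.dat nodeA nodeE
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle, which is exactly the case a DAG excludes.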


High Level Services: Job Information

• Logging and Bookkeeping service
– Tracks jobs during their lifetime (in terms of events)
– LBProxy for fast access
– L&B API and CLI to query jobs
– Support for “CE reputability ranking”: maintains recent statistics of job failures at CEs and feeds them back to the WMS to aid planning
• Job Provenance: stores long-term job information
– Supports job rerun
– If deployed, will also help offload the L&B
– Not yet certified in gLite 3.0.
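L&B derives a job’s state from the events logged about it, and the reputability ranking is a per-CE failure statistic. A sketch of both ideas: the state names are gLite job states, but the event names, their mapping to states and the default window size are hypothetical simplifications (real L&B also has to cope with events arriving out of order).

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical event names mapped to gLite job state names
EVENT_TO_STATE = {
    "Registered": "Submitted",
    "Accepted": "Waiting",
    "Matched": "Ready",
    "Queued": "Scheduled",
    "Started": "Running",
    "Finished": "Done",
    "Aborted": "Aborted",
}


def job_state(events: Iterable[str]) -> str:
    """The current state follows from the latest recognised event."""
    state = "Unknown"
    for event in events:
        state = EVENT_TO_STATE.get(event, state)  # unknown events leave the state as is
    return state


def ce_failure_rates(outcomes: Iterable[Tuple[str, bool]],
                     window: int = 1000) -> Dict[str, float]:
    """Failure fraction per CE over the last `window` (ce, succeeded) outcomes."""
    recent = list(outcomes)[-window:]
    total: Counter = Counter()
    failed: Counter = Counter()
    for ce, succeeded in recent:
        total[ce] += 1
        if not succeeded:
            failed[ce] += 1
    return {ce: failed[ce] / total[ce] for ce in total}


print(job_state(["Registered", "Accepted", "Matched", "Queued", "Started"]))  # Running
```

Restricting the statistic to a recent window lets a CE that has been repaired recover its ranking.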


Highlights: Job Priorities

• Applications ask for the possibility to differentiate access to fast/slow queues depending on the user’s role/group inside the VO
• GPBOX is a tool that makes it possible to define, store and propagate fine-grained VO policies
– based on VOMS groups and roles
– enforcement of policies at sites: sites may accept/reject policies
– Not yet certified. Certification will start when requested by the TCG.
• Current activities: test job prioritization without GPBOX
– Map VOMS groups to batch system shares
– Publish info on the share in the CE GLUE 1.2 schema (VOView)
– WMS match-making depending on the submitter’s VOMS certificate
– Settings are not dynamic (changed via e-mail or CE updates)
– GIP available for Torque/Maui only; working on the LSF one
– Mainly a deployment issue
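The group-to-share mapping can be sketched as a longest-prefix match on the FQAN. Assumptions: only the primary (first) FQAN of the proxy is used, `Role=NULL` and the capability part are ignored, and the share names are hypothetical site configuration; `share_for` is an illustrative name.

```python
from typing import Dict, Optional


def share_for(primary_fqan: str, shares: Dict[str, str],
              default: Optional[str] = None) -> Optional[str]:
    """Return the share of the longest configured prefix matching the FQAN."""
    parts = [p for p in primary_fqan.strip("/").split("/")
             if not p.startswith("Capability=") and p != "Role=NULL"]
    best, best_len = default, 0
    for prefix, share in shares.items():
        key = prefix.strip("/").split("/")
        if len(key) > best_len and parts[:len(key)] == key:
            best, best_len = share, len(key)  # a more specific match wins
    return best


# Hypothetical site configuration: VOMS group/role -> batch system share
SHARES = {
    "/cms": "cms_default",
    "/cms/higgs": "cms_higgs",
    "/cms/Role=production": "cms_prod",
}
print(share_for("/cms/Role=production/Capability=NULL", SHARES))  # cms_prod
```

Because this table is static site configuration, changing it means editing the CE setup, which matches the slide’s note that the settings are not dynamic.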


Summary

• gLite 3
– is the next-generation middleware for grid computing
– is developed according to a well-defined process controlled by the EGEE Technical Coordination Group
– is deployed on the EGEE production infrastructure: more than 200 sites
– development is continuing to provide increased robustness, usability and functionality; on the preview testbed:
• CREAM, Job Provenance, glexec on the WNs, GPBOX


Part II: The EGEE Project


The EGEE project

• EGEE
– 1 April 2004 – 31 March 2006
– 71 partners in 27 countries, federated in regional Grids
• EGEE-II
– 1 April 2006 – 31 March 2008
– 91 partners in 32 countries
– 13 Federations
• Objectives
– Large-scale, production-quality infrastructure for e-Science
– Attracting new resources and users from industry as well as science
– Improving and maintaining “gLite” Grid middleware

US partners in EGEE-II: Univ. Chicago, Univ. South. California, Univ. Wisconsin, RENCI


Main lines of the EGEE project

• Infrastructure operation
– Currently includes sites across 39 countries
– Continuous monitoring of grid services & automated site configuration/management
• Middleware
– Production-quality middleware distributed under a business-friendly open source licence
• User Support
– Managed process from first contact through to production usage
– Training
– Expertise in grid-enabling applications
– Online helpdesk
– Networking events (User Forum, Conferences etc.)
• Interoperability
– Expanding geographical reach and interoperability with related infrastructures

[Figure: TWGRID, KnowARC.]


Applications on EGEE

• Applications from an increasing number of domains
– Astrophysics
– Computational Chemistry
– Earth Sciences
– Financial Simulation
– Fusion
– Geophysics
– High Energy Physics
– Life Sciences
– Multimedia
– Material Sciences
– …

Book of abstracts: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-005.pdf


EU projects related to EGEE



Sustainability: Beyond EGEE-II

• Need to prepare for a permanent Grid infrastructure
– Ensure reliable and adaptive support for all sciences
– Independent of short project funding cycles
– Infrastructure managed in collaboration with national grid initiatives