INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
Grid Interoperations Cook Book
Markus Schulz, Laurence Field
EGEE SA3
CERN-IT-GD
Interoperation CookBook 3/29/07 2
Overview
• Interoperability
  – Speaking the same language (or using a translator)
  – Middleware problem
• Interoperation
  – Using interoperating infrastructures
  – Needs interoperability, plus operational links
Why
• Different production Grids have been established worldwide
  – Funding based on regions and application domains
• Grid infrastructures are based on different middleware
  – Often confused with the infrastructures (EGEE, gLite, LCG-xx, ...)
  – Historical fact
• Several user communities depend on these infrastructures
  – Main computing resource
      Cycles and storage
  – User communities span multiple regions and funding agencies
Why is there diversity?
• The infrastructures outpaced standardization
• We discovered how to do grid computing on the go
  – First there were NO standards
  – Then there were standards that didn't reflect experience
  – Then there were active users
      Active users drive the infrastructures
      Mandate functional and performance evolution
        • Standardization work is not a directly visible advantage
        • The same goes for security
  – Then users discovered other infrastructures
      And things got complicated
  – Now usable standards start to emerge
      But infrastructures can only convert slowly
Why is there diversity?
• This is not surprising:
  – Grid computing is about interoperability/interoperation
      Different batch systems
      Different storage systems
      Different administrative domains
  – Grid middleware is the implementation of the abstract interface
  – Grid interoperability: abstracting the abstract interfaces
Hourglass Model
[Figure: hourglass diagram. Labels: Site Specific Systems, VO/Grid Specific Middleware, Monitoring, Service Discovery, Job Submission, File Transfer, Security]
• Popular, but tells only half of the story
• 1/4 of cost
• Policies and operational procedures
• Support ...
• Hidden assumptions
• Network access from WN nodes
[Figure: policy documents. Labels: Security & Availability Policy, Usage Rules, Certification Authorities, Audit Requirements, Incident Response, User Registration & VO Management, Application Development & Network Admin Guide, VO Security]
Why Interoperation?
• Simple: because there is a clear need
• VOs started to interoperate on their own
• "Keyhole" adapters emerged
  – Minimal interfaces
• Problematic
  – Requirements change
  – Many adapters need to be maintained
      1 per VO/infrastructure
      Grids evolve at different pace
        • Change, change, change
  – Workflow debugging becomes (almost) impossible
      Different error messages
      Small surface of the keyholes
  – VOs can't follow operations easily
      Different tools
  – Much more synchronization work (meetings)
Why Interoperation?
• Because it steers the development of standards in the right direction
• Linking the infrastructures
  – Helps to establish practical, usable standards
      For the people, by the people
  – Helps to establish working policies
      Sites and users
  – Reflects different experiences
  – Keeps the middleware modular
      Domain separation
        • DATA, JOBS, INFORMATION, ...
Why EGEE?
• Related infrastructure projects
[Figure: related infrastructure projects; legible labels: DEISA, TeraGrid]
How?
• Understanding the differences
  – Compatibility matrix
• Domains that have to be linked for interoperability
  – Security
  – Information Services
  – Job Management
  – Data Management
• For interoperation you have to add
  – Monitoring
  – Accounting
  – Operational links and joint policies
  – Trouble ticket systems
  – Operational security
GIN
• OGF's Grid Interoperability Now
• Six international teams met for the first time at GGF-16 in Feb 2006
  – Application Use Cases
  – Authentication/Identity Mgmt
  – Job Description Language
  – Data Location/Movement
  – Information Schemas
  – Testbeds
Interoperability Matrix
• Simple example:
                           ARC        OSG        EGEE
Job Submission             GridFTP    GRAM       GRAM
Service Discovery          LDAP/GIIS  LDAP/GIIS  LDAP/BDII
Schema                     ARC        GLUE       GLUE
Storage Transfer Protocol  GridFTP    GridFTP    GridFTP
Storage Control Protocol   SRM        SRM        SRM
Security                   GSI/VOMS   GSI/VOMS   GSI/VOMS
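
The matrix can also be read programmatically. The following is a minimal Python sketch, not part of the original slides, that stores the example matrix as a dictionary and lists the domains in which two infrastructures differ, which is where some interoperability work would be needed. The names MATRIX and differing_domains are illustrative assumptions.

```python
# Interoperability matrix from the slide: domain -> {infrastructure: technology}.
MATRIX = {
    "Job Submission": {"ARC": "GridFTP", "OSG": "GRAM", "EGEE": "GRAM"},
    "Service Discovery": {"ARC": "LDAP/GIIS", "OSG": "LDAP/GIIS", "EGEE": "LDAP/BDII"},
    "Schema": {"ARC": "ARC", "OSG": "GLUE", "EGEE": "GLUE"},
    "Storage Transfer Protocol": {"ARC": "GridFTP", "OSG": "GridFTP", "EGEE": "GridFTP"},
    "Storage Control Protocol": {"ARC": "SRM", "OSG": "SRM", "EGEE": "SRM"},
    "Security": {"ARC": "GSI/VOMS", "OSG": "GSI/VOMS", "EGEE": "GSI/VOMS"},
}


def differing_domains(grid_a, grid_b, matrix=MATRIX):
    """Return the domains in which the two infrastructures use different technology."""
    return [
        domain
        for domain, row in matrix.items()
        if row[grid_a] != row[grid_b]
    ]


if __name__ == "__main__":
    # Print the differing domains for each pair of infrastructures.
    for pair in (("ARC", "EGEE"), ("OSG", "EGEE"), ("ARC", "OSG")):
        print(pair, differing_domains(*pair))
```

For the example matrix, OSG and EGEE differ only in service discovery, while ARC differs from EGEE in job submission, service discovery and schema.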
Select Strategy
• For each domain select one strategy
• Common Interfaces
  – Seems to be the most straightforward
  – Absence of established standards
      Which interface do you choose?
      Grid infrastructures have heavily invested in one interface
      This is the ultimate goal, but will require a long time
        • We have to provide services NOW
      Good standards are hard to get
        • See SRM and GLUE discussions on semantic details
• Adapters and Translators
  – Adapters and translators can be used in the higher level services
      Condor approach
        • Adapters for Condor, GRAM, ARC, UNICORE, ...
  – Changes confined to higher level interfaces
  – Infrastructure "untouched"
      But environments and client libs have to be adapted
  – Good indicator for areas that should be standardized
      Adapters and translators can only work if the same functionality is provided
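
The adapter strategy can be illustrated with a small Python sketch. It is not taken from Condor or any other middleware: the class names (SubmissionAdapter, GramAdapter, ArcAdapter, Broker) and the returned job identifiers are hypothetical, and the adapter bodies are stubs that only mark where infrastructure-specific client calls would go.

```python
from abc import ABC, abstractmethod


class SubmissionAdapter(ABC):
    """Common interface the higher level service programs against (hypothetical)."""

    @abstractmethod
    def submit(self, executable: str) -> str:
        """Submit a job and return an infrastructure-specific job identifier."""


class GramAdapter(SubmissionAdapter):
    def submit(self, executable: str) -> str:
        # Stub: a real adapter would call a GRAM client here.
        return f"gram:{executable}"


class ArcAdapter(SubmissionAdapter):
    def submit(self, executable: str) -> str:
        # Stub: a real adapter would use the ARC submission path here.
        return f"arc:{executable}"


class Broker:
    """Higher level service: selects an adapter, the infrastructures stay untouched."""

    def __init__(self):
        self._adapters = {}

    def register(self, infrastructure: str, adapter: SubmissionAdapter) -> None:
        self._adapters[infrastructure] = adapter

    def submit(self, infrastructure: str, executable: str) -> str:
        try:
            adapter = self._adapters[infrastructure]
        except KeyError:
            raise ValueError(f"no adapter for {infrastructure}") from None
        return adapter.submit(executable)


# Example wiring: one adapter per infrastructure, all changes stay in the broker layer.
broker = Broker()
broker.register("EGEE", GramAdapter())
broker.register("NDGF", ArcAdapter())
```

The sketch only works because both stubs offer the same operation; where an infrastructure has no equivalent function, an adapter has nothing to map to.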
Select Strategy
• Grid Gateways
  – Used when concepts are too different
      Different security models are hard to "translate"
      No corresponding services
  – Technically close to adapters and translators
  – Standalone, trusted service(s)
  – Scalability is problematic
      All jobs go through one/a few gateways (bottleneck)
  – Robustness can be an issue
      Russian doll (layered software)
  – Only one step ahead of users' "keyhole" adapters
      Short term solution
      Demonstrates potential of interoperability
  – Gateways are indicators for different domains
      Maybe concepts have to be re-evaluated
Example GIN-INFO
• Starting from bi-lateral work
  – EGEE / OSG already interoperating since Autumn 2005
      Both are using an LDAP based information system
      Both are using the Glue schema (different bootstrapping)
        • OSG site URLs generated from OSG GOC DB
        • EGEE site URLs generated from EGEE GOC DB
  – EGEE / NDGF working on interoperability since Summer 2005
      Both use an LDAP based information system
        • But different schema
      Trying schema translation approach
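
A schema translation step of this kind can be sketched in Python as an attribute mapping. This is not the GIN translator: the mapping table is an assumption made for illustration, the attribute names are only modelled on the ARC and GLUE schemas and should be checked against the schema documents, and `x-local-attribute` is a made-up name used to show what happens when nothing maps.

```python
# Illustrative attribute mapping (source name -> GLUE-style name).
# The pairs below are assumptions for this sketch, not an official mapping.
ATTRIBUTE_MAP = {
    "nordugrid-cluster-name": "GlueClusterName",
    "nordugrid-cluster-totalcpus": "GlueCEInfoTotalCPUs",
}


def translate(entry, attribute_map=ATTRIBUTE_MAP):
    """Translate one entry; return (translated entry, sorted unmapped attribute names)."""
    translated = {}
    unmapped = []
    for name, value in entry.items():
        target = attribute_map.get(name)
        if target is None:
            # No counterpart in the target schema: report it instead of guessing.
            unmapped.append(name)
        else:
            translated[target] = value
    return translated, sorted(unmapped)


# Example entry with one attribute that has no counterpart in the mapping.
example_entry = {
    "nordugrid-cluster-name": "cluster.example.org",
    "nordugrid-cluster-totalcpus": "64",
    "x-local-attribute": "value",
}
```

Reporting unmapped attributes separately, rather than dropping them silently, makes it visible where the two schemas do not line up.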
Example GIN-INFO
• Naregi working on interoperability with EGEE since winter 2006
  – Naregi information system, "Cell domains"
      Different schema (vendor extensions to CIM)
      CIM providers
      OGSA-DAI interface
  – Schema translation
• Teragrid
  – MDS4 information system with Glue schema version 1.1
  – Translator
• Pragma
  – WebSIM
  – Prototype translator finished
Initial Architecture
[Figure: architecture diagram. Labels: GIN BDII, ARC BDII; one BDII each for EGEE, OSG, NDGF, Naregi, Teragrid and Pragma; EGEE, OSG and NDGF sites; Naregi, Teragrid and Pragma grids; four Translator components]
Current Architecture
[Figure: architecture diagram. Labels: GIN BDII, ARC BDII; Generic Information Provider with one Provider each for EGEE, OSG, NDGF, Naregi, Teragrid and Pragma; EGEE, OSG and NDGF sites; Naregi, Teragrid and Pragma grids]
• Grids use GIN-BDII as the source
• See Naregi
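
One possible reading of this architecture, a generic information provider that calls one provider per grid and merges the results, is sketched below in Python. It is an assumption about the design, not the actual Generic Information Provider code: all function names, the example DN and the simplified LDIF rendering (no base64 encoding, no line folding) are hypothetical.

```python
def render_ldif(entries):
    """Render entries (dicts with a 'dn' key) as simplified LDIF text."""
    blocks = []
    for entry in entries:
        lines = [f"dn: {entry['dn']}"]
        lines += [f"{name}: {value}" for name, value in entry.items() if name != "dn"]
        blocks.append("\n".join(lines) + "\n")
    # Entries are separated by a blank line.
    return "\n".join(blocks)


def generic_information_provider(providers):
    """Call every per-grid provider and merge what they return.

    A failing provider is recorded and skipped so that one grid
    cannot block the information from the others.
    """
    entries, failed = [], []
    for grid, provider in providers.items():
        try:
            entries.extend(provider())
        except Exception:
            failed.append(grid)
    return render_ldif(entries), failed


def example_egee_provider():
    # Hypothetical provider returning a single site entry.
    return [{"dn": "GlueSiteUniqueID=EXAMPLE-A,o=grid", "GlueSiteName": "EXAMPLE-A"}]


def example_broken_provider():
    # Hypothetical provider whose remote information system is down.
    raise RuntimeError("remote information system unreachable")
```

Calling `generic_information_provider({"EGEE": example_egee_provider, "Naregi": example_broken_provider})` returns the LDIF for the one working provider together with the list of grids that failed.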
Interoperating information systems
[Figure legend: DEISA, EGEE, Naregi, Nordugrid, OSG, Pragma, Teragrid]
http://gridportal-ws01.hep.ph.ic.ac.uk/gin/gin-locations.kmz
Performance Indicators
          EGEE   OSG    NDGF   Naregi    Teragrid  Pragma
Query     LDAP   LDAP   LDAP   OGSA-DAI  WSRF      wget
Sites     195    8      37     1         1         17
Clusters  234    8      36     4         12        18
Real      20.3s  11.0s  16.8s  74.7s     24.4s     21.1s
User      15.1s  1.2s   2.3s   14.3s     21.9s     0.2s
System    4.9s   0.7s   1.5s   0.4s      3.1s      0.2s
Memory    0.9%   0.8%   0.9%   9.0%      7.3%      0.8%
PIII 1 GHz, 256 MB
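
To compare grids of very different size, the real time can be normalised by the number of clusters. The slides do not do this; the Python sketch below is an added reading of the table, it ignores any fixed per-query overhead, and the names MEASUREMENTS, seconds_per_cluster and slowest are illustrative.

```python
# Values copied from the table: grid -> (clusters, real time in seconds).
MEASUREMENTS = {
    "EGEE": (234, 20.3),
    "OSG": (8, 11.0),
    "NDGF": (36, 16.8),
    "Naregi": (4, 74.7),
    "Teragrid": (12, 24.4),
    "Pragma": (18, 21.1),
}


def seconds_per_cluster(measurements=MEASUREMENTS):
    """Return the real (wall-clock) time per cluster for each grid."""
    return {
        grid: real / clusters
        for grid, (clusters, real) in measurements.items()
    }


def slowest(measurements=MEASUREMENTS):
    """Return the grid with the highest real time per cluster."""
    per_cluster = seconds_per_cluster(measurements)
    return max(per_cluster, key=per_cluster.get)


if __name__ == "__main__":
    for grid, value in seconds_per_cluster().items():
        print(f"{grid}: {value:.2f} s per cluster")
```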
Complexity
• Syntax Interoperability Matrix (Naregi)
Grid       Schema        Data        Query Lang  Client IF           Software
Tera-Grid  GLUE          XML         XPath       WSRF RP Queries     MDS4
OSG        GLUE          LDIF        LDAP        LDAP                BDII
NAREGI     CIM 2.10+ext  Relational  SQL         OGSA-DAI, WS-I RUS  CIMOM + OGSA-DAI
EGEE/LCG   GLUE          LDIF        LDAP        LDAP                BDII
                         Relational  SQL         R-GMA i/f           R-GMA
NorduGrid  ARC           LDIF        LDAP        LDAP                GIIS
Summary Info Systems
• Information systems are very similar
• Joining information systems is easy
• Translating information is tricky
  – Moving from one model to another is straightforward
  – Showstopper if information doesn't map, i.e. missing attributes
• We can live with different information systems
  – But we can't live with different information
• How do we ensure good quality information?
  – Need to develop tests for the information
      Based on the use cases
        • How can we ensure the coordinates are correct for a site?
  – --> grid operations!
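
A test for the information could look like the following Python sketch, which checks published site coordinates for plausibility. It is an illustration only: the function name, the dictionary keys `latitude` and `longitude` and the treatment of 0,0 as "likely unset" are assumptions, and a range check can show that coordinates are plausible, not that they are correct.

```python
def check_site_coordinates(site):
    """Return a list of problems found in a site's published coordinates."""
    problems = []
    values = {}
    for name, low, high in (("latitude", -90.0, 90.0), ("longitude", -180.0, 180.0)):
        raw = site.get(name)
        if raw is None:
            problems.append(f"{name} missing")
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            problems.append(f"{name} not numeric: {raw!r}")
            continue
        if not low <= value <= high:
            problems.append(f"{name} out of range: {value}")
            continue
        values[name] = value
    # Assumption: 0,0 is far more likely an unset default than a real site.
    if values.get("latitude") == 0.0 and values.get("longitude") == 0.0:
        problems.append("coordinates are 0,0 (likely unset)")
    return problems
```

For example, `check_site_coordinates({"latitude": "146.23", "longitude": "6.05"})` reports the latitude as out of range, while a site with both values inside the valid ranges yields an empty list.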
Problematic Area
• Monitoring
  – Foundation of interoperation
• Many different tools
  – For each infrastructure more than one
• No well defined schema
  – Same attribute has different meanings
  – Security relevant information exposed
• Sites hate double monitoring
Globus MDS
[Figure: Globus MDS diagram. Labels: Node, GRIS, Provider, Site, GIIS, Cache, GIIS]
GridIce
[Figure: GridIce diagram. Labels: Site, GridIce, Node, Lemon Sensor, GridIce GRIS, DB, Lemon Server, Provider]
R-GMA
[Figure: R-GMA diagram. Labels: Node, R-GMA API, Sensor, Site, Cache, Producer Servlet, R-GMA]
Apel
[Figure: Apel diagram. Labels: Site, Producer Servlet, DB, Cron, Apel, Node, Script, R-GMA]
Grid Cat
[Figure: Grid Cat diagram. Labels: GridCat, Site, GRAM, SQL, Cron, Scripts]
Interoperability Matrix
          Sensors                Local Transport  Site Cache       External Transport  Schema        Repository
MDS       Information Providers  LDAP             Memory, LDAP DB  LDAP                LDAP          LDAP
R-GMA     Various                HTTP             Memory, SQL      HTTP                R-GMA         MySQL
Apel      Custom Scripts         MySQL            MySQL            R-GMA, HTTP         R-GMA         MySQL
GridICE   Information Providers  Lemon            Lemon Server     MDS, LDAP           LDAP          Postgresql
Grid Cat  Various                Cron             SQLite           GRAM                GridCat       Postgresql
MonaLisa  Monitoring Module      ???              SQL DBs          ???                 Java Objects  SQL DBs
What does it take?
• OSG
  – Very similar middleware
  – Close link via main users
• ARC (NDGF)
• NAREGI
• UNICORE (DEISA)
• OGF-GIN
OSG
• November and December 2004
  – Initial meeting with OSG to discuss interoperability
      A common information schema was the key
  – Proposal for version 1.2 of the Glue Schema was discussed
      Include new attributes required by OSG, Marco Mambelli
• January 2005
  – Proof of concept was tried, Leigh Grundhoefer (Indiana)
      Installed Generic Information Provider (GIP) on an OSG CE
      OSG CE was configured to support the dteam VO
      "Hello world" job, submitted through the LCG RB and ran on an OSG CE
      Installed the LCG clients available on OSG from a tarball
        • Oliver Keeble (CERN)
      Submitted test job that did basic data management operations
OSG
• Modifications to the OSG and LCG software releases
  – Updated the GIP to publish version 1.2 of the Glue Schema
      The GridFTP server on the OSG CE advertised as an LCG SE
  – Automatically configure the GIP in the OSG release
      Information scavenger script, Shaowen Wang (Iowa)
• August 2005 (month of focussed activity)
  – Included first OSG sites into the LCG operational framework
  – Set up a BDII that represented these OSG sites
  – Included this BDII in the LCG information system
  – All OSG sites found in this BDII were automatically tested
      Using the Site Functional Tests (SFT) framework
  – Created a script to install the LCG clients on OSG CEs
• November 2005
  – First user jobs from GEANT4 arrived on OSG
  – GIP validator for OSG operations, Shaowen Wang (Iowa)
OSG
• March 2006, Operations Progress
  – Information system bootstrapping
      Dynamic web page from OSG GOC DB
  – Routing of trouble tickets
  – Joint operations VO
      For running tests
      Deployment of client libraries
  – OSG joined the Monday WLCG operations meeting to report on WLCG issues
• Summer 2006
  – CMS successfully taking advantage of interoperations
      Without being aware of it
Summary EGEE OSG
• How to maintain interoperation?
  – Grids evolve
  – Different release cycles
• Testbed for interoperation is needed NOW
• Interoperability took 6 months
  – Technically a simple use case
• Interoperation took 6 months