INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
Information System
Valeria Ardizzone
INFN
Singapore, 1st South East Asia Forum -- EGEE tutorial
9-10 January 2006
EGEE Tutorial, Singapore, 09.02.2006 2
lcg-infosites & lcg-info
How to discover resources?
• Once a user is logged into a User Interface, (s)he is ready to take advantage of the power of the Grid for his/her own application.
• But what are the available resources to accomplish his/her tasks?
• The answer to this question comes through the interactions with the Information System (IS).
• The Information System (IS) provides information about the LCG-2 Grid resources and their status.
How to discover resources (cont)
• The data published in the IS conforms to the GLUE (Grid Laboratory for a Uniform Environment) Schema. The GLUE Schema aims to define a common conceptual data model to be used for Grid resources.
• In LCG-2, the BDII (Berkeley DB Information Index), based on an updated version of the Monitoring and Discovery Service (MDS), was adopted as the main provider of the Information Service.
• In gLite, R-GMA (Relational Grid Monitoring Architecture) is adopted as the IS.
Monitoring and Discovery Service
• Computing and storage resources at a site implement an entity called the Information Provider, which generates the relevant information about the resource (e.g. the used space in an SE).
• This information is published via an LDAP server by the Grid Resource Information Servers, or GRISes.
Monitoring and Discovery Service (cont)
• At each site, an element called the Site Grid Index Information Server (GIIS) collects the information from the different GRISes and publishes it.
• The BDII queries the GIISes and acts as a cache, storing information about the Grid status in its database.
Monitoring and Discovery Service (cont)
• By querying the BDII, a user or a service can obtain all the available information about the status of the Grid resources.
• Moreover, to get more up-to-date information, it is possible to query the GIISes or GRISes directly.
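The hierarchy described on these slides (GRIS, site GIIS, BDII cache) can be pictured with a small Python sketch. It is only an in-memory model under stated assumptions: the class names, the site name and the attribute values are hypothetical, and no real LDAP server is contacted.

```python
# Toy model of the MDS hierarchy: GRIS -> site GIIS -> BDII cache.
# All names and values are hypothetical; nothing here talks to LDAP.

class GRIS:
    """Publishes the information generated by one resource."""
    def __init__(self, resource, info):
        self.resource = resource
        self.info = info

    def publish(self):
        return {self.resource: dict(self.info)}


class GIIS:
    """Collects the information of all GRISes at one site."""
    def __init__(self, site, grises):
        self.site = site
        self.grises = grises

    def publish(self):
        merged = {}
        for gris in self.grises:
            merged.update(gris.publish())
        return {self.site: merged}


class BDII:
    """Queries the GIISes and caches the result."""
    def __init__(self, giises):
        self.giises = giises
        self.cache = {}

    def refresh(self):
        fresh = {}
        for giis in self.giises:
            fresh.update(giis.publish())
        self.cache = fresh

    def query(self):
        # Answers come from the cache, so they may be slightly stale.
        return self.cache


se = GRIS("se01", {"used_space_gb": 120})
ce = GRIS("ce01", {"free_cpus": 8})
bdii = BDII([GIIS("example-site", [se, ce])])
bdii.refresh()

se.info["used_space_gb"] = 150          # the resource changes after the refresh
cached = bdii.query()["example-site"]["se01"]["used_space_gb"]
direct = se.publish()["se01"]["used_space_gb"]
print(cached, direct)
```

The gap between the cached value and the direct value is the reason the slides mention querying the GIISes or GRISes directly when fresher information is needed.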
How to query the IS?
• To query the IS elements directly, two higher-level tools are provided:
lcg-infosites
lcg-info
• These tools should be enough for most common user needs and will usually avoid the need for raw LDAP queries.
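For comparison, this is roughly what a raw LDAP query involves. The sketch only builds an `ldapsearch` command line and does not run it. The host name is hypothetical, and the port (2170), the search base and the GLUE object class and attribute names are assumptions based on common LCG-2 BDII setups; check them against your own site before use.

```python
import shlex

def bdii_ldapsearch(host, ldap_filter, attrs, port=2170,
                    base="mds-vo-name=local,o=grid"):
    """Return the argument list of an anonymous ldapsearch against a BDII.

    The default port and base are assumptions, not values from the slides.
    """
    return ["ldapsearch", "-x",
            "-H", f"ldap://{host}:{port}",
            "-b", base,
            ldap_filter, *attrs]

cmd = bdii_ldapsearch(
    "bdii.example.org",                      # hypothetical BDII host
    "(objectClass=GlueCE)",                  # assumed GLUE object class
    ["GlueCEUniqueID", "GlueCEStateFreeCPUs"],
)
print(shlex.join(cmd))
```

Having to know the server, the base, the object class and the attribute names is exactly the overhead that lcg-infosites and lcg-info are meant to hide.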
lcg-infosites
• The lcg-infosites command is an easy way to retrieve information on Grid resources for most use cases.
USAGE: lcg-infosites --vo <vo name> options -v <verbose level> --is <BDII to query>
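As a minimal sketch, the USAGE line can be turned into a small Python helper that assembles the command without running it. The helper name is hypothetical, as are the VO name `gilda`, the option value `ce` and the BDII host; treating `-v` and `--is` as optional is also an assumption, since the usage line does not say so.

```python
import shlex

def lcg_infosites_cmd(vo, option, verbose=None, bdii=None):
    """Build an lcg-infosites command line following the USAGE line."""
    cmd = ["lcg-infosites", "--vo", vo, option]
    if verbose is not None:                # assumed optional
        cmd += ["-v", str(verbose)]
    if bdii is not None:                   # assumed optional
        cmd += ["--is", bdii]
    return cmd

# Hypothetical values: VO "gilda", option "ce", BDII "bdii.example.org".
cmd = lcg_infosites_cmd("gilda", "ce", verbose=1, bdii="bdii.example.org")
print(shlex.join(cmd))
```

On a User Interface the resulting list could be passed to `subprocess.run`; it is only printed here so the sketch runs anywhere.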
lcg-infosites options
lcg-info intro
• This command can be used to list either the CEs or the SEs that satisfy a given set of conditions, and to print the values of a given set of attributes.
• The information is taken from the BDII specified by the LCG_GFAL_INFOSYS environment variable.
• The query syntax is: attr1 op1 value1,...,attrN opN valueN
where attrN is an attribute name and opN is =, >= or <=. The cuts are ANDed; they are comma-separated, and the query string itself must not contain spaces.
Note: since the upgrade to the new GLUE Schema, the operators ‘>’ and ‘<’ can no longer be used.
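The query rules above (only =, >= and <=, comma-separated cuts, no spaces) can be sketched as a small validating builder. The function name is hypothetical, and the attribute names `TotalCPUs` and `FreeCPUs` are used purely for illustration; the real list comes from `lcg-info --list-attrs`.

```python
ALLOWED_OPS = ("=", ">=", "<=")   # '>' and '<' are not accepted

def build_query(cuts):
    """Join (attribute, operator, value) cuts into an lcg-info query string."""
    parts = []
    for attr, op, value in cuts:
        if op not in ALLOWED_OPS:
            raise ValueError(f"operator {op!r} is not supported")
        term = f"{attr}{op}{value}"
        if " " in term or "," in term:
            raise ValueError(f"spaces and commas are not allowed in {term!r}")
        parts.append(term)
    return ",".join(parts)        # the cuts are ANDed by lcg-info

# Illustrative attribute names, not taken from the slides.
query = build_query([("TotalCPUs", ">=", 20), ("FreeCPUs", ">=", 10)])
print(query)
```

Rejecting bad cuts before the command is launched gives a clearer error than a failed query.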
lcg-info usage
USAGE
lcg-info --list-ce [--bdii bdii] [--vo vo] [--sed] [--query query] [--attrs list]
lcg-info --list-se [--bdii bdii] [--vo vo] [--sed] [--query query] [--attrs list]
lcg-info --list-attrs
lcg-info --help
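The synopsis above maps naturally onto a helper that assembles the argument list. This is a sketch under assumptions: the helper name is hypothetical, the attribute list is assumed to be comma-separated, and the VO and attribute names are illustrative. When `--bdii` is omitted, the BDII named by the LCG_GFAL_INFOSYS environment variable is the one used.

```python
import os
import shlex

def lcg_info_cmd(kind, query=None, attrs=None, vo=None, bdii=None):
    """Build an lcg-info command line following the USAGE synopsis."""
    if kind not in ("ce", "se"):
        raise ValueError("kind must be 'ce' or 'se'")
    cmd = ["lcg-info", f"--list-{kind}"]
    if bdii:
        cmd += ["--bdii", bdii]
    if vo:
        cmd += ["--vo", vo]
    if query:
        cmd += ["--query", query]
    if attrs:
        cmd += ["--attrs", ",".join(attrs)]   # assumed comma-separated
    return cmd

# Without --bdii, the BDII comes from the environment.
default_bdii = os.environ.get("LCG_GFAL_INFOSYS", "<not set>")
cmd = lcg_info_cmd("ce", query="FreeCPUs>=10", attrs=["FreeCPUs"], vo="gilda")
print(shlex.join(cmd), "| default BDII:", default_bdii)
```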
lcg-info options
References
LCG-2 User Guide, Manual Series: https://edms.cern.ch/file/454439/LCG-2-UserGuide.html
SECOND PART
R-GMA
Outline
Introduction to R-GMA and Grid Monitoring Architecture (GMA).
R-GMA within Testbed
R-GMA in depth:
– Schema, Registry, Producer and Consumer
– Query and Storage Types
– R-GMA Browser
Introduction to R-GMA
• Relational Grid Monitoring Architecture (R-GMA)
– Developed as part of the European DataGrid Project (EDG).
– Now part of the EGEE project.
– Based on the Grid Monitoring Architecture (GMA) from the Global Grid Forum (GGF).
• Uses a relational data model.
– Data are viewed as tables.
– Data structure is defined by the columns.
– Each entry is a row (tuple).
– Queried using Structured Query Language (SQL).
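The relational view can be illustrated with Python's built-in sqlite3 module. This is not R-GMA itself, only a local stand-in for the same ideas; the table name `CpuLoad`, its columns and the values are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# The columns define the data structure of the table.
db.execute("CREATE TABLE CpuLoad (site TEXT, host TEXT, load_avg REAL)")

# Each entry is one row (tuple).
db.executemany(
    "INSERT INTO CpuLoad VALUES (?, ?, ?)",
    [("RAL", "node1", 0.3), ("RAL", "node2", 1.7), ("INFN", "node9", 0.9)],
)

# Data is queried with SQL.
rows = db.execute(
    "SELECT host, load_avg FROM CpuLoad WHERE site = ? ORDER BY host", ("RAL",)
).fetchall()
print(rows)
```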
Grid Monitoring Architecture (GMA)
Diagram: PRODUCER, CONSUMER and REGISTRY, linked by "Store location", "Lookup location" and "Transfer Data".
• The Producer stores its location (URL) in the Registry.
• The Consumer looks up producer URLs in the Registry.
• The Consumer contacts the Producer to get all the data or the Consumer can listen to the Producer for new data.
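The three steps above (store location, look up location, transfer data) can be sketched in plain Python. Everything here is a hypothetical in-process model: the class names, the URL and the table name are made up, and a dictionary stands in for the network.

```python
class Registry:
    def __init__(self):
        self._locations = {}               # table name -> list of producer URLs

    def store(self, table, url):
        self._locations.setdefault(table, []).append(url)

    def lookup(self, table):
        return list(self._locations.get(table, []))


class Producer:
    def __init__(self, url, table, registry):
        self.url, self.table, self.rows = url, table, []
        self.listeners = []
        registry.store(table, url)         # 1. store location

    def insert(self, row):
        self.rows.append(row)
        for callback in self.listeners:    # push new data to listeners
            callback(row)


class Consumer:
    def __init__(self, registry, producers_by_url):
        self.registry = registry
        self.producers = producers_by_url  # stands in for the network

    def fetch_all(self, table):
        urls = self.registry.lookup(table)                         # 2. lookup
        return [r for u in urls for r in self.producers[u].rows]   # 3. transfer

    def listen(self, table, callback):
        for u in self.registry.lookup(table):
            self.producers[u].listeners.append(callback)


registry = Registry()
p = Producer("http://producer.example.org/p1", "CpuLoad", registry)
consumer = Consumer(registry, {p.url: p})

p.insert(("node1", 0.3))
received = []
consumer.listen("CpuLoad", received.append)   # listen for new data only
p.insert(("node2", 1.7))
snapshot = consumer.fetch_all("CpuLoad")      # get all the data
print(snapshot, received)
```

The two access styles differ in what they return: `fetch_all` sees every stored row, while the listener only sees rows inserted after it subscribed.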
R-GMA within Testbed
R-GMA: Schema-Registry-Mediator
Diagram (R-GMA Server): a VIRTUAL DATABASE described by the SCHEMA (Tables 1 to 4, each with its column definitions), the REGISTRY (Table 1: producer P1; Table 2: producers P1, P2, P3; Table 3: producers P1, P2, P3) and the MEDIATOR.
MEDIATOR: a set of rules for deciding which data providers to contact for any given query.
REGISTRY: holds the details of all producers that are publishing to tables in the virtual database, as well as the details of “continuous” consumers.
SCHEMA: holds the names and definitions of all the tables in the virtual database, and their authorization rules.
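How the three components cooperate can be sketched with two dictionaries and one function. The contents mirror the tables and producers shown on the slide, but the column names are hypothetical, and the rule used here (contact every producer registered for the table) is the simplest possible stand-in for the mediator, whose real rules are not described in these slides.

```python
SCHEMA = {                        # table name -> column definitions
    "Table1": ["site", "reading"],
    "Table2": ["site", "reading"],
    "Table3": ["site", "reading"],
    "Table4": ["site", "reading"],
}

REGISTRY = {                      # table name -> producers publishing to it
    "Table1": ["P1"],
    "Table2": ["P1", "P2", "P3"],
    "Table3": ["P2", "P1", "P3"],
}

def mediate(table):
    """Return the producers to contact for a query on `table`.

    Assumed rule: every registered producer, in a stable order.
    """
    if table not in SCHEMA:
        raise KeyError(f"unknown table: {table}")
    return sorted(REGISTRY.get(table, []))

print(mediate("Table3"), mediate("Table4"))
```

A table can exist in the schema without any producer in the registry, as Table 4 does here, in which case a query simply has no one to contact.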
R-GMA: Producer-Consumer
Diagram (R-GMA Server): producers P1, P2 and P3 publish into the VIRTUAL DATABASE with SQL “INSERT”, and consumers C1 and C2 read from it with SQL “SELECT”. The SCHEMA lists Tables 1 to 4 with their column definitions, the REGISTRY lists the producers per table (Table 1: P1; Table 2: P1, P2, P3; Table 3: P1, P2, P3), and the MEDIATOR sits alongside them.
Producers: the data providers for the virtual database. Writing data into the virtual database is known as publishing, and data is always published in complete rows, known as tuples. There are three types of producer: Primary, Secondary and On-demand.
Consumer: represents a single SQL SELECT query on the virtual database. The query is matched against the list of available producers in the Registry. The consumer service then selects the best set of producers to contact and sends the query directly to each of them to obtain the answer tuples.
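The publish and query paths can be sketched with one in-memory SQLite database per producer. This is an illustration only, not the R-GMA API: the producer names follow the slide, while the column names, the values and the `consume` helper are hypothetical.

```python
import sqlite3

def make_producer():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Table2 (site TEXT, reading INTEGER)")
    return db

producers = {"P1": make_producer(), "P2": make_producer()}

# Publishing: each producer inserts complete rows (tuples).
producers["P1"].execute("INSERT INTO Table2 VALUES ('RAL', 1)")
producers["P2"].execute("INSERT INTO Table2 VALUES ('INFN', 2)")

def consume(select_sql, producer_names):
    """Send one SELECT to each chosen producer and merge the answer tuples."""
    answer = []
    for name in producer_names:
        answer.extend(producers[name].execute(select_sql).fetchall())
    return answer

tuples = consume("SELECT site, reading FROM Table2 WHERE reading >= 1",
                 ["P1", "P2"])
print(tuples)
```

The consumer sees a single result set even though the rows live in separate producers, which is what makes the database "virtual".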
Query and Storage Types
• Continuous: as soon as new data becomes available, it is broadcast to all interested parties.
• Latest: corresponds to the intuitive idea of current information.
• History: returns time-sequenced data.
Diagram: producer P1, listed in the REGISTRY, keeps a Latest-store and a Continuous & History-store.
The LATEST RETENTION PERIOD (LRP) and the HISTORY RETENTION PERIOD (HRP) allow producers to periodically purge old tuples, and give a precise meaning to the “current state”.
The tuple store can be in memory or in a database.
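One way to read the two retention periods is sketched below. This is an interpretation, not the R-GMA implementation: the class, the use of seconds as the unit and the exact purge rule are all assumptions made for illustration.

```python
class TupleStore:
    def __init__(self, latest_retention, history_retention):
        self.lrp = latest_retention      # seconds a tuple counts as "current"
        self.hrp = history_retention     # seconds a tuple is kept at all
        self.tuples = []                 # (timestamp, key, value)

    def insert(self, timestamp, key, value):
        self.tuples.append((timestamp, key, value))

    def purge(self, now):
        """Drop tuples older than the history retention period."""
        self.tuples = [t for t in self.tuples if now - t[0] <= self.hrp]

    def history(self, now):
        """Time-sequenced data that is still retained."""
        self.purge(now)
        return sorted(self.tuples)

    def latest(self, now):
        """Newest tuple per key, if it is still within the LRP."""
        newest = {}
        for ts, key, value in sorted(self.tuples):
            if now - ts <= self.lrp:
                newest[key] = (ts, value)
        return newest


store = TupleStore(latest_retention=60, history_retention=600)
store.insert(0, "node1", 0.2)
store.insert(500, "node1", 0.9)
store.insert(650, "node2", 0.4)

print(store.latest(now=660))
print(store.history(now=660))
```

With these numbers, node1's last reading is too old to count as current but is still part of the history, while the very first tuple has been purged altogether.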
Producer Types
• Primary Producer
• Secondary Producer
• On-Demand Producer
Diagrams: for each producer type, User Code calls the Producer API, which talks to the Producer Service. Labels on the diagrams: Tuple Storage, "Control only", "Control and inserted tuples", Queries, Tuples and "SELECT *".
Continuous
Diagram: on the producer side, the Producer API issues SQL “CREATE TABLE” (the table description is stored in the Schema, with columns TableName and Column) and SQL “INSERT” (tuples go to the Producer Servlet, which stores its location in the Registry, with columns TableName, URL and Predicate). On the consumer side, the Consumer API issues SQL “SELECT” to the Consumer Servlet, which looks up the location in the Registry and receives the result set from the Producer Servlet as a continuous stream. Example tuple: UK, RAL, Alice.
History or Latest
Diagram: on the producer side, the Producer API issues SQL “CREATE TABLE” (the table description is stored in the Schema, with columns TableName and Column) and SQL “INSERT” (tuples go to the Producer Servlet, which stores its location in the Registry, with columns TableName, URL and Predicate). On the consumer side, the Consumer API issues SQL “SELECT” to the Consumer Servlet, which looks up the location in the Registry, sends a query to the Producer Servlet and gets the result set back. Example tuple: UK, RAL, Alice.
R-GMA APIs
• APIs exist in Java, C, C++ and Python.
– For clients (servlets are contacted behind the scenes).
• They include methods for:
– Creating consumers
– Creating primary and secondary producers
– Setting the type of queries, the type of producers, retention periods, timeouts…
– Retrieving tuples, inserting data
– …
• You can create your own Producer or Consumer.
https://rgmasrv.ct.infn.it:8443/R-GMA
Security: Requirements
Diagram: Consumer, Provider, Site Admin and VO, all interacting with R-GMA.
• Consumer users: those who request information.
• Producer users: those who provide information.
• Site administrators: those who run R-GMA services.
• Virtual Organizations: those who “own” the schema and registry.
Security: Solution
Diagram: Consumer, Provider, Site Admin and VO, all interacting with R-GMA.
• Mutual authentication: guarantees who is at each end of an exchange of messages.
• Encryption: uses an encrypted transport protocol (HTTPS).
• Authorization: implicit or explicit.
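On the client side, mutual authentication over HTTPS means verifying the server's certificate and presenting one's own. The sketch below shows this with Python's standard ssl module. It is a generic TLS illustration, not the R-GMA client code; the function name and the file parameters are hypothetical, and Grid-specific credential handling is out of scope here.

```python
import ssl

def client_context(cert_file=None, key_file=None, ca_file=None):
    """TLS context that verifies the server and can present a client cert."""
    # SERVER_AUTH: this context is for a client that authenticates servers.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        # Presenting our own certificate makes the authentication mutual,
        # provided the server asks for and checks it.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx

ctx = client_context()                    # no client certificate loaded here
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Such a context could then be passed to `urllib.request.urlopen(url, context=ctx)`. Authorization is a separate step that the server performs once it knows who the client is.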
More information
• R-GMA overview page
– http://www.r-gma.org/
• R-GMA documentation in EGEE
– http://hepunx.rl.ac.uk/egee/jra1-uk/
• R-GMA in GILDA
– https://rgmasrv.ct.infn.it:8443/R-GMA