DataGrid is a project funded by the European Union CHEP 2003 24-28 March 2003 R-GMA 1

R-GMA: First results after deployment

Steve Fisher (EDG - WP3)

s.m.fisher@rl.ac.uk

https://edms.cern.ch/document/376535/

Who we are

Heriot-Watt, Edinburgh: Andrew Cooke, Werner Nutt

IBM-UK: James Magowan, (Manfred Oevers), Paul Taylor

INFN: Roberto Barbera, Giuseppe Save, Gennaro Tortone

Queen Mary, University of London: Roney Cordenonsi, (Ari Datta)

CCLRC: Linda Cornwall, Abdeslem Djaoui, Steve Fisher, Robin Middleton

PPARC: Rob Byrom, Laurence Field, Steve Hicks, Manish Soni, Antony Wilson, (Xiaomei Zhu), Jason Leake

SZTAKI, Hungary: Peter Kacsuk, Norbert Podhorszki

Trinity College Dublin: Brian Coghlan, Stuart Kenny, David O’Callaghan, (John Ryan)

R-GMA

Uses the Grid Monitoring Architecture from Global Grid Forum

R-GMA is a relational implementation

Applied to both information and monitoring

Creates impression that you have one RDBMS per Virtual Organisation

[Diagram: Producer, Consumer and Registry, with arrows for information flow and meta-data flow]

Relational Approach

Not a general distributed RDBMS system, but a way to use the relational model in a distributed environment where global consistency is not important.

Producers
  announce: SQL “CREATE TABLE”
  publish: SQL “INSERT”

Consumers
  collect: SQL “SELECT”

Some producers, the Registry and Schema make use of RDBMS as appropriate – but what is central is the relational model.
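The announce / publish / collect mapping above can be tried out with any SQL engine. The sketch below uses Python's built-in sqlite3 module purely as a local stand-in. It is not the R-GMA API, and the column types are assumed for illustration; the CPULoad column names are taken from the schema example in this talk.

```python
import sqlite3

# Local in-memory database standing in for the "virtual" relational view.
# This is NOT the R-GMA API; it only illustrates the three SQL statements.
conn = sqlite3.connect(":memory:")

# Producer announces: CREATE TABLE (column types are assumed)
conn.execute(
    "CREATE TABLE CPULoad ("
    "Country TEXT, Site TEXT, Facility TEXT, Load REAL, Timestamp TEXT)"
)

# Producer publishes: INSERT
conn.execute(
    "INSERT INTO CPULoad VALUES (?, ?, ?, ?, ?)",
    ("UK", "RAL", "CDF", 0.3, "19055711022002"),
)

# Consumer collects: SELECT
rows = conn.execute(
    "SELECT Site, Facility, Load FROM CPULoad WHERE Country = 'UK'"
).fetchall()
print(rows)
```

In R-GMA itself the producer and the consumer would normally sit on different hosts, with the Registry putting them in touch; here both ends share one connection only to keep the example self-contained.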

Producers

DataBaseProducer – Supports History Queries
  Information not lost
  Supports joins
  Clean up strategy

StreamProducer – Supports Continuous Queries
  In memory data structure
  Can define minimum retention period

ResilientStreamProducer – Supports Continuous Queries
  Like the StreamProducer but won’t lose data if system crashes
  So slightly slower

LatestProducer – Supports Latest Queries
  Just holds the latest information for any “primaryish” key
  Supports joins

CanonicalProducer – Supports anything
  Offers anything as relations
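Two of the storage behaviours listed above can be illustrated with a few lines of Python. This is a rough sketch, not R-GMA code: the class names `LatestStore` and `StreamBuffer`, their methods, and the use of plain numbers as timestamps are all assumptions made for the example.

```python
import time
from collections import deque


class LatestStore:
    """Keeps only the most recent tuple for each key (LatestProducer-like)."""

    def __init__(self, key_columns):
        self.key_columns = key_columns
        self.rows = {}

    def insert(self, row):
        key = tuple(row[c] for c in self.key_columns)
        self.rows[key] = row  # a newer tuple replaces the older one

    def select(self):
        return list(self.rows.values())


class StreamBuffer:
    """In-memory buffer with a minimum retention period (StreamProducer-like)."""

    def __init__(self, min_retention_seconds):
        self.min_retention = min_retention_seconds
        self.buffer = deque()  # (arrival_time, row), oldest first

    def insert(self, row, now=None):
        now = time.time() if now is None else now
        self.buffer.append((now, row))

    def purge(self, now=None):
        """Drop tuples that have been held longer than the retention period."""
        now = time.time() if now is None else now
        while self.buffer and now - self.buffer[0][0] > self.min_retention:
            self.buffer.popleft()

    def select(self):
        return [row for _, row in self.buffer]


latest = LatestStore(key_columns=("Site", "Facility"))
latest.insert({"Site": "RAL", "Facility": "CDF", "Load": 0.3})
latest.insert({"Site": "RAL", "Facility": "CDF", "Load": 0.7})
print(latest.select())  # a single row remains for (RAL, CDF)

stream = StreamBuffer(min_retention_seconds=30)
stream.insert({"Load": 0.3}, now=0)
stream.insert({"Load": 0.4}, now=20)
stream.purge(now=40)    # the tuple inserted at time 0 is now too old
print(stream.select())
```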

Archiver (Re-publisher)

It is a combined Consumer-Producer

You just have to tell it what to collect and it does so on your behalf

Re-publishes to any kind of “Insertable” (i.e. not to the CanonicalProducer)
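The combined Consumer-Producer role can be sketched as follows. The names `ListStore`, `Archiver` and `on_tuple` are hypothetical and chosen only for this illustration; the real Archiver is configured with a query rather than a Python predicate.

```python
class ListStore:
    """Minimal 'Insertable': anything that accepts insert(row)."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)


class Archiver:
    """Consumes tuples matching a predicate and re-publishes them."""

    def __init__(self, predicate, target):
        self.predicate = predicate  # what to collect
        self.target = target        # where to re-publish

    def on_tuple(self, row):
        # Consumer side: called for every tuple that streams in.
        if self.predicate(row):
            # Producer side: re-publish into the Insertable.
            self.target.insert(row)


archive = ListStore()
archiver = Archiver(lambda r: r["Country"] == "UK", archive)
for row in [
    {"Country": "UK", "Site": "RAL", "Load": 0.3},
    {"Country": "CH", "Site": "CERN", "Load": 0.6},
]:
    archiver.on_tuple(row)
print(archive.rows)
```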

Schema & Contributions

CPULoad (Global Schema)

Country  Site  Facility  Load  Timestamp
UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002
UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
CH       CERN  ALICE     0.9   19055611022002
CH       CERN  CDF       0.6   19055511022002

CPULoad (Producer 3)

CH       CERN  ATLAS     1.6   19055611022002
CH       CERN  CDF       0.6   19055511022002

CPULoad (Producer 1)

UK       RAL   CDF       0.3   19055711022002
UK       RAL   ATLAS     1.6   19055611022002

CPULoad (Producer 2)

UK       GLA   CDF       0.4   19055811022002
UK       GLA   ALICE     0.5   19055611022002
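One way to picture the relationship between the global table and the producers' contributions is as a union of rows. The sketch below is an illustration under that assumption: the function name `global_view` is hypothetical, and the rows are the three producer contributions listed above.

```python
# Each producer contributes rows to the same CPULoad table.
COLUMNS = ("Country", "Site", "Facility", "Load", "Timestamp")

producer_1 = [
    ("UK", "RAL", "CDF", 0.3, "19055711022002"),
    ("UK", "RAL", "ATLAS", 1.6, "19055611022002"),
]
producer_2 = [
    ("UK", "GLA", "CDF", 0.4, "19055811022002"),
    ("UK", "GLA", "ALICE", 0.5, "19055611022002"),
]
producer_3 = [
    ("CH", "CERN", "ATLAS", 1.6, "19055611022002"),
    ("CH", "CERN", "CDF", 0.6, "19055511022002"),
]


def global_view(*contributions):
    """The virtual table is the union of all producers' rows."""
    return [row for rows in contributions for row in rows]


cpu_load = global_view(producer_1, producer_2, producer_3)

# Equivalent of: SELECT Site, Load FROM CPULoad WHERE Facility = 'CDF'
facility = COLUMNS.index("Facility")
site, load = COLUMNS.index("Site"), COLUMNS.index("Load")
cdf = [(r[site], r[load]) for r in cpu_load if r[facility] == "CDF"]
print(cdf)
```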

The Mediator

Producers are associated with views on a virtual database.

Queries are posed against the virtual database.

The Mediator must:
  find the right Producers
  combine information from them

Can now merge information from several producers

The final mediator will take “any” SQL statement and do the right thing
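The two Mediator duties, finding the right Producers and combining their information, can be sketched in a few lines. Everything here is hypothetical: the registry layout, the idea of describing a producer's view by fixed attribute values, and the function names are assumptions for illustration, and only equality predicates are handled.

```python
# Hypothetical registry entries: each producer registers the table it
# publishes to and the fixed attribute values that describe its "view".
registry = [
    {"table": "CPULoad", "view": {"Site": "RAL"},
     "rows": [{"Site": "RAL", "Facility": "CDF", "Load": 0.3}]},
    {"table": "CPULoad", "view": {"Site": "GLA"},
     "rows": [{"Site": "GLA", "Facility": "CDF", "Load": 0.4}]},
    {"table": "CPULoad", "view": {"Site": "CERN"},
     "rows": [{"Site": "CERN", "Facility": "CDF", "Load": 0.6}]},
]


def find_producers(table, where):
    """Keep producers whose view does not contradict the query predicate."""
    def compatible(view):
        return all(where.get(col, val) == val for col, val in view.items())
    return [p for p in registry if p["table"] == table and compatible(p["view"])]


def mediate(table, where):
    """Merge matching rows from every relevant producer."""
    result = []
    for producer in find_producers(table, where):
        for row in producer["rows"]:
            if all(row.get(col) == val for col, val in where.items()):
                result.append(row)
    return result


print(mediate("CPULoad", {"Site": "RAL"}))      # only one producer is relevant
print(mediate("CPULoad", {"Facility": "CDF"}))  # rows merged from all three
```

A query that fixes the site only needs to contact the producer for that site, whereas a query on Facility has to be sent to every producer of the table.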

R-GMA Tools

R-GMA CLI
  Command Line Interface (similar to MySQL)
  Supports single query and interactive modes

R-GMA Browser
  JSP application dynamically generating web pages
  Supports pre-defined and user-defined queries

Pulse
  R-GMA Java client-based GUI
  Supports streaming and simple graphical displays

A user application: CMS

BOSS for job tracking on local farm
  It currently forks the executable and parses stdout to publish info directly to an SQL DB
  They publish to one table per job type and one table which is common to all job types

They are now ready to publish via R-GMA instead
  Providing a scalable Grid solution

GIN and GOUT (Gadget IN and Gadget OUT)

[Diagram. Components shown: LDAPInfoProvider, GIN, CircularBuffer Producer, R-GMA, Archiver, ConsumerAPI, DataBase Producer, RDBMS, GOUT, R-GMA Consumers (CE, SE, SiteInfo), LDAPServer]

CE and SE Tables

ComputingElement: dn, CEId, TotalCPUs, FreeCPUs, TotalJobs, RunningJobs, …

CloseStorageElement: dn, CEId, CloseSE, …

StorageElementStatus: dn, SEId, SEfreespace, …

“Select a ComputingElement with at least 1 free CPU that also has a CloseStorageElement with at least 1000 MB of free space”

SELECT DISTINCT ComputingElement.CEId
FROM ComputingElement, CloseStorageElement, StorageElementStatus
WHERE ComputingElement.FreeCPUs > 0
  AND (ComputingElement.CEId = CloseStorageElement.CEId
       AND CloseStorageElement.CloseSE = StorageElementStatus.SEId
       AND StorageElementStatus.SEfreespace > 1000)
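The join above can be run against a small local database to see what it selects. The sketch below uses Python's sqlite3 module as a stand-in for R-GMA; the sample rows, the column types, and the reduced set of columns are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ComputingElement (CEId TEXT, TotalCPUs INTEGER, FreeCPUs INTEGER);
CREATE TABLE CloseStorageElement (CEId TEXT, CloseSE TEXT);
CREATE TABLE StorageElementStatus (SEId TEXT, SEfreespace INTEGER);

-- Hypothetical sample rows
INSERT INTO ComputingElement VALUES ('ce1', 10, 4), ('ce2', 8, 0), ('ce3', 16, 2);
INSERT INTO CloseStorageElement VALUES ('ce1', 'se1'), ('ce2', 'se1'), ('ce3', 'se2');
INSERT INTO StorageElementStatus VALUES ('se1', 5000), ('se2', 200);
""")

# The query from the slide, unchanged apart from layout.
query = """
SELECT DISTINCT ComputingElement.CEId
FROM ComputingElement, CloseStorageElement, StorageElementStatus
WHERE ComputingElement.FreeCPUs > 0
  AND (ComputingElement.CEId = CloseStorageElement.CEId
       AND CloseStorageElement.CloseSE = StorageElementStatus.SEId
       AND StorageElementStatus.SEfreespace > 1000)
"""
result = conn.execute(query).fetchall()
print(result)
```

With these sample rows only ce1 qualifies: ce2 has no free CPUs, and the storage element close to ce3 has less than 1000 MB free.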

OGSIfied R-GMA

All Grid Services

OGSA Factories, GSH, GSR

Registry includes HandleMapper

SQL as Service Data Element Query Language

[Diagram. Components shown: Sensor, ProducerAPI, ProducerFactory, ProducerInstance, Application, ConsumerAPI, ConsumerFactory, ConsumerInstance, Registry, Schema]

Other technicalities – no time today

Soft-state Registration and the Registry
  Registry records existence of Producers and Consumers
  Registry holds last contact time and ‘expiry’ time
  Producers and Consumers periodically refresh their time stamps
  Scheduled removal of entries that have timed-out

Registry & schema distribution
  Will have one logical registry and schema per VO
  Each logical registry will have multiple physical “copies”
  Self healing algorithm

Security

etc …
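The soft-state registration described on this slide can be sketched as below. The class `SoftStateRegistry`, its method names, and the use of plain numbers for times and lifetimes are assumptions for illustration, not the Registry's actual interface.

```python
class SoftStateRegistry:
    """Entries disappear unless their owners keep refreshing them."""

    def __init__(self):
        self.entries = {}  # name -> (last_contact, expiry)

    def register(self, name, now, lifetime):
        # Records the last contact time and the time at which the entry expires.
        self.entries[name] = (now, now + lifetime)

    # In this sketch a refresh is simply a repeated registration.
    refresh = register

    def remove_expired(self, now):
        """Scheduled clean-up: drop entries whose expiry time has passed."""
        expired = [n for n, (_, expiry) in self.entries.items() if expiry <= now]
        for name in expired:
            del self.entries[name]
        return expired


registry = SoftStateRegistry()
registry.register("producer-A", now=0, lifetime=60)
registry.register("consumer-B", now=0, lifetime=60)
registry.refresh("producer-A", now=50, lifetime=60)  # producer-A checks in again
removed = registry.remove_expired(now=70)
print(removed, sorted(registry.entries))
```

The appeal of this design is that a Producer or Consumer that crashes needs no explicit de-registration: its entry simply times out.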

Performance

By design:
  Very flexible - to avoid bottlenecks
  Powerful queries allow a single query to be made where several simpler ones would otherwise be needed

Performance and Optimisation
  Use NetLogger and profiling tools to identify possible bottlenecks

Results

It has only just been deployed in the EDG development testbed and we do not yet have the results which the title of this talk implied.

Summary and the future

R-GMA is a combined Grid information and monitoring system

Just deployed in the EDG development testbed

Focusing on reliability, stability and performance for the rest of the project (9 months)

Thanks to the EU and our national funding agencies for their support of this work