Global Accounting in the Grid and Cloud

John Gordon, STFC. HEPiX, Beijing, 18th October 2012. EGI-InSPIRE RI-261323 (www.egi.eu)

Page 1: Global Accounting in the Grid and Cloud

Global Accounting in the Grid and Cloud

John Gordon, STFC

HEPiX, Beijing

18th October 2012

Page 2

Outline

• History
• Yesterday
• Today
• Tomorrow
• New types of Accounting Record
• Future

Page 3

Overview

• The APEL accounting system has been gathering CPU accounting records for the LHC experiments from around the world since 2004.

• It now contains data from 2×10⁹ jobs from 350 sites in 50 countries.

• Work is under way to extend accounting to storage and cloud

Page 4

History

Page 5

History

• LCG → EGEE → WLCG → EGI + EMI
• Including along the way:
  – Gratia (Open Science Grid)
  – DGAS (Italy)
  – SGAS (NorduGrid)

Page 6

LCG

• In the beginning, there was Les Robertson.
  – Trying to form a Grid for LHC computing from the plethora of national and international grid projects that had sprung up around the world

• The LHC Tier model* predated Grids, but Grids offered technology on which to build the LHC distributed computing model(s).

• Getting working middleware was a problem, but not the only one: a working Grid also required a number of other operational services.

• Les persuaded Tier1s to take responsibility for defining and developing various missing operational components:
  – Karlsruhe: helpdesk
  – Lyon: operations portal
  – CERN: monitoring
  – RAL: accounting

* MONARC Report

Page 7

The Start of APEL

• Dave Kant designed and wrote parsers for a few batch systems to gather usage data from the batch logs, and user grid identity information from Globus GRAM and later the LCG-CE.

• The APEL publishers combined these into a usage record for each job and sent it via R-GMA to RAL, where the records were processed, summarised and visualised.

• So the first sites started to publish.
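The parse-and-join step described above can be sketched as follows. This is a minimal illustration, not Dave Kant's original parsers: it assumes a PBS-style accounting log line (`date time;E;job_id;key=value ...`) and a hypothetical `dn_by_job` mapping from batch job ID to the grid identity (DN) recovered from the gatekeeper/CE logs. The record fields are deliberately simplified.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UsageRecord:
    """One per-job record: batch usage joined with the submitter's grid identity."""
    job_id: str
    local_user: str
    user_dn: Optional[str]  # None if no grid identity could be matched
    cpu_seconds: int
    wall_seconds: int


def hms_to_seconds(hms: str) -> int:
    """Convert 'HH:MM:SS' (as used in PBS-style logs) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_pbs_end_line(line: str) -> Optional[Dict[str, str]]:
    """Parse one PBS-style accounting line 'date time;type;job_id;key=value ...'.

    Only 'E' (job end) lines carry the final resource usage; other types are skipped.
    """
    parts = line.strip().split(";", 3)
    if len(parts) != 4 or parts[1] != "E":
        return None
    _, _, job_id, attrs = parts
    fields = dict(item.split("=", 1) for item in attrs.split() if "=" in item)
    return {"job_id": job_id, **fields}


def build_record(batch: Dict[str, str], dn_by_job: Dict[str, str]) -> UsageRecord:
    """Join batch usage with the grid identity (hypothetical job-ID -> DN map)."""
    return UsageRecord(
        job_id=batch["job_id"],
        local_user=batch["user"],
        user_dn=dn_by_job.get(batch["job_id"]),
        cpu_seconds=hms_to_seconds(batch["resources_used.cput"]),
        wall_seconds=hms_to_seconds(batch["resources_used.walltime"]),
    )
```

Jobs with no matching grid identity keep `user_dn=None`, so local (non-grid) work can still be counted or filtered out later rather than silently dropped.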

Page 8

By December 2004, 15 CEs at 13 sites (Dave Kant, 15-Dec-04)

Page 9

EGEE

• APEL accounting became a core service in EGEE (2004-2010), and the client was rolled out to more sites

• CESGA took over running and developing the portal

Page 10

Others

• Driven by WLCG, data from other accounting systems were incorporated:
  – DGAS (Italy)
  – Gratia (Open Science Grid)
  – SGAS (NorduGrid)

• And EGEE extended beyond Europe to Asia Pacific, India,

Page 11

WLCG

• EGEE pushed all sites to publish, but functionality was driven by WLCG.

• The central APEL repository stores data by site.

• The Portal pulls in topology to drive reports.

• Tier1 reports
• Tier2 federations formally defined
• Comparison with pledges (Tier2 report)
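How topology can turn site-level data into federation reports and pledge comparisons is sketched below. The site and federation names, the `federation_of` mapping and the choice of HS06-hours as the unit are all hypothetical; the slide does not describe the portal's actual topology source or report logic.

```python
from collections import defaultdict
from typing import Dict


def federation_totals(usage_by_site: Dict[str, float],
                      federation_of: Dict[str, str]) -> Dict[str, float]:
    """Roll site-level usage up to Tier2 federations using a topology map."""
    totals: Dict[str, float] = defaultdict(float)
    for site, usage in usage_by_site.items():
        federation = federation_of.get(site)
        if federation is not None:  # sites missing from the topology are left out
            totals[federation] += usage
    return dict(totals)


def pledge_fraction(delivered: Dict[str, float],
                    pledged: Dict[str, float]) -> Dict[str, float]:
    """Delivered usage as a fraction of the pledge (1.0 means the pledge was met)."""
    return {fed: delivered.get(fed, 0.0) / amount
            for fed, amount in pledged.items() if amount > 0}
```

Keeping the repository keyed by site and applying topology only at report time means a federation can be redefined without touching the stored data.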

Page 12

APEL 2011-12

• APEL collects CPU usage data from 274 sites via the APEL client software, and from a further 90 sites where the data is collected by other software and published to the central APEL repository.

• It is thus a single worldwide point of reference for all accounting data for a range of VOs (LHC and other international VOs, as well as regional and national VOs).

• These non-APEL sites include alternative middleware stacks (ARC) within EGI, and e-infrastructures both within EGI (Italy, NorduGrid) and outside it (Open Science Grid in the US).

• Systems and services not using the APEL client inserted their data directly into the database.

• >2×10⁹ jobs, 61% of them ATLAS
• Reached 73M jobs/month in 2012
• 4M HS06-years, almost 50% of it in the last 12 months

Page 16

Visualisation

• The Accounting Portal at CESGA
• Pulls data from APEL and applies various topologies (T1, T2, countries, NGIs)
• Allows dynamic queries at any point in a tree, showing one variable as a function of two dimensions
  – Variables: njobs, cpu, wallclock, normcpu, normwall, cpueff
  – Dimensions: month, site/region, VO,
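The portal's "one variable as a function of two dimensions" view is essentially a pivot table. The sketch below uses hypothetical record fields and assumes that normcpu/normwall are the raw hours scaled by a per-core HS06 benchmark value (`hs06_per_core`). It also shows why `cpueff` has to be computed from summed CPU and wallclock rather than by adding up per-job efficiencies.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
ADDITIVE = ("njobs", "cpu", "wallclock", "normcpu", "normwall")


def with_normalised(rec: Record) -> Record:
    """Add normcpu/normwall, assuming normalisation = raw hours x HS06 per core."""
    out = dict(rec)
    out["normcpu"] = float(rec["cpu"]) * float(rec["hs06_per_core"])
    out["normwall"] = float(rec["wallclock"]) * float(rec["hs06_per_core"])
    return out


def pivot(records: List[Record], variable: str,
          row_dim: str, col_dim: str) -> Dict[Tuple[str, str], float]:
    """One variable as a function of two dimensions: {(row, col): value}."""
    if variable not in ADDITIVE and variable != "cpueff":
        raise ValueError(f"unknown variable: {variable}")
    cells: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for rec in map(with_normalised, records):
        cell = cells[(str(rec[row_dim]), str(rec[col_dim]))]
        for name in ADDITIVE:
            cell[name] += float(rec[name])
    if variable == "cpueff":
        # A cell's efficiency is total CPU / total wallclock, not a sum of ratios.
        return {key: (c["cpu"] / c["wallclock"] if c["wallclock"] else 0.0)
                for key, c in cells.items()}
    return {key: c[variable] for key, c in cells.items()}
```

For example, `pivot(records, "normcpu", "month", "vo")` would give normalised CPU per month and VO; swapping the dimension names gives the other views.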

Page 22

Internals

Page 23

APEL – 2011

[Diagram: APEL clients (×3); BROKER (ActiveMQ); CONSUMER; MySQL databases (summaries created); EGI Accounting Portal (http://goc-accounting.grid-support.ac.uk); external clients]

Page 24

APEL 2012: Deployment Stage 1

[Diagram: APEL client, BROKER (ActiveMQ), CONSUMER and MySQL; EGI Message Brokers with Sending SSM and Receiving SSM; Record loader; MySQL holding CPU JobRecords and CPU Summaries (summaries created); EGI Accounting Portal (http://goc-accounting.grid-support.ac.uk); external clients]

Page 25

APEL 2012: Deployment Stage 2

[Diagram: APEL client, BROKER (ActiveMQ), CONSUMER and MySQL; new APEL client; EGI Message Brokers with Sending SSM and Receiving SSM; Record loader; MySQL holding CPU JobRecords and CPU Summaries; DB unloader; converted JobRecords; EGI Accounting Portal (http://goc-accounting.grid-support.ac.uk); external clients]

Page 26

APEL – 2013

[Diagram: new APEL clients; EGI Message Brokers with Sending SSM and Receiving SSM; Record loader; MySQL holding CPU JobRecords and CPU Summaries; DB unloader; EGI Accounting Portal (http://goc-accounting.grid-support.ac.uk); external clients; nagios pub/sync]
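The "Record loader" and "CPU JobRecords / CPU Summaries" boxes suggest a two-step flow: individual job records are loaded into one table and then aggregated into a summary table. A minimal sketch using Python's built-in `sqlite3` follows; the table and column names are hypothetical and far simpler than the real APEL MySQL schema.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical, heavily simplified schema (the real APEL tables have many more columns).
SCHEMA = """
CREATE TABLE JobRecords (
    Site TEXT, VO TEXT, Month TEXT, CpuSeconds INTEGER, WallSeconds INTEGER
);
CREATE TABLE Summaries (
    Site TEXT, VO TEXT, Month TEXT,
    NumberOfJobs INTEGER, CpuSeconds INTEGER, WallSeconds INTEGER,
    PRIMARY KEY (Site, VO, Month)
);
"""

JobRow = Tuple[str, str, str, int, int]


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def load_records(conn: sqlite3.Connection, rows: Iterable[JobRow]) -> None:
    """Record loader: store individual job records as they arrive."""
    conn.executemany("INSERT INTO JobRecords VALUES (?, ?, ?, ?, ?)", rows)


def create_summaries(conn: sqlite3.Connection) -> None:
    """Rebuild the summary table from the job records; rerunning gives the same result."""
    conn.execute("DELETE FROM Summaries")
    conn.execute(
        """
        INSERT INTO Summaries
        SELECT Site, VO, Month, COUNT(*), SUM(CpuSeconds), SUM(WallSeconds)
        FROM JobRecords
        GROUP BY Site, VO, Month
        """
    )
```

Downstream consumers (the portal, external clients) then only need to read the much smaller summary table.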

Page 27

What is SSM?

• Secure Stomp Messenger

• Simple program to send and receive messages

• Independent of message content

• Uses STOMP (and EGI brokers)

• ~1k lines of Python code

SSM Status

• Deployed in production for ~10 months

• No operational difficulties
  – (except for an LDAP query bug)
• Robust, fast
• Handles load easily
• Allows separation of messaging and server:
  – filesystem buffer
  – but no message checking
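To illustrate sending messages "independent of message content" through a filesystem buffer, here is a small sketch: messages are written atomically into an outgoing directory, and a sender turns each file into a STOMP `SEND` frame (command line, `header:value` lines, blank line, body, NUL byte). This is not SSM's code; the queue name `/queue/accounting.test` is hypothetical, and the network connection to a broker is omitted.

```python
import os
from pathlib import Path
from typing import Dict, Iterator, Tuple


def stomp_frame(command: str, headers: Dict[str, str], body: str = "") -> bytes:
    """Encode a STOMP frame: command, header lines, blank line, body, NUL byte."""
    payload = body.encode("utf-8")
    lines = [command] + [f"{k}:{v}" for k, v in headers.items()]
    lines.append(f"content-length:{len(payload)}")  # body length in bytes
    return ("\n".join(lines) + "\n\n").encode("utf-8") + payload + b"\x00"


def enqueue(outgoing: Path, name: str, body: str) -> Path:
    """Drop a message into the outgoing directory (the filesystem buffer)."""
    tmp = outgoing / f".{name}.tmp"
    tmp.write_text(body, encoding="utf-8")
    final = outgoing / name
    os.replace(tmp, final)  # rename within one filesystem: no half-written files seen
    return final


def pending(outgoing: Path) -> Iterator[Tuple[Path, bytes]]:
    """Yield (file, SEND frame) for each buffered message, in file-name order."""
    for path in sorted(p for p in outgoing.iterdir() if not p.name.startswith(".")):
        body = path.read_text(encoding="utf-8")
        yield path, stomp_frame("SEND", {"destination": "/queue/accounting.test"}, body)
```

Because the body is treated as opaque text, the same buffer and sender work for any record format, which matches the "no message checking" point above: validation is left to whoever reads the messages.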

Page 28

Accounting and SSM

• SSM can transport any data
• We use it for accounting
• Production:
  – APEL message format
• Testing:
  – EMI Compute Accounting Record (CAR)
  – EMI Storage Accounting Record (StAR)

Page 29

SSM 2.0

• SSM 1 works well
• But we are changing it
• Why?
• It’s a bit overcomplicated
  – Difficult to develop against
  – Could be simpler
• Interoperability issues:
  – Crypto not well defined
  – Synchronous messaging
  – Unnecessary (?) message sequence
• EMI has different accounting systems which need to interoperate

• One Python program is not good enough.

Page 30

EMI Messaging Protocol for Accounting (EMPA)

• More logical, simpler
• Use persistent queues
  – Not replies
• Use SSL
  – Instead of encryption
  – (encryption still an option)
• Use the infrastructure
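The EMPA choices (persistent queues instead of replies, SSL for transport) can be sketched as below. The sender marks each message persistent and removes the buffered file as soon as the transport call succeeds, without waiting for any reply from the receiving side. The `persistent:true` header is assumed to be honoured by the broker (ActiveMQ documents such a header for STOMP; other brokers may differ). `transport` is a hypothetical callable standing in for an SSL-wrapped broker connection, and the certificate paths given to `make_tls_context` are placeholders.

```python
import ssl
from pathlib import Path
from typing import Callable, List


def make_tls_context(cafile: str, certfile: str, keyfile: str) -> ssl.SSLContext:
    """SSL context that verifies the broker and presents the host certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


def send_frame(destination: str, body: bytes) -> bytes:
    lines = [
        "SEND",
        f"destination:{destination}",
        "persistent:true",  # ask the broker to keep the message until it is consumed
        f"content-length:{len(body)}",
    ]
    return ("\n".join(lines) + "\n\n").encode("utf-8") + body + b"\x00"


def flush_outgoing(outgoing: Path, destination: str,
                   transport: Callable[[bytes], None]) -> List[str]:
    """Send every buffered file and delete it once transport() returns.

    No reply from the receiver is awaited: delivery is left to the broker's
    persistent queue, so sender and receiver need not be online at the same time.
    """
    sent = []
    for path in sorted(p for p in outgoing.iterdir() if p.is_file()):
        transport(send_frame(destination, path.read_bytes()))
        path.unlink()  # only reached if transport() did not raise
        sent.append(path.name)
    return sent
```

If the transport raises, the current file stays in the directory and is retried on the next run.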

Page 31

New Publishers

• A number of other accounting clients and systems are in the process of publishing to APEL

• All they need to do is extract the relevant jobs from their database, (optionally) combine them into a summary, transform them into the UR format and write the result to a local directory.

• SSM does the rest, shipping the files to APEL.
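The transform-and-write part of those publisher steps might look like this. The sketch assumes the OGF UR 1.0 namespace `http://schema.ogf.org/urf/2003/09/urf` and a handful of UR element names; the input field names are hypothetical, and any real output should be validated against the actual UR (or CAR) schema before it is published.

```python
import os
import uuid
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Any, Dict, List

URF = "http://schema.ogf.org/urf/2003/09/urf"  # assumed OGF UR 1.0 namespace; check the spec
ET.register_namespace("urf", URF)


def q(tag: str) -> str:
    """Qualify a tag or attribute name with the UR namespace."""
    return f"{{{URF}}}{tag}"


def to_usage_record(job: Dict[str, Any]) -> ET.Element:
    """Transform one extracted job (hypothetical field names) into a UR element."""
    rec = ET.Element(q("JobUsageRecord"))
    ET.SubElement(rec, q("RecordIdentity"), {q("recordId"): str(job["record_id"])})
    ET.SubElement(ET.SubElement(rec, q("JobIdentity")), q("LocalJobId")).text = str(job["local_job_id"])
    ET.SubElement(ET.SubElement(rec, q("UserIdentity")), q("GlobalUserName")).text = str(job["user_dn"])
    ET.SubElement(rec, q("MachineName")).text = str(job["machine"])
    # Durations as ISO 8601, e.g. 4200 seconds -> "PT4200S".
    ET.SubElement(rec, q("WallDuration")).text = f"PT{int(job['wall_seconds'])}S"
    ET.SubElement(rec, q("CpuDuration")).text = f"PT{int(job['cpu_seconds'])}S"
    return rec


def write_batch(jobs: List[Dict[str, Any]], outgoing: Path) -> Path:
    """Write one XML file per batch into the directory a sender such as SSM reads."""
    root = ET.Element(q("UsageRecords"))
    for job in jobs:
        root.append(to_usage_record(job))
    name = f"{uuid.uuid4().hex}.xml"  # unique file name per batch
    tmp = outgoing / f".{name}.tmp"
    ET.ElementTree(root).write(tmp, encoding="utf-8", xml_declaration=True)
    final = outgoing / name
    os.replace(tmp, final)  # publish the file only once it is complete
    return final
```

Writing to a hidden temporary name and renaming keeps the sender from picking up a partially written batch.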

Page 32

Publishers

• Emi-apel (bulk of EGI sites)
• DGAS (Italy)
• Open Science Grid (USA, others)
• SGAS (NorduGrid, Switzerland, Finland)
• Individual Sites (CERN, NIKHEF, CC-IN2P3)
• Unicore (Poland, Germany)
• Globus/IGE (Germany, ??)
• ARC/JURA (Hungary and other ARC)
• PRACE (many but only selected VOs)
• EDGI – Desktop Grids
• MAPPER – Multiscale Simulation (PRACE+EGI)

Page 33

Extend EGI Infrastructure

[Diagram: publishers feeding the APEL Repository via SSM/ActiveMQ messaging. Labels: APEL; DGAS; ARC/SGAS; OSG/Gratia; Sites; Unicore Accounting (PL): extract VOs, publish summary UR; Unicore: Unicore parser, build UR; GridSAFE: RUS client, publish summary UR; IGE: Unicore parser, build UR; publishing directly: publish summary UR; ARC/JURA: ARC CE; Grid-SAFE; PRACE: selected VOs; MAPPER, EDGI]

Page 34

New Regionalised Structure

[Diagram labels: Central APEL; NGI APEL; emi-apel (×4); ARC CE; NGI SGAS; Portal; Other Repo]

• Regionalised APEL server collects job records from NGI sites
• Sends summaries on to central APEL
• Visualisation portal

Page 35

New Accounting Records

Page 36

Readiness to Receive New Records

• The APEL repository is in the middle of a migration, but the capability to receive usage records over SSM is already in production for both Job Records and Summary Job Records. It is being used by the Open Science Grid and CERN, with a number of other publishers in migration.

• Consumers are in place on the test service for CAR and StAR.

• Adding others is a lightweight process, much simpler than defining the record schema.
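One way adding a consumer for a new record type can stay lightweight is to dispatch each incoming message on its record type and hand it to a type-specific loader. The registry sketch below is illustrative only: the `StorageUsageRecords` root element and the `Site`/`ResourceCapacityUsed` fields are hypothetical stand-ins, and a real StAR loader would follow the StAR schema and its namespace.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Loader = Callable[[ET.Element], List[dict]]
LOADERS: Dict[str, Loader] = {}


def loader(root_name: str):
    """Register a loader for messages whose XML root element has this local name."""
    def register(fn: Loader) -> Loader:
        LOADERS[root_name] = fn
        return fn
    return register


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]  # strip a leading "{namespace}" if present


def dispatch(message: str) -> List[dict]:
    root = ET.fromstring(message)
    name = local_name(root.tag)
    if name not in LOADERS:
        raise ValueError(f"no loader registered for record type {name!r}")
    return LOADERS[name](root)


@loader("StorageUsageRecords")
def load_storage(root: ET.Element) -> List[dict]:
    # Hypothetical, un-namespaced fields, for illustration only.
    return [
        {"site": rec.findtext("Site"), "bytes": int(rec.findtext("ResourceCapacityUsed", "0"))}
        for rec in root
    ]
```

Supporting another record type then means writing one decorated function; the receiving and dispatch code stays unchanged.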

Page 37

APEL – 2013

[Diagram: new APEL clients and storage clients; EGI Message Brokers with Sending SSM and Receiving SSM; Record loader and Storage loader; MySQL holding CPU JobRecords, CPU Summaries, StorageRecords and StorageSummaries; DB unloader; EGI Accounting Portal (http://goc-accounting.grid-support.ac.uk); external clients]

Page 38

CAR

• CAR is an EMI revision of the OGF UR v1.0 that rationalises some issues and includes some common extensions already deployed in a number of implementations.

• It is not UR 2.0, which is a bigger revision started by the OGF UR-WG.

• XML document
• CAR v1.0 was agreed, but some issues have since arisen and CAR v1.0.1 is being finalised
• The new APEL schema incorporates CAR v1.0

Page 39

Storage

• StAR (Storage Accounting Record) is a version of the OGF UR 1.0 adapted for accounting of storage utilisation

• Developed by EMI, submitted as a public document to OGF, and revised in the light of public comments
  – Added Sitename and allocated space, and revised the time definitions
• Status: being implemented by the EMI storage providers (dCache, StoRM, DPM) in EMI-3

Page 40

Storage Solutions

• dCache (EMI)
• StoRM (EMI)
• DPM (EMI)
• CASTOR
• EOS
• BeStMan
• Gratia (already collecting)
• xrootd
• Hadoop
• Gstat
• Cloud Storage
• (Unicore)
• (ARC)
• iRODS
• ?????

• The middleware in scope for EGI and WLCG

• Anything that collects storage information could also publish usage records

• e.g. Julia Andreeva's talk at CHEP

Page 41

Cloud

• The EGI Federated Cloud Task Force has adopted a VM UR based on the OGF UR
  – Many values, such as memory and network, were defined in the UR but seldom if ever implemented
• Sensors were written to collect the information and construct usage records
  – OpenNebula and OpenStack so far
• SSM is used to send the records to RAL
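A cloud sensor's job, turning what the cloud manager reports about each VM into a usage record, can be sketched as follows. The `VMSnapshot` fields are hypothetical and only loosely modelled on values the slide mentions (memory, network); they are not the EGI Federated Cloud record schema, and querying OpenNebula or OpenStack is left out.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VMSnapshot:
    """Hypothetical view of what a cloud manager reports for one VM."""
    vm_uuid: str
    site: str
    user_dn: str
    status: str                # e.g. "started" or "completed"
    start: datetime
    end: Optional[datetime]    # None while the VM is still running
    cpu_count: int
    memory_mb: int
    network_in_gb: float
    network_out_gb: float


def vm_usage_record(vm: VMSnapshot, now: datetime) -> dict:
    """Build a flat usage record; running VMs are measured up to 'now'."""
    stop = vm.end if vm.end is not None else now
    wall = int((stop - vm.start).total_seconds())
    rec = asdict(vm)
    rec.update(
        start=vm.start.isoformat(),
        end=vm.end.isoformat() if vm.end else None,
        wall_seconds=wall,
        core_seconds=wall * vm.cpu_count,  # cores reserved x time, not CPU actually used
    )
    return rec
```

Since a VM can run for weeks, a sensor like this would typically report running VMs repeatedly, with each later record superseding the earlier one for the same `vm_uuid`.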

Page 42

Future

Page 43

• APEL continues to collect more CPU accounting data for a widening range of infrastructures, projects and VOs

• A more distributed architecture should reduce bottlenecks and give more control to the NGIs

• New types of accounting record will be added (Storage, Cloud, ...)
