
INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE
What it is …
… and what it can do for you

Ian Bird
IT Department

CERN, Switzerland

EGEE Operations Manager

Discussions with ESRF/ILL/EMBL

Grenoble, 19th October 2005


Outline

• The EGEE project
  – Overview of the project
  – Applications deployed
  – Phase 1 to Phase 2
• Operations
  – Release and Deployment
  – Operations
  – Support
• Middleware distribution
  – Software stack
• Site installation process
  – Grid-enabling a site
• Joining the grid
  – … and sharing resources


The EGEE Project


The largest e-Infrastructure: EGEE

• Objectives
  – consistent, robust and secure service grid infrastructure
  – improving and maintaining the middleware
  – attracting new resources and users from industry as well as science
• Structure
  – 71 leading institutions in 27 countries, federated in regional Grids
  – leveraging national and regional grid activities worldwide
  – funded by the EU with ~32 M Euros for the first 2 years, starting 1st April 2004


EGEE Activities

• 48 % service activities (Grid Operations, Support and Management, Network Resource Provision)

• 24 % middleware re-engineering (Quality Assurance, Security, Network Services Development)

• 28 % networking (Management, Dissemination and Outreach, User Training and Education, Application Identification and Support, Policy and International Cooperation)

Emphasis in EGEE is on operating a production grid and supporting the end-users


EGEE/LCG-2 Grid Sites: September 2005

country          sites    country        sites    country        sites
Austria              2    India              1    Russia            10
Belgium              1    Israel             2    Singapore          1
Bulgaria             4    Italy             25    Slovakia           3
Canada               6    Japan              1    Slovenia           1
China                1    Korea              1    Spain             13
Croatia              1    Netherlands        2    Sweden             2
Cyprus               1    Macedonia          1    Switzerland        2
Czech Republic       2    Pakistan           2    Taiwan             4
France               8    Poland             4    Turkey             1
Germany              8    Portugal           1    UK & Ireland      35
Greece               6    Puerto Rico        1    USA                3
Hungary              1    Romania            1    Yugoslavia         1

EGEE/LCG-2 grid: 160 sites, 36 countries, >15,000 processors, ~5 PB storage
Other national & regional grids: ~60 sites, ~6,000 processors

[Map legend: country providing resources; country anticipating joining]


Operations Structure

• Operations Management Centre (OMC):
  – At CERN – coordination etc
• Core Infrastructure Centres (CIC)
  – Manage daily grid operations – oversight, troubleshooting
  – Run essential infrastructure services
  – Provide 2nd level support to ROCs
  – UK/I, Fr, It, CERN, + Russia (M12)
  – Hope to get non-European centres
• Regional Operations Centres (ROC)
  – Act as front-line support for user and operations issues
  – Provide local knowledge and adaptations
  – One in each region – many distributed
• User Support Centre (GGUS)
  – In FZK – support portal – provide single point of contact (service desk)



Grid middleware

• The Grid relies on advanced software, called middleware, which interfaces between resources and the applications

• The Grid middleware should:
  – Find convenient places for the application to be run
  – Optimise use of resources
  – Organise efficient access to data
  – Deal with authentication to the different sites that are used
  – Run the job & monitor progress
  – Recover from problems
  – Transfer the result back to the scientist
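To make the "finds convenient places" step concrete, here is a minimal matchmaking sketch. It is not gLite code: the `Site` and `Job` records, their fields, and the ranking rule (prefer sites that already hold the input data, then the most free CPUs) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical description of a resource centre."""
    name: str
    free_cpus: int
    supported_vos: frozenset
    has_input_data: bool


@dataclass
class Job:
    """Hypothetical job request."""
    vo: str
    cpus: int


def rank_sites(job, sites):
    """Return the sites able to run the job, best candidate first."""
    candidates = [
        s for s in sites
        if job.vo in s.supported_vos and s.free_cpus >= job.cpus
    ]
    # Assumed policy: data locality first, then the most free CPUs.
    return sorted(candidates, key=lambda s: (not s.has_input_data, -s.free_cpus))


sites = [
    Site("site-a", 40, frozenset({"biomed"}), False),
    Site("site-b", 10, frozenset({"biomed", "hep"}), True),
    Site("site-c", 200, frozenset({"hep"}), False),
]
best = rank_sites(Job(vo="biomed", cpus=4), sites)
print([s.name for s in best])
```

A real broker would also weigh queue lengths, authentication, and network cost; the sketch only shows the filter-then-rank shape of the decision.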


gLite middleware

– The 1st release of gLite (v1.0) was made at the end of March ’05
    – With frequent releases since; gLite 1.4 now
    – http://glite.web.cern.ch/glite/packages/R1.0/R20050331
    – http://glite.web.cern.ch/glite/documentation
– Lightweight services
– Interoperability & co-existence with deployed infrastructure
– Performance & fault tolerance
– Portable
– Service oriented approach
– Site autonomy
– Open source license


gLite Release 1.0

• Job management Services
  – Workload Management
  – Computing Element
  – Logging and Bookkeeping
• Data management Services
  – File and Replica catalog
  – File Transfer and Placement Services
  – gLite I/O
• Information Services
  – R-GMA
  – Service Discovery
• Security
• Deployment Modules
  – Distribution available as RPMs, binary tarballs, source tarballs and APT cache

[Figure: gLite service architecture diagram. Recoverable groups: Access Services (Grid Access Service, API); Job Management Services (Job Provenance, Computing Element, Workload Management, Package Manager); Data Services (Metadata Catalog, Storage Element, Data Management, File & Replica Catalog); Security Services (Authorization, Authentication, Auditing); Information & Monitoring Services (Information & Monitoring, Application Monitoring). Further labels: Site Proxy, Accounting, JRA3, UK, CERN, IT/CZ]

Serious testing & certification is under way



User information & support

• More than 140 training events across many countries
  – >2000 people trained: induction; application developer; advanced; retreats
  – Material archive online with >200 presentations
  – GILDA training testbed
• Public and technical websites constantly evolving to expand the information available and keep it up to date
• 3 conferences organized
  – ~300 @ Cork
  – ~400 @ Den Haag
  – ~450 @ Athens
• Pisa: 4th project conference, 24-28 October ’05


Deployment of applications

• Pilot applications
  – High Energy Physics
  – Biomed applications: http://egee-na4.ct.infn.it/biomed/applications.html
• Generic applications (deployment under way)
  – Computational Chemistry
  – Earth science research
  – EGEODE: first industrial application
  – Astrophysics
• With interest from
  – Hydrology
  – Seismology
  – Grid search engines
  – Stock market simulators
  – Digital video etc.
  – Industry (provider, user, supplier)
• Many users
  – broad range of needs
  – different communities with different background and internal organization


From Phase I to II

• From 1st EGEE EU Review in February 2005:
  – “The reviewers found the overall performance of the project very good.”
  – “… remarkable achievement to set up this consortium, to realize appropriate structures to provide the necessary leadership, and to cope with changing requirements.”
• EGEE I
  – Large scale deployment of EGEE infrastructure to deliver production level Grid services with selected number of applications
• EGEE II
  – Natural continuation of the project’s first phase
  – Emphasis on providing an infrastructure for e-Science
      – increased support for applications
      – increased multidisciplinary Grid infrastructure
      – more involvement from Industry
  – Extending the Grid infrastructure world-wide
      – increased international collaboration (Asia-Pacific is already a partner!)


Grid Operations


Grid Operations

• Several aspects
  – Integration of middleware
      – Testing, certification, release preparation
  – Deployment of middleware to sites
  – Operations and operational support
  – Operational security and policy
  – User support
      – 2 aspects – helpdesk and integration support


Release Process (simplified)

[Figure: release process flow diagram. Recoverable labels, in the numbered order shown on the slide:
  1. Bugs/patches/tasks tracked in Savannah (parties shown: C&T, EIS, GIS, GDB, Applications, RC, CICs, Developers); prioritization & selection (Head of Deployment); assign and update cost
  2. List for next release (can be empty)
  3. Integration & first tests (C&T); components ready at cutoff
  4. Internal releases
  5. User-level install of client tools (EIS); internal client release
  6. Full deployment on test clusters; functional/stress tests, ~1 week (C&T)
  7. Client Release, Service Release, Updates Release, Core Service Release]


Deployment process

[Figure: deployment process diagram. Recoverable labels:
  – Release(s); certification is run daily
  – Update user guides (EIS); update release notes (GIS)
  – Documents produced: Release Notes, Installation Guides, User Guides
  – Re-certify (CIC)
  – Deploy client releases (user space) (GIS)
  – Deploy service releases (optional) (CICs, RCs)
  – Deploy major releases (mandatory) (ROCs, RCs)
  – YAIM
  – Timing labels: every month; every 3 months; on fixed dates; at own pace]


Grid Operations

• The grid is flat, but
• Hierarchy of responsibility
  – Essential to scale the operation
• CICs act as a single Operations Centre
  – Operational oversight (grid operator) responsibility rotates weekly between CICs
  – Report problems to ROC/RC
  – ROC is responsible for ensuring problem is resolved
  – ROC oversees regional RCs
• ROCs responsible for organising the operations in a region
  – Coordinate deployment of middleware, etc.
• CERN coordinates sites not associated with a ROC

[Figure: diagram of the operations hierarchy showing the OMC, the CICs, the ROCs and their RCs]

RC = Resource Centre
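The weekly rotation of the grid-operator role between CICs can be sketched as a simple round-robin over calendar weeks. The CIC names come from the Operations Structure slide; their order, the start date, and the function name `operator_on_duty` are assumptions made only for this illustration.

```python
from datetime import date

# CICs named in the talk; the rotation order here is an assumption.
CICS = ["CERN", "France", "Italy", "UK/Ireland", "Russia"]

# Hypothetical start of the rota (a Monday).
ROTA_START = date(2005, 1, 3)


def operator_on_duty(day, cics=CICS, start=ROTA_START):
    """Return the CIC holding the operator role in the week containing `day`."""
    weeks_elapsed = (day - start).days // 7
    return cics[weeks_elapsed % len(cics)]


print(operator_on_duty(date(2005, 1, 3)))   # first week of the rota
print(operator_on_duty(date(2005, 1, 10)))  # second week
```

Because the role moves on a fixed schedule, any site or ROC can work out who is on duty without a central lookup.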


Grid monitoring

• Monitoring tools
  – GIIS Monitor + Monitor Graphs
  – Sites Functional Tests
  – GOC Data Base
  – Scheduled Downtimes
  – Live Job Monitor
  – GridIce – VO + Fabric View
  – Certificate Lifetime Monitor
• Operation of Production Service: real-time display of grid operations
• Accounting Information

Such tools help the operations staff to ensure the sites work continuously


Operational Security

• Operational Security team in place
  – EGEE security officer, ROC security contacts
  – Concentrate on 3 activities:
      – Incident response
      – Best practice advice for Grid Admins – creating dedicated web
      – Security Service Monitoring evaluation
• Incident Response
  – JSPG agreement on IR in collaboration with OSG
      – Update existing policy “To guide the development of common capability for handling and response to cyber security incidents on Grids”
  – Basic framework for incident definition and handling
• Site registration process in draft
  – Part of basic SLA
• CA Operations
  – EUGridPMA (now IGTF) – best practice, minimum standards, etc.
  – More and more CAs appearing
• Security group and work was started in LCG – was from the start a cross-grid activity.
• Much already in place at start of EGEE: usage policy, registration process and infrastructure, etc.
• We regard it as crucial that this activity remains broader than just EGEE


Policy – Joint Security Group

[Figure: policy document set around the Security & Availability Policy: Usage Rules; Certification Authorities; Audit Requirements; Best practice Guides; Incident Response; User Registration; Application Development & Network Admin Guide]

http://cern.ch/proj-lcg-security/documents.html

GGUS Support: The Model

[Figure: support structure diagrams. Recoverable labels: GGUS General User Support; Grid “Operator on duty”; ROC Operations; Site; ROC Local User Support; VO Support; 1st level support; 2nd level support (deployment, middleware, etc.); arrows labelled “Reports problem”; activities NA4, SA1, JRA1, SA3; VO Support; TPM]

• “I need help!” The user sends e-mail to [email protected] [email protected]
• The e-mail is automatically converted into a GGUS ticket
• Support units: VO Support, ROC Support, Middleware Support, Other Grids Support, Operations Support; mailing lists
• Ticket Process Manager (TPM): monitors ticket assignments and directs tickets to the correct support unit
• VO Support: receives VO-related tickets and follows them; solves or forwards VO-specific problems; recognizes Grid-related problems and assigns them to specific support units
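The TPM step, directing each ticket to the correct support unit, can be pictured as a small routing function. The sketch below is not the GGUS implementation: the keyword rules and the fallback of leaving unmatched tickets with the TPM are assumptions for illustration; only the support unit names come from the slide.

```python
# Hypothetical keyword rules; a real TPM is a person reading the ticket.
ROUTING_RULES = [
    ("VO Support", ("vo membership", "experiment software")),
    ("ROC Support", ("site down", "queue")),
    ("Middleware Support", ("glite", "workload management")),
]


def assign_support_unit(ticket_text, rules=ROUTING_RULES):
    """Return the support unit whose keywords first match the ticket text."""
    text = ticket_text.lower()
    for unit, keywords in rules:
        if any(keyword in text for keyword in keywords):
            return unit
    # No rule matched: the ticket stays with the TPM for manual triage.
    return "TPM"


print(assign_support_unit("gLite job submission fails"))
print(assign_support_unit("Site down since Monday"))
```

Keeping the rules in a plain list makes the routing order explicit: the first matching unit wins.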


EGEE infrastructure usage

• Average job duration January 2005 – June 2005 for the main VOs

Infrastructure is continuously used by many groups


10,000 jobs /day


Grid Middleware


Middleware

• Middleware is sourced from
  – VDT (Globus, Condor, etc)
  – EDG LCG-x
  – gLite (EGEE)
  – Other components developed in LCG
      – File catalogue, disk pool manager


gLite Services for Release 1.0: Components Summary and Origin

• Computing Element
  – Gatekeeper, WSS (Globus)
  – Condor-C (Condor)
  – CE Monitor (EGEE)
  – Local batch system (PBS, LSF, Condor)
• Workload Management
  – WMS (EDG)
  – Logging and bookkeeping (EDG)
  – Condor-C (Condor)
• Storage Element
  – File Transfer/Placement (EGEE)
  – glite-I/O (AliEn)
  – GridFTP (Globus)
  – SRM: Castor (CERN), dCache (FNAL, DESY), other SRMs
• Catalog
  – File and Replica Catalog (EGEE)
  – Metadata Catalog (EGEE)
• Information and Monitoring
  – R-GMA (EDG)
  – Service Discovery (EGEE)
• Security
  – VOMS (DataTAG, EDG)
  – GSI (Globus)
  – Authentication for C and Java based (web) services (EDG)


gLite Grid Middleware: guiding principles

• Service oriented approach
  – Allow for multiple interoperable implementations
• Lightweight (existing) services
  – Easily and quickly deployable
  – Use existing services where possible
      – Condor, EDG, Globus, LCG, …
• Portable
  – Being built on Scientific Linux and Windows
• Security
  – Sites and Applications
• Performance/Scalability & Resilience/Fault Tolerance
  – Comparable to deployed infrastructure
• Co-existence with deployed infrastructure
  – Co-existence with LCG-2 and OSG (US) is essential for the EGEE Grid services
• Site autonomy
  – Reduce dependence on ‘global, central’ services
• Open source license


Architecture & Design

• Design team including representatives from Middleware providers (AliEn, Condor, EDG, Globus,…) including US partners produced middleware architecture and design.

• Takes into account input and experiences from applications, operations, and related projects

• DJRA1.1 – EGEE Middleware Architecture (June 2004)
  – https://edms.cern.ch/document/476451/
• DJRA1.2 – EGEE Middleware Design (August 2004)
  – https://edms.cern.ch/document/487871/
• Much feedback from within the project (operation & applications) and from related projects
  – Being used and actively discussed by OSG, GridLab, etc. Input to various GGF groups


gLite Services for Release 1

[Figure: gLite service architecture diagram, grouping the services into Access Services, Job Management Services, Data Services, Security Services, and Information & Monitoring Services; further labels: Site Proxy, Accounting, JRA3, UK, CERN, IT/CZ]


Job Management Services

Efficient and reliable scheduling of computational tasks on the available infrastructure

• Started with LCG-2 Workload Management System (WMS)
  – Inherited from EDG
  – Support partitioned jobs and jobs with dependencies
  – Support for different replica catalogs for data based scheduling
  – Modification of internal structure of WMS
      – Task queue: queue of pending submission requests
      – Information supermarket: repository of information on resources
      – Better reliability, better performance, better interoperability, support push and pull mode
  – Under development
      – Web Services interface supporting bulk submission (after V1.0)
• Bulk submission supported now by use of DAGs
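Submitting jobs with dependencies as a DAG amounts to releasing each job only after the jobs it depends on have finished. The sketch below illustrates that ordering with Python's standard `graphlib` module (Python 3.9 or later); it is not the WMS implementation, and the job names and the `submission_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
dependencies = {
    "merge": {"simulate-1", "simulate-2"},
    "simulate-1": {"prepare"},
    "simulate-2": {"prepare"},
    "prepare": set(),
}


def submission_waves(deps):
    """Group jobs into waves; every job in a wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished successfully
    return waves


waves = submission_waves(dependencies)
print(waves)
```

Each inner list is a set of jobs with no unmet dependencies, which is what lets a single DAG submission stand in for many individual submissions.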


WMS Interaction Overview


Data Management Services

• Efficient and reliable data storage, movement, and retrieval on the infrastructure
• Storage Element
  – Reliable file storage (SRM based storage systems)
  – Posix-like file access (gLite I/O)
  – Transfer (gridFTP)
• File and Replica Catalog
  – Resolves logical filenames (LFN) to physical location of files (URL understood by SRM) and storage elements
  – Hierarchical file system like view in LFN space
  – Single catalog or distributed catalog (under development) deployment possibilities
• File Transfer and Placement Service
  – Reliable file transfer and transactional interactions with catalogs
• Data Scheduler
  – Scheduled data transfer in the same spirit as jobs are being scheduled, taking into account e.g. network characteristics (collaboration with JRA4)
  – Under development
• Metadata Catalog
  – Limited metadata can be attached to the File and Replica Catalog
  – Interfaces to application specific catalogs have been defined
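The catalog's core job, resolving a logical filename (LFN) to the physical replicas of the file, can be sketched with a small in-memory class. This is an illustration only, not the gLite catalog interface: the class `ReplicaCatalog`, its methods, and the example LFNs and storage URLs are all hypothetical.

```python
class ReplicaCatalog:
    """Toy catalog mapping logical filenames to replica locations."""

    def __init__(self):
        self._replicas = {}  # LFN -> list of storage URLs

    def register(self, lfn, storage_url):
        """Record one more physical replica for a logical file."""
        self._replicas.setdefault(lfn, []).append(storage_url)

    def resolve(self, lfn):
        """Return all known replica locations for a logical file."""
        if lfn not in self._replicas:
            raise FileNotFoundError(lfn)
        return list(self._replicas[lfn])

    def listdir(self, directory):
        """List LFNs under a directory, giving a file-system-like view."""
        prefix = directory.rstrip("/") + "/"
        return sorted(lfn for lfn in self._replicas if lfn.startswith(prefix))


catalog = ReplicaCatalog()
catalog.register("/grid/biomed/run1/image.dat", "srm://se1.example.org/data/a1")
catalog.register("/grid/biomed/run1/image.dat", "srm://se2.example.org/data/b7")
catalog.register("/grid/biomed/run2/image.dat", "srm://se1.example.org/data/c3")

print(catalog.resolve("/grid/biomed/run1/image.dat"))
print(catalog.listdir("/grid/biomed"))
```

Applications refer only to the LFN; which replica is actually read can then be decided at run time, for example by proximity to the job.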


DM Interaction Overview

[Figure: data management interaction diagram. Recoverable components: File and Replica Catalog (Storage Index, Fireman, database); WMS; Storage Element (SRM, storage, gLite I/O, gridFTP); File Transfer and Placement Service (FTS, FPS, Transfer Agent, database); VOMS; MyProxy. Interaction labels: get credential; store credential; file I/O; file namespace and metadata management; file replication; proxy renewal; replica location]


Information and Monitoring Services

• R-GMA (Relational Grid Monitoring Architecture)
  – Implements GGF GMA standard
  – Development started in EDG, deployed on the production infrastructure for accounting and monitoring

[Figure: R-GMA architecture diagram. Recoverable components: Producer application and Consumer application, each using an API; Producer Service; Consumer Service; Registry Service; Schema Service; Mediator. Interaction labels: publish tuples; send query; receive tuples; register; locate; query; tuples. SQL labels: “CREATE TABLE”, “INSERT”, “SELECT”]
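The relational model behind the diagram (tables declared with CREATE TABLE, producers publishing with INSERT, consumers querying with SELECT) can be imitated locally with an in-memory SQLite database. This is only an analogy and not the R-GMA API: the table `cpu_load`, its columns, and the `publish` and `query` helpers are hypothetical.

```python
import sqlite3

# One in-memory database stands in for the whole virtual database.
db = sqlite3.connect(":memory:")

# Schema step: declare the table producers may publish into.
db.execute("CREATE TABLE cpu_load (site TEXT, host TEXT, load_avg REAL)")


def publish(site, host, load_avg):
    """Producer side: publish one tuple."""
    db.execute("INSERT INTO cpu_load VALUES (?, ?, ?)", (site, host, load_avg))


def query(sql, params=()):
    """Consumer side: run a SELECT and return the matching tuples."""
    return db.execute(sql, params).fetchall()


publish("site-a", "wn01", 0.5)
publish("site-a", "wn02", 3.2)
publish("site-b", "wn07", 1.8)

busy = query(
    "SELECT site, host FROM cpu_load WHERE load_avg > ? ORDER BY host", (1.0,)
)
print(busy)
```

In R-GMA the tuples live with many distributed producers and are located through the registry; the sketch keeps only the SQL-shaped interface that users see.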


R-GMA

• Producer, Consumer, Registry and Schema services with supporting tools
  – Registry replication
  – Simpler API – matching the next (WS) release
      – Provides smooth transition between old API and WS
  – Coping with life on the Grid: poorly configured networks, firewalls, MySQL corruptions etc
• Generic Service Discovery API
  – Defined but not yet implemented by any gLite services
• Under development
  – Web Service version
  – File (as well as memory and RDBMS) based Producers
  – Native python interface
  – Fine grained authorization
  – Schema replication


Other Reengineering Activities

• Prototypes of Grid Access Service and Package Manager implemented in the AliEn framework
• Grid Access Service
  – Acts on user’s behalf
  – Discovers and manages Grid services for the user
• Package Manager
  – Provides dynamic distribution of the application software needed
  – Does not install Grid middleware


Security

• Job Management Services
  – Authorization based on VOMS VO, groups, and user information
• Data Services
  – Authorization: ACL and Unix permissions
  – Fine-grained ACL on data enforced through gLite-IO and Catalogs
  – Catalog data itself is authorized through ACLs
      – Currently supported through DNs
      – VOMS integration being developed
• Information Services
  – Fine grained authorization based on VOMS certificates being implemented
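DN-based ACLs on catalog entries can be pictured as a lookup from a logical filename and a certificate distinguished name (DN) to a set of permitted operations. The sketch below is an assumption-laden illustration, not the gLite catalog's data model: the ACL layout, the example DNs, and the function `is_authorized` are hypothetical.

```python
# Hypothetical ACL: logical filename -> user DN -> permitted operations.
ACL = {
    "/grid/biomed/images/scan-001": {
        "/C=XX/O=Example/CN=Alice": {"read", "write"},
        "/C=XX/O=Example/CN=Bob": {"read"},
    },
}


def is_authorized(acl, lfn, user_dn, operation):
    """Allow the operation only if the DN is listed for that file."""
    return operation in acl.get(lfn, {}).get(user_dn, set())


print(is_authorized(ACL, "/grid/biomed/images/scan-001",
                    "/C=XX/O=Example/CN=Bob", "write"))
```

Anything not explicitly listed is denied, so an unknown file or an unknown DN never gains access by default.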


Main Differences to LCG-2

• Workload Management System works in push and pull mode

• Computing Element moving towards a VO based scheduler guarding the jobs of the VO (reduces load on GRAM)

• Re-factored file & replica catalogs

• Secure catalogs (based on user DN; VOMS certificates being integrated)

• Scheduled data transfers

• SRM based storage

• Information Services: R-GMA with improved API, Service Discovery and registry replication

• Move towards Web Services


Joining the grid …

New sites
New applications


New sites

• Join the EGEE infrastructure
  – Install the middleware
  – Become certified (check that installation is OK)
  – Set your policies
      – Which applications you accept – might be only your local ones
      – You still gain operational monitoring, possibility to use other resources, direct collaboration with other institutes
• Set up a clone of the EGEE infrastructure
  – Never been done
  – Potentially needed for (e.g.) medical applications
  – Requires investment in all the operational structure
  – Would re-use only the software … which is rapidly evolving


New applications

• Pilot applications
  – HEP and Biomed in EGEE-I
  – Several more in EGEE-II
• Reviewed applications
  – Go through a process to ascertain that science will benefit from the use of the grid
  – Can expect a certain level of direct support, but are also expected to have community support behind them
• Other applications
  – Actually quite a few – HEP, Astrophysics, …
  – More or less self-supporting in terms of getting started
• All benefit from operational and user support
  – Pilot and reviewed apps get more integration support


New applications

• Training courses
  – General courses, or potentially tailored to need
• GILDA demonstration system
• Pre-production service
• Production service


EGEE – What can it deliver?

• A managed operation – providing a service:
  – A large number of sites of different sizes and capabilities
  – Developed operational procedures
      – Monitoring of the grid services providing access to resources
  – Support services: user support, training, etc.
  – Building up considerable experience in grid-enabling a variety of different applications
  – Tools for monitoring of resources at a site … if required
• A new VO joining EGEE with a few sites:
  – Benefits from the operations and support – the VO sites can be monitored and supported as part of the infrastructure
  – Potentially access to other resources
  – It is a significant effort to set up a grid infrastructure from scratch


… and what does it cost?

• “The application VO buys into the EGEE model”
  – Actually not so restrictive now – supports many Linux flavours, IA64 (other teams have worked on AIX, SGI ports)
  – Simple installation of client software now (can be done on the fly)
  – Basic grid services are quite general, nothing really application-specific
• Some unresolved issues:
  – Commercial licensed software used by an application
  – Levels of privacy/security needed in some life-science applications
  – True interactivity
• … and of course, this is all new and rapidly evolving, with many problems still to be overcome


Summary

• Grids are a powerful new tool for science

• Several applications are already benefiting from Grid technologies (biomedical is a good example, … and High Energy Physics of course)

• Europe is strong in the development of Grids also thanks to the success of EGEE and related projects

• EGEE offers:
  – A mechanism for linking together the people, resources and data of your scientific community
  – Continuous monitoring of the status of your Virtual Organisation
  – A set of middleware for gridifying applications with documentation, training and support
  – Regular forums for linking with grid experts, other communities and industry

• EGEE-II will further extend support for user communities and applications


Contacts

• EGEE Website: http://www.eu-egee.org
• How to join: http://public.eu-egee.org/join/
• EGEE Project Office: [email protected]


Examples of applications


EGEE pilot applications (I)

• High-Energy Physics (HEP)
  – Provides computing infrastructure (LCG) for experiments at CERN in Geneva
  – Challenging:
      – thousands of processors world-wide
      – generating petabytes of data
      – ‘chaotic’ use of grid with individual user analysis (thousands of users interactively operating within experiment VOs)
• Biomedical Applications
  – Similar computing and data storage requirements
  – Major additional challenge: security & access to data in many formats

[Figure: photograph with labels Mont Blanc (4810 m) and Downtown Geneva]


BioMed Overview

• Infrastructure
  – ~2000 CPUs
  – ~21 TB of disk
  – in 12 countries
  – 15 resource centres, 17 CEs, 16 SEs
• >50 users in 7 countries working with 12 applications
• 18 research labs
• ~80,000 jobs launched since 04/2004
• ~10 CPU years

[Figure: chart of number of jobs per month; further labels: PADOVA, BARI]


Bioinformatics

• GPS@: Grid Protein Sequence Analysis
  – Gridified version of NPSA web portal
      – Offers protein databases and sequence analysis algorithms to bioinformaticians (3000 hits per day)
      – Needs large databases and a large number of short jobs
  – Objective: increased computing power
  – Status: 9 bioinformatics software packages gridified
  – Grid added value: open to a wider community with larger bioinformatic computations
• xmipp_MLrefine
  – 3D structure analysis of macromolecules
      – From (very noisy) electron microscopy images
      – Maximum likelihood approach to find the optimal model
  – Objective: study molecule interaction and chem. properties
  – Status: algorithm being optimised and ported to 3D
  – Grid added value: parallel computation of independent jobs on different resources


Medical imaging

• GATE
  – Radiotherapy planning
      – Improvement of precision by Monte Carlo simulation
      – Processing of DICOM medical images
  – Objective: very short computation time compatible with clinical practice
  – Status: development and performance testing
  – Grid Added Value: parallelisation reduces computing time
• CDSS
  – Clinical Decision Support System
      – Assembling knowledge databases
      – Using image classification engines
  – Objective: access to knowledge databases from hospitals
  – Status: from development to deployment, some medical end users
  – Grid Added Value: ubiquitous, managed access to distributed databases and engines


Medical imaging

• SiMRI3D
  – 3D Magnetic Resonance Image Simulator
      – MRI physics simulation, parallel implementation
      – Very compute intensive
  – Objective: offering an image simulator service to the research community
  – Status: parallelised and now running on EGEE resources
  – Grid Added Value: enables simulation of high-res images
• gPTM3D
  – Interactive tool to segment and analyse medical images
      – A non gridified version is distributed in several hospitals
      – Need for very fast scheduling of interactive tasks
  – Objectives: shorten computation time using the grid
      – Interactive reconstruction time: < 2 min and scalable
  – Status: development of the gridified version being finalized
  – Grid Added Value: permanent availability of resources

Drug Discovery

• Grid-enabled drug discovery process for neglected diseases
– In silico docking: compute the probability that potential drugs will dock with a target protein
– Aim: speed up the development of new drugs and reduce its cost

• WISDOM (Wide In Silico Docking On Malaria)
– Drug Discovery Data Challenge
– 11 July – 19 August
– 46 million docked ligands produced (typical for computer clusters: 100 000 ligands)
– Equivalent to 80 CPU years
– 1000 computers in 15 countries used simultaneously
– Millions of files (adding up to a few TB of data)
  Never done on a large-scale production infrastructure
  Never done for a neglected disease

• Next steps
– Sort through data to identify potential drugs
– Develop the next steps of the process (molecular dynamics)
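
As a rough cross-check of the WISDOM figures, the sketch below converts 80 CPU years spent between 11 July and 19 August into an average number of busy CPUs and an average CPU cost per docked ligand. It assumes the year is 2005, that both end dates are included, and a 365-day year; the function names are illustrative only.

```python
from datetime import date

SECONDS_PER_DAY = 86_400


def wall_clock_days(start, end):
    """Length of the data challenge in days, counting both end dates (assumption)."""
    return (end - start).days + 1


def average_busy_cpus(cpu_years, days, days_per_year=365.0):
    """Average number of CPUs that must be busy to deliver cpu_years in `days`."""
    return cpu_years * days_per_year / days


def cpu_seconds_per_ligand(cpu_years, ligands, days_per_year=365.0):
    """Average CPU time spent on one docked ligand, in seconds."""
    return cpu_years * days_per_year * SECONDS_PER_DAY / ligands


# Figures from the slide; the year 2005 is an assumption.
days = wall_clock_days(date(2005, 7, 11), date(2005, 8, 19))
cpus = average_busy_cpus(80, days)
per_ligand = cpu_seconds_per_ligand(80, 46_000_000)

print(f"{days} days, about {cpus:.0f} CPUs busy on average")
print(f"about {per_ligand:.0f} CPU seconds per docked ligand")
```

Under these assumptions the run lasted 40 days and kept about 730 CPUs busy on average, which is consistent with up to 1000 computers being used simultaneously.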

Generic Applications

• EGEE Generic Applications Advisory Panel (EGAAP)
– UNIQUE entry point for “external” applications
– Reviews proposals and makes recommendations to EGEE management
  Deals with “scientific” aspects, not with technical details
  The Generic Applications group is in charge of introducing selected applications to the EGEE infrastructure
– 6 applications selected so far:
  Earth sciences (earth observation, geophysics, hydrology, seismology)
  MAGIC (astrophysics)
  Computational Chemistry
  PLANCK (astrophysics and cosmology)
  Drug Discovery
  E-GRID (e-finance and e-business)
  GRACE (grid search engine, ended Feb 2005)

Earth sciences applications

• Earth Observations by Satellite
– Ozone profiles

• Solid Earth Physics
– Fast determination of mechanisms of important earthquakes

• Hydrology
– Management of water resources in the Mediterranean area (SWIMED)

• Geology
– Geocluster: R&D initiative of the Compagnie Générale de Géophysique

A large variety of applications has been ported to EGEE, which attracts new users

Interactive collaboration of the teams around a project

MAGIC

• Ground-based Air Cerenkov Telescope, 17 m diameter

• Physics goals:
– Origin of VHE gamma rays
– Active Galactic Nuclei
– Supernova remnants
– Unidentified EGRET sources
– Gamma-ray bursts

• MAGIC II will come in 2007

• Grid added value
– Enable “(e-)scientific” collaboration between partners
– Enable cooperation between different experiments
– Enable participation in Virtual Observatories

Computational Chemistry

• The Grid Enabled Molecular Simulator (GEMS)
– Motivation:
  Modern computer simulations of biomolecular systems produce an abundance of data, which could be reused several times by different researchers
  Data must be catalogued and searchable
– GEMS database and toolkit:
  autonomous storage resources
  metadata specification
  automatic storage allocation and replication policies
  interface for distributed computation
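
The GEMS requirements listed above (catalogued, searchable data with metadata and a replication policy) can be pictured with a small in-memory catalogue. This is only a sketch: the class names, metadata fields, storage names and the "minimum number of replicas" policy are all hypothetical and do not describe the actual GEMS toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class SimulationRecord:
    """One catalogued simulation output (hypothetical schema)."""
    logical_name: str
    metadata: dict
    replicas: list = field(default_factory=list)  # storage resources holding a copy


class Catalogue:
    """Toy metadata catalogue with a minimum-replica policy (an assumption)."""

    def __init__(self, min_replicas=2):
        self.min_replicas = min_replicas
        self._records = {}

    def register(self, logical_name, metadata, replicas):
        """Add or overwrite the entry for one simulation output."""
        self._records[logical_name] = SimulationRecord(
            logical_name, dict(metadata), list(replicas)
        )

    def search(self, **criteria):
        """Names of all records whose metadata match every given key/value pair."""
        return sorted(
            r.logical_name
            for r in self._records.values()
            if all(r.metadata.get(k) == v for k, v in criteria.items())
        )

    def under_replicated(self):
        """Names of records that have fewer copies than the policy requires."""
        return sorted(
            name
            for name, r in self._records.items()
            if len(r.replicas) < self.min_replicas
        )


# Example entries; molecule names, fields and storage names are made up.
catalogue = Catalogue(min_replicas=2)
catalogue.register("run-001", {"molecule": "lysozyme", "method": "MD"}, ["se-a", "se-b"])
catalogue.register("run-002", {"molecule": "lysozyme", "method": "MC"}, ["se-a"])
catalogue.register("run-003", {"molecule": "insulin", "method": "MD"}, ["se-c", "se-a"])

print(catalogue.search(molecule="lysozyme"))
print(catalogue.under_replicated())
```

A search by metadata lets a second researcher find and reuse an existing trajectory, and the under-replication check is the kind of input an automatic replication policy would act on.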

Planck

• On the Grid: > 12 times faster (with ~5% failures)

• Complex data structure, so data handling is important

• The Grid as
– collaboration tool
– common user interface
– flexible environment
– new approach to data and S/W sharing