CERN and the LHC Computing Grid
Ian Bird, IT Department
CERN, Geneva, Switzerland
HP Puerto Rico, 9 February 2004
Ian.Bird@cern.ch



Page 2:

What is CERN?

• CERN is the world's largest particle physics centre, funded by 20 European member states

• Particle physics is about:
- the elementary particles of which all matter in the universe is made
- the fundamental forces which hold matter together

• Particle physics requires:
- special tools to create and study new particles

CERN is:
- 2500 staff (physicists, engineers, …)
- some 6500 visiting scientists (half of the world's particle physicists), who come from 500 universities representing 80 nationalities.

Page 3:

… is located in Geneva, Switzerland

(Photo labels: Mont Blanc, 4810 m; downtown Geneva)

Page 4:

What is CERN?

The special tools for particle physics are:

• ACCELERATORS: huge machines able to speed up particles to very high energies before colliding them into other particles

• DETECTORS: massive instruments which register the particles produced when the accelerated particles collide

Page 5:

What is the LHC?

• The LHC will collide beams of protons at an energy of 14 TeV

• Using the latest superconducting technologies, it will operate at about –270°C, just above the absolute zero of temperature

• With its 27 km circumference, the accelerator will be the largest superconducting installation in the world

• The LHC is due to switch on in 2007

• Four experiments, with detectors as 'big as cathedrals': ALICE, ATLAS, CMS, LHCb

Page 6:

The LHC Data Challenge

• A particle collision = an event

• Events are independent, which provides trivial parallelism and hence allows the use of simple PC farms

• The physicist's goal is to count, trace and characterize all the particles produced and fully reconstruct the process

• Among all tracks, the presence of "special shapes" is the sign of an interesting interaction

Page 7:

The LHC Data Challenge

Starting from this event…
…you are looking for this "signature".

Selectivity: 1 in 10^13 – like looking for one person in a thousand world populations, or for a needle in 20 million haystacks!
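The "thousand world populations" comparison checks out arithmetically, taking roughly 6.4 billion as the approximate world population in 2004:

```python
selectivity = 1e13        # 1 interesting event in 10^13 collisions
world_pop_2004 = 6.4e9    # approximate world population in 2004

# How many "world populations" give the same odds as one signature?
print(selectivity / world_pop_2004)  # ~1562: about a thousand
```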

Page 8:

LHC data (simplified)

• 40 million collisions per second

• After filtering, 100 collisions of interest per second

• A Megabyte of digitised information for each collision = a recording rate of 0.1 Gigabytes/sec

• 10^11 collisions recorded each year = 10 Petabytes/year of data

(Detector logos: CMS, LHCb, ATLAS, ALICE)

1 Megabyte (1 MB): a digital photo
1 Gigabyte (1 GB) = 1000 MB: a DVD movie
1 Terabyte (1 TB) = 1000 GB: the world's annual book production
1 Petabyte (1 PB) = 1000 TB: 10% of the annual production of the LHC experiments
1 Exabyte (1 EB) = 1000 PB: the world's annual information production
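The recording rate follows directly from the two filtered-data numbers above:

```python
collisions_per_sec = 100   # collisions of interest after filtering
mb_per_collision = 1       # ~1 Megabyte of digitised data each

rate_gb_per_sec = collisions_per_sec * mb_per_collision / 1000
print(rate_gb_per_sec)     # 0.1 Gigabytes/sec, matching the slide
```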

Page 9:

Expected LHC computing needs

(Charts, 1998–2010: estimated disk capacity at CERN, scale 0–7000 TeraBytes; estimated mass storage at CERN, LHC vs. other experiments, scale 0–140 PetaBytes; estimated CPU capacity at CERN in kSI95, scale 0–6,000, compared with Moore's law based on 2000 data.)

Networking: 10 – 40 Gb/s to all big centres today
Data: ~15 Petabytes a year
Processing: ~100,000 of today's PCs

Page 10:

Computing at CERN today

• High-throughput computing based on reliable "commodity" technology
• More than 1500 dual-processor PCs
• More than 3 Petabytes of data on disk (10%) and tape (90%)

Nowhere near enough!

Page 11:

Computing at CERN today

The new computer room is being populated…

(Photos: CPU servers, disk servers, tape silos and servers)

Page 12:

Computing at CERN today

…while the existing computer centre is being cleared for renovation…

(Photos: CPU servers, disk servers)

…and an upgrade of the power supply from 0.5 MW to 2.5 MW is underway.

Page 13:

Computing for LHC

• Problem: even with the computer centre upgrade, CERN can provide only a fraction of the necessary resources

• Solution: computing centres, which were isolated in the past, will now be connected, uniting the computing resources of the world's particle physicists using Grid technologies!

Europe: ~270 institutes, ~4500 users
Elsewhere: ~200 institutes, ~1600 users

Page 14:

LHC Computing Grid Project

• The LCG Project is a collaboration of:
  • the LHC experiments
  • the Regional Computing Centres
  • physics institutes
  … working together to prepare and deploy the computing environment that will be used by the experiments to analyse the LHC data

• This includes support for applications:
  • provision of common tools, frameworks, environment, data persistency

• … and the development and operation of a computing service:
  • exploiting the resources available to LHC experiments in computing centres, physics institutes and universities around the world
  • presenting this as a reliable, coherent environment for the experiments

Page 15:

LCG Project

• Applications Area (Torre Wenaus): development environment; joint projects; data management; distributed analysis

• Middleware Area (Frédéric Hemmer): provision of a base set of grid middleware (acquisition, development, integration); testing, maintenance, support

• CERN Fabric Area (Bernd Panzer): large cluster management; data recording; cluster technology; networking; computing service at CERN

• Grid Deployment Area (Ian Bird): establishing and managing the Grid Service – middleware certification, security, operations, registration, authorisation, accounting

• Technology Office (David Foster): overall coherence of the project; pro-active technology watch; long-term grid technology strategy; computing models

Page 16:

Project Management Board

• Project Management: management team; SC2 and GDB chairs
• Experiment delegates
• External projects: EDG, GridPP, INFN Grid, VDT, Trillium
• Other resource suppliers: IN2P3, Germany, CERN-IT
• Architects' Forum: Applications Area manager, experiment architects, computing coordinators
• Grid Deployment Board (GDB): experiment delegates, national and regional centre delegates

The PEB deals directly with the Fabric and Middleware areas. The GDB negotiates and agrees operational and security policy, resource allocation, etc.

Page 17:

LCG-1 components (schematic)

• Hardware: computing cluster; network resources; data storage (HPSS, CASTOR, … – closed systems?)
• System software: operating system (RedHat Linux); file system (NFS, …); local scheduler (PBS, Condor, LSF, …)
• "Passive" services: user access; security; data transfer; information schema – VDT (Globus, GLUE)
• "Active" services: global scheduler; data management; information system – EU DataGrid
• High-level services: user interfaces; applications – LCG, experiments
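The "information schema" layer standardises how each site describes its resources so that the global scheduler can reason about them. A rough, hypothetical sketch of the kind of record involved (the field names here are invented for illustration, not the actual GLUE attribute names):

```python
import json

# Hypothetical site description, loosely in the spirit of what an
# information schema such as GLUE standardises; field names invented.
site = {
    "name": "CERN-LCG1",
    "os": "RedHat Linux",
    "local_scheduler": "PBS",          # could equally be Condor or LSF
    "cpu_slots_free": 120,
    "storage": {"system": "CASTOR", "free_tb": 40},
}

# The "passive" information service publishes such records so the
# "active" global scheduler can discover resources across sites.
published = json.dumps(site)
print(json.loads(published)["name"])  # CERN-LCG1
```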

Page 18:

Elements of a Production Grid Service

• Middleware: the systems software that interconnects the computing clusters at regional centres to provide the illusion of a single computing facility
  • information publishing and finding, distributed data catalogue, data management tools, work scheduler, performance monitors, etc.

• Operations:
  • grid infrastructure services
  • registration, accounting, security
  • regional centre and network operations
  • grid operations centre(s): trouble and performance monitoring, problem resolution, 24x7 around the world

• Support:
  • middleware and systems support for computing centres
  • applications integration, production
  • user support: call centres/helpdesk with global coverage; documentation; training
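The "distributed data catalogue" mentioned above maps a logical file name to the physical copies held at different centres. A toy sketch of the idea (the file names and replica URLs are made up; a real catalogue such as the EDG replica catalogue is of course far more involved):

```python
# Toy replica catalogue: logical file name -> physical replicas.
catalog = {
    "lfn:cms/dc04/events-0001.root": [
        "gsiftp://cern.ch/castor/events-0001.root",
        "gsiftp://ral.ac.uk/store/events-0001.root",
    ],
}

def locate(lfn):
    """Return all known physical replicas of a logical file."""
    return catalog.get(lfn, [])

replicas = locate("lfn:cms/dc04/events-0001.root")
print(len(replicas))  # 2 - a scheduler can then pick the closest copy
```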

Page 19:

LCG Service

• Certification and distribution process established
• Middleware package, with components from:
  • the European DataGrid (EDG)
  • the US Virtual Data Toolkit (Globus, Condor, PPDG, GriPhyN)
• Agreement reached on principles for registration and security
• Rutherford Lab (UK) to provide the initial Grid Operations Centre
• FZK (Karlsruhe) to operate the Call Centre

The first "certified" release was made available to 14 centres on 1 September: Academia Sinica Taipei, BNL, CERN, CNAF, Cyfronet Cracow, FNAL, FZK, IN2P3 Lyon, KFKI Budapest, Moscow State Univ., Prague, PIC Barcelona, RAL, Univ. Tokyo.

Page 20:

LCG Service – Next Steps

• Deployment status:
  • 12 sites active when the service opened on 15 September
  • ~30 sites now active
  • Pakistan, China, Korea, HP, … preparing to join

• Preparing now to add new functionality in November, to be ready for 2004:
  • VO management system
  • integration of mass storage systems

• Experiments now starting their tests on LCG-1
  • The CMS target is to have 80% of their production on the grid before the end of the PCP of DC04
  • Essential that experiments use all features (including, and especially, data management)
  • … and exercise the grid model even if it is not needed for the short-term challenges

• Capacity will follow the readiness of the experiments

Page 21:

LCG Service – Next Steps

• Deployment status:
  • 12 sites active when the service opened on 15 September
  • 28 sites now active
  • HP, Pakistan, Australia, Korea, China, … preparing to join

• Starting to deploy LCG-2, the upgrade for 2004:
  • VO management system
  • integration of mass storage systems

• Experiments now starting their tests on LCG-2 in preparation for the Data Challenges
  • The CMS target is to have 80% of their production on the grid before the end of the PCP of DC04
  • Essential that experiments use all features (including, and especially, data management)
  • … and exercise the grid model even if it is not needed for the short-term challenges

• Capacity will follow the readiness of the experiments

Page 22:

Resources committed for 1Q04 – Resources in Regional Centres

• Resources planned for the period of the data challenges in 2004
• CERN provides ~12% of the total capacity
• Numbers have to be refined – different standards used by different countries
• Efficiency of use is a major question mark:
  • reliability
  • efficient scheduling
  • sharing between Virtual Organisations (user groups)

                  CPU (kSI2K)   Disk (TB)   Support (FTE)   Tape (TB)
  CERN                   700         160            10.0         1000
  Czech Republic          60           5             2.5            5
  France                 420          81            10.2          540
  Germany                207          40             9.0           62
  Holland                124           3             4.0           12
  Italy                  507          60            16.0          100
  Japan                  220          45             5.0          100
  Poland                  86           9             5.0           28
  Russia                 120          30            10.0           40
  Taiwan                 220          30             4.0          120
  Spain                  150          30             4.0          100
  Sweden                 179          40             2.0           40
  Switzerland             26           5             2.0           40
  UK                    1656         226            17.3          295
  USA                    801         176            15.5         1741
  Total                 5600        1169           120.0         4223
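The "~12%" figure for CERN can be read straight off the CPU column of the table:

```python
cern_cpu = 700     # kSI2K committed at CERN, from the table
total_cpu = 5600   # kSI2K committed across all centres

share = 100 * cern_cpu / total_cpu
print(round(share, 1))  # 12.5 - i.e. CERN provides ~12% of capacity
```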

Page 23:

LCG Service Time-line (2003–2007, first data in 2007)

• Agree spec. of initial service (LCG-1)
• Open LCG-1 (scheduled for 1 July; achieved 1 September)
• Used for simulated event productions
• Physics computing service ready for first data

Level 1 Milestone – opening of the LCG-1 service:
• 2 month delay, and lower functionality than planned
• use by experiments only starting now (planned for end August)
• the decision on the final set of middleware for the 1H04 data challenges will be taken without experience of production running
• reduced time for integrating and testing the service with the experiments' systems before the data challenges start next spring
• additional functionality will have to be integrated later

Page 24:

LCG Service Time-line (2003–2007, first data in 2007)

• Agree spec. of initial service (LCG-1)
• Open LCG-1 (achieved 1 September)
• Used for simulated event productions
• LCG-2: upgraded middleware, management and operations tools; principal service for the LHC data challenges
• Computing model TDRs*
• LCG-3: second-generation middleware; validation of computing models
• TDR for the Phase 2 grid
• Phase 2 service acquisition, installation, commissioning
• Experiment setup & preparation
• Phase 2 service in production; physics computing service ready for first data

* TDR – technical design report

Page 25:

LCG and EGEE

• An EU project has been approved to provide partial funding for the operation of a general e-Science grid in Europe, including the supply of suitable middleware: Enabling Grids for e-Science in Europe (EGEE). EGEE provides funding for 70 partners, the large majority of which have strong HEP ties

• Similar funding is being sought in the US

• LCG and EGEE work closely together, sharing the management and responsibility for:
  • Middleware: share out the work to implement the recommendations of HEPCAL II and ARDA
  • Infrastructure operation: LCG will be the core from which the EGEE grid develops; this ensures compatibility and provides useful funding at many Tier 1, Tier 2 and Tier 3 centres
  • Deployment of HEP applications: a small amount of funding provided for testing and integration with the LHC experiments

Page 26:

Middleware – Next 15 months

• Work closely with the experiments on developing experience with early distributed analysis models using the grid:
  • multi-tier model
  • data management, localisation, migration
  • resource matching & scheduling
  • performance, scalability

• Evolutionary introduction of new software – rapid testing and integration into mainline services – while maintaining a stable service for data challenges!

• Establish a realistic assessment of the grid functionality that we will be able to depend on at LHC startup – a fundamental input for the Computing Model TDRs due at the end of 2004
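"Resource matching & scheduling" boils down to pairing each job's requirements with a site that can satisfy them. A deliberately naive sketch (the site inventory and the ranking rule are invented here; production matchmaking, e.g. Condor's ClassAds, is far richer):

```python
# Invented site inventory, for illustration only.
sites = [
    {"name": "CERN", "free_cpus": 80,  "free_disk_tb": 20},
    {"name": "RAL",  "free_cpus": 300, "free_disk_tb": 5},
    {"name": "FZK",  "free_cpus": 150, "free_disk_tb": 60},
]

def match(job, sites):
    """Pick the site with the most free CPUs among those that
    meet the job's CPU and disk requirements (naive ranking)."""
    ok = [s for s in sites
          if s["free_cpus"] >= job["cpus"]
          and s["free_disk_tb"] >= job["disk_tb"]]
    return max(ok, key=lambda s: s["free_cpus"], default=None)

job = {"cpus": 100, "disk_tb": 10}
best = match(job, sites)
print(best["name"])  # FZK: RAL has more CPUs but too little disk
```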

Page 27:

Grids – Maturity is some way off

• Research still needs to be done in all key areas of importance to LHC
  • e.g. data management, resource matching/provisioning, security, etc.

• Our life would be easier if standards were agreed and solid implementations were available – but they are not

• We are only now entering the second phase of development:
  • everyone agrees on the overall direction, based on Web services
  • but these are not simple developments
  • and we are still learning how best to approach many of the problems of a grid
  • there will be multiple and competing implementations – some for sound technical reasons

• We must try to follow these developments and influence the standardisation activities of the Global Grid Forum (GGF)

• It has become clear that LCG will have to live in a world of multiple grids – but there is no agreement on how grids should inter-operate:
  • common protocols?
  • federations of grids inter-connected by gateways?
  • regional centres connecting to multiple grids?

Running a service in this environment will be a challenge!