academic cloud experiences cern v4

Clouds at CERN, Tim Bell, Academic Cloud Experiences, 29th April 2013

Page 1: Academic cloud experiences cern v4

Clouds at CERN

Tim Bell

[email protected]

Academic Cloud Experiences, 29th April 2013

Page 2: Academic cloud experiences cern v4

CERN was founded in 1954 by 12 European states: "Science for Peace"

Today: 20 Member States

Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Italy, the Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom

Candidate for Accession: Romania

Associate Members in Pre-Stage to Membership: Israel, Serbia

Applicant States for Membership or Associate Membership: Brazil, Cyprus (awaiting ratification), Pakistan, Russia, Slovenia, Turkey, Ukraine

Observers to Council: India, Japan, Russia, Turkey, United States of America; European Commission and UNESCO

~2300 staff

~1000 other paid personnel

> 11000 users

Budget (2013): ~1000 MCHF

Page 3: Academic cloud experiences cern v4

Is the Higgs boson the source of mass of our fundamental particles?

Page 4: Academic cloud experiences cern v4

Why is the universe made of matter and not equal amounts of matter/antimatter?

Page 5: Academic cloud experiences cern v4

Dark Matter and Dark Energy?

We do not know the composition of 95% of the universe

Temperature of the universe (WMAP satellite)

Page 6: Academic cloud experiences cern v4

Blue tubes contain the two beam pipes and magnets at 1.8 K

Page 7: Academic cloud experiences cern v4

ATLAS detector during construction in 2005

Page 8: Academic cloud experiences cern v4

Number of candidates (vertical axis)

Mass of the candidates (horizontal axis)

We observe an excess of candidates with a mass of 125 proton masses

Search for Higgs decays to 4 “leptons” (electrons or muons)

Also observed in the CMS experiment

Page 9: Academic cloud experiences cern v4

July 4, 2012

Page 10: Academic cloud experiences cern v4

The Worldwide LHC Computing Grid

WLCG: An international collaboration to distribute and analyse LHC data

Integrates computer centres worldwide that provide computing and storage resources into a single infrastructure accessible by all LHC physicists

Tier-0 (CERN): data recording, reconstruction and distribution

Tier-1: permanent storage, re-processing, analysis

Tier-2: simulation, end-user analysis

> 2 million jobs/day

~250,000 cores

173 PB of storage

nearly 160 sites, 35 countries

10 Gb/s links
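As a rough back-of-envelope reading of the figures on this slide, the sketch below relates jobs per day to core count. The function names are hypothetical, and the mean job length rests on a loud assumption that is not in the slides: every job uses exactly one core and all cores are busy around the clock.

```python
# Figures quoted on the slide; both are approximate in the source.
JOBS_PER_DAY = 2_000_000   # "> 2 million jobs/day"
CORES = 250_000            # "~250,000 cores"
SECONDS_PER_DAY = 24 * 60 * 60


def jobs_per_core_per_day(jobs: int, cores: int) -> float:
    """Average number of jobs each core handles per day."""
    return jobs / cores


def mean_job_hours(jobs: int, cores: int) -> float:
    """Mean job length in hours, ASSUMING single-core jobs run back to back
    on fully busy cores (an illustrative assumption, not a WLCG figure)."""
    return 24 / jobs_per_core_per_day(jobs, cores)


if __name__ == "__main__":
    print(jobs_per_core_per_day(JOBS_PER_DAY, CORES))  # jobs per core per day
    print(JOBS_PER_DAY / SECONDS_PER_DAY)              # grid-wide jobs per second
    print(mean_job_hours(JOBS_PER_DAY, CORES))         # hours, under the assumption above
```

Real utilisation and multi-core jobs would change the implied job length, so the numbers are only an order-of-magnitude guide.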

Page 11: Academic cloud experiences cern v4

IT Infrastructure Challenges

Staff numbers fixed

Materials budget decreasing

Increasing users of CERN's facilities

Legacy tools are high maintenance and brittle

Additional data centre in Budapest now online, doubling potential capacity, with a 200 Gbit/s network

How do we scale from our current 11,000 servers within these constraints?

Page 12: Academic cloud experiences cern v4

Approach

Remodel IT services on Cloud layered models: IaaS, PaaS, SaaS

Move to commonly used open source tools: Puppet, OpenStack, Foreman, Koji, Oz, Kibana, …

Implement clouds at scale: IT aims for 15,000 hypervisors with 150,000 VMs by 2015

Exploit ecosystem solutions such as LBaaS, DBaaS, MQaaS rather than build our own

Page 13: Academic cloud experiences cern v4

Clouds in High Energy Physics

Long-term preservation of software and data of HEP experiments

Utilize special computing resources attached to the detectors

Simplify the management of heterogeneous in-house resources

Use commercial clouds for exceptional computing demands

Distributed cloud computing using HEP and non-HEP clouds

Page 14: Academic cloud experiences cern v4

Service Models

Pets are given names like pussinboots.cern.ch

They are unique, lovingly hand raised and cared for

When they get ill, you nurse them back to health

Cattle are given numbers like vm0042.cern.ch

They are almost identical to other cattle

When they get ill, you get another one

Future application architectures tend towards Cattle but Pet support is needed for some specific zones of the cloud

Page 15: Academic cloud experiences cern v4

Refine Service Levels?

Hippos are cattle with bulk storage. Useful where Cassandra or MongoDB ensures redundancy

Canaries are cattle at high risk, used to give early warning of failures. Deploy early, fail fast and fix
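One way to read the pets, cattle, hippos and canaries model is as a failure-handling policy keyed on service level. The sketch below is an illustrative interpretation only: the class names, the `on_failure` function and the action strings are hypothetical and do not come from CERN tooling.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceLevel(Enum):
    PET = "pet"        # unique, hand raised, nursed back to health
    CATTLE = "cattle"  # near-identical, replaced when ill
    HIPPO = "hippo"    # cattle with bulk storage
    CANARY = "canary"  # cattle at high risk, early warning of failures


@dataclass(frozen=True)
class Server:
    hostname: str
    level: ServiceLevel


def on_failure(server: Server) -> str:
    """Return a (hypothetical) action name for a failed server."""
    if server.level is ServiceLevel.PET:
        return "repair"
    if server.level is ServiceLevel.CANARY:
        # Assumed reading of "early warning": raise an alert, then replace.
        return "alert-and-replace"
    # Cattle and hippos are replaced; for hippos the application layer
    # (for example Cassandra or MongoDB) is assumed to provide redundancy.
    return "replace"


if __name__ == "__main__":
    print(on_failure(Server("pussinboots.cern.ch", ServiceLevel.PET)))
    print(on_failure(Server("vm0042.cern.ch", ServiceLevel.CATTLE)))
```

Keeping the policy in one function makes it easy to add a further service level later without touching the callers.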

Page 16: Academic cloud experiences cern v4

Infrastructure Overview

[Diagram. OpenStack labels: Horizon, Keystone, Nova, Scheduler, Network, Compute, Glance, Cinder. CERN service labels: Microsoft Active Directory, CERN DB on Demand, CERN Network Database, Account mgmt. system, CERN Block Storage provider]

Page 17: Academic cloud experiences cern v4

Dashboard using Horizon

Page 18: Academic cloud experiences cern v4

Timelines

Deploy as a stable release becomes available in EPEL

Keep up to date but not too close

Benefit from continuous integration testing of other companies

Timeline (late 2012 to late 2013):

Folsom: Sep 27, 2012

Ibex: Feb 2013

Grizzly: Apr 4, 2013

Today (29 April 2013)

Grizzly Service: May 2013

Havana: Oct 2013

Havana Service: Nov/Dec 2013

Page 19: Academic cloud experiences cern v4

Status CERN IT OpenStack Cloud

Running Folsom: around 500 hypervisors on KVM and Hyper-V

High availability using load balancing

75 users creating around 50 new VMs/day

Experiment farms: CMS currently running 1,300 hypervisors with 50,000 cores using Essex; ATLAS starting to ramp up to a similar size

Other HEP sites moving to private cloud: Brookhaven, IN2P3, FutureGrid, NeCTAR, IHEP, …

Page 20: Academic cloud experiences cern v4

Next Steps (I)

Move to Grizzly: target end of May 2013

Enable Kerberos and X.509 authentication: avoids users having to enter passwords

Recycle existing hardware and scale using cells: can recycle around 100 batch machines to hypervisors per week
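To put the recycling rate in context, the sketch below estimates how long conversion alone would take to reach the 15,000-hypervisor goal from the roughly 500 hypervisors reported in the status slide. The function name is hypothetical, and the constant rate and the idea that all growth comes from recycling are simplifying assumptions that the slides do not state.

```python
import math


def weeks_to_target(current: int, target: int, per_week: int) -> int:
    """Whole weeks needed to grow from `current` to `target` hypervisors
    at a constant conversion rate (an assumed, simplified model)."""
    if per_week <= 0:
        raise ValueError("per_week must be positive")
    return max(0, math.ceil((target - current) / per_week))


if __name__ == "__main__":
    # ~500 hypervisors today, 15,000 targeted, ~100 conversions per week.
    print(weeks_to_target(current=500, target=15_000, per_week=100))
```

With these inputs the estimate is 145 weeks, which suggests that recycling at this rate would have to be combined with other capacity to meet a 2015 target.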

Page 21: Academic cloud experiences cern v4

Cells

Page 22: Academic cloud experiences cern v4

We’re not alone …

Already 6 sites running more than 10,000 hypervisors according to the latest OpenStack user survey

Page 23: Academic cloud experiences cern v4

Next Steps (II)

Block Storage for Hippos and Pets: Cinder with Ceph, NetApp or GlusterFS

Heat for orchestration and auto-scaling

Load Balancing as a Service

Bare-metal to bring all servers under OpenStack

Move Ceilometer into production: accounting by project; move to wall-clock, vCPU metering
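Wall-clock vCPU metering by project can be pictured as summing vCPUs multiplied by elapsed hours for each VM. The sketch below is a minimal illustration with a hypothetical `VmUsage` record; it does not use or imitate the Ceilometer API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VmUsage:
    """Hypothetical usage record for one VM lifetime."""
    project: str
    vcpus: int
    started: datetime
    stopped: datetime


def vcpu_hours_by_project(records: list[VmUsage]) -> dict[str, float]:
    """Sum wall-clock vCPU-hours (vcpus * elapsed hours) per project."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        hours = (record.stopped - record.started).total_seconds() / 3600
        totals[record.project] += record.vcpus * hours
    return dict(totals)


if __name__ == "__main__":
    usage = [
        VmUsage("atlas", 4, datetime(2013, 4, 29, 8), datetime(2013, 4, 29, 10)),
        VmUsage("cms", 2, datetime(2013, 4, 29, 9), datetime(2013, 4, 29, 12)),
    ]
    print(vcpu_hours_by_project(usage))
```

Wall-clock metering charges for the time a VM exists, whether or not its vCPUs are busy, which is what distinguishes it from CPU-time accounting.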

Page 24: Academic cloud experiences cern v4

Cost Model

CERN computing is funded from CERN central budgets: no billing, but quotas

IT resource manager

Experiment resource managers

Project Management

Page 25: Academic cloud experiences cern v4

Quota Management

What to do when quota is exceeded? No credit card

If capacity is not used? Spot market on low SLA conditions

Fair share across the cloud? Worked for supercomputers but heavy for clouds at scale

Bursting to public clouds an option? IT provisioned or experiment decision
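With quotas instead of billing, a request that would exceed the project limit has to be refused rather than charged. The sketch below shows that hard-limit behaviour; `ProjectQuota` and its fields are hypothetical names, and the spot market, fair share and bursting options raised above are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ProjectQuota:
    """Hypothetical per-project vCPU quota with no billing fallback."""
    name: str
    vcpu_limit: int
    vcpus_used: int = 0

    def request(self, vcpus: int) -> bool:
        """Grant the request only if it fits within the quota."""
        if vcpus <= 0:
            raise ValueError("vcpus must be positive")
        if self.vcpus_used + vcpus > self.vcpu_limit:
            return False  # no credit card: exceeding quota is simply refused
        self.vcpus_used += vcpus
        return True


if __name__ == "__main__":
    quota = ProjectQuota(name="demo-project", vcpu_limit=10)
    print(quota.request(8))  # fits within the limit
    print(quota.request(4))  # would exceed the limit
```

A refused request leaves the usage counter unchanged, so a smaller follow-up request can still succeed.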

Page 26: Academic cloud experiences cern v4

Cloud of clouds: the next big step

What is required to get to a cloud of clouds?

Federated identity

Image conversion and sharing

API standardisation

SLAs

Security models

Many initiatives are investigating this at different levels: public/private bursting; private/private sharing (as the grid); homogeneous and heterogeneous

We will see intensive efforts in this area over the coming year

Page 27: Academic cloud experiences cern v4

Conclusions

Clouds provide a framework for re-engineering how IT delivers responsive services to physicists

OpenStack and its ecosystem provide a suitable solution, with the flexibility and opportunity to contribute as well as to benefit from the work of others

Migration by recycling bare-metal machines into hypervisors provides a smooth transition

Cloud of clouds has potential to replace grid computing models in the future

Page 28: Academic cloud experiences cern v4

Questions?

Page 29: Academic cloud experiences cern v4

BACKUP SLIDES

Page 30: Academic cloud experiences cern v4

Job Opportunities

Page 31: Academic cloud experiences cern v4

Science is getting more and more global

CERN: x staff, x fellows
