academic cloud experiences cern v4

Clouds at CERN

Tim Bell

tim.bell@cern.ch

Academic Cloud Experiences, 29th April 2013

CERN was founded 1954: 12 European States

“Science for Peace”

Today: 20 Member States

Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Italy, the Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom

Candidate for Accession: Romania

Associate Members in Pre-Stage to Membership: Israel, Serbia

Applicant States for Membership or Associate Membership: Brazil, Cyprus (awaiting ratification), Pakistan, Russia, Slovenia, Turkey, Ukraine

Observers to Council: India, Japan, Russia, Turkey, United States of America; European Commission and UNESCO

~2300 staff

~1000 other paid personnel

> 11000 users

Budget (2013): ~1000 MCHF

Is the Higgs boson the source of mass of our fundamental particles?

Why is the universe made of matter and not equal amounts of matter/antimatter?

Dark Matter and Dark Energy?

We do not know the composition of 95% of the universe

Temperature of the universe (WMAP satellite)

Blue tubes contain the two beam pipes and magnets at 1.8 K

ATLAS detector during construction in 2005

Search for Higgs decays to 4 “leptons” (electrons or muons), July 4, 2012

Plot: number of candidates (vertical axis) against mass of the candidates (horizontal axis)

We observe an excess of candidates with a mass of 125 proton masses

Also observed in the CMS experiment

The Worldwide LHC Computing Grid

Tier-0 (CERN): data recording, reconstruction and distribution

Tier-1: permanent storage, re-processing, analysis

Tier-2: simulation, end-user analysis

> 2 million jobs/day

~250’000 cores

173 PB of storage

nearly 160 sites, 35 countries

10 Gb links

WLCG: an international collaboration to distribute and analyse LHC data

Integrates computer centres worldwide that provide computing and storage resources into a single infrastructure accessible by all LHC physicists

IT Infrastructure Challenges

Staff numbers fixed

Materials budget decreasing

Increasing users of CERN’s facilities

Legacy tools are high maintenance and brittle

Additional data centre in Budapest now online, doubling potential capacity, with a 200 Gbit/s network

How do we scale from our current 11,000 servers within these constraints?

Approach

Remodel IT services on cloud layered models: IaaS, PaaS, SaaS

Move to commonly used open source tools: Puppet, OpenStack, Foreman, Koji, Oz, Kibana, …

Implement clouds at scale: IT aims for 15,000 hypervisors with 150,000 VMs by 2015

Exploit ecosystem solutions such as LBaaS, DBaaS, MQaaS rather than build our own

Clouds in High Energy Physics

Long-term preservation of software and data of HEP experiments

Utilize special computing resources attached to the detectors

Simplify the management of heterogeneous in-house resources

Use commercial clouds for exceptional computing demands

Distributed cloud computing using HEP and non-HEP clouds

Service Models

Pets are given names like pussinboots.cern.ch

They are unique, lovingly hand raised and cared for

When they get ill, you nurse them back to health

Cattle are given numbers like vm0042.cern.ch

They are almost identical to other cattle

When they get ill, you get another one

Future application architectures tend towards Cattle but Pet support is needed for some specific zones of the cloud

Refine Service Levels?

Hippos are cattle with bulk storage. Useful where Cassandra or MongoDB ensures redundancy

Canaries are cattle at high risk, used to give early warning of failures. Deploy early, fail fast and fix
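
The four service levels described above (pets, cattle, hippos, canaries) can be read as a failure-handling policy. The following is only a minimal sketch of that reading: the `ServiceLevel`, `Server` and `on_failure` names are hypothetical and are not taken from the talk or from any CERN tooling.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceLevel(Enum):
    PET = "pet"        # unique, named, nursed back to health
    CATTLE = "cattle"  # numbered, near-identical, replaced on failure
    HIPPO = "hippo"    # cattle with bulk storage; the application keeps the data redundant
    CANARY = "canary"  # cattle exposed to risk first, to give early warning


@dataclass
class Server:
    hostname: str
    level: ServiceLevel


def on_failure(server: Server) -> str:
    """Return the (hypothetical) action to take when a server becomes unhealthy."""
    if server.level is ServiceLevel.PET:
        return f"repair {server.hostname}"
    if server.level is ServiceLevel.CANARY:
        return f"replace {server.hostname} and raise an early warning"
    # Cattle and hippos are interchangeable, so they are simply replaced.
    return f"replace {server.hostname}"


if __name__ == "__main__":
    for s in (Server("pussinboots.cern.ch", ServiceLevel.PET),
              Server("vm0042.cern.ch", ServiceLevel.CATTLE)):
        print(on_failure(s))
```

The point of the sketch is that only pets carry per-machine state worth repairing; every other level is handled by replacement.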

Infrastructure Overview

Architecture diagram (labels only)

OpenStack components shown: Horizon, Keystone, Nova, Network, Compute, Scheduler, Glance, Cinder

CERN services shown: Microsoft Active Directory, CERN DB on Demand, CERN Network Database, account management system, CERN block storage provider

Dashboard using Horizon

Timelines

Deploy as stable release becomes available in EPEL

Keep up to date but not too close

Benefit from continuous integration testing of other companies

Timeline, 2012 to 2013:

Folsom: Sep 27, 2012

Ibex: Feb, 2013

Grizzly: Apr 4, 2013

Today

Grizzly Service: May, 2013

Havana: Oct, 2013

Havana Service: Nov/Dec, 2013

Status

CERN IT OpenStack Cloud: running Folsom on around 500 hypervisors, on KVM and Hyper-V

High availability using load balancing

75 users creating around 50 new VMs/day

Experiment farms: CMS currently running 1,300 hypervisors with 50,000 cores using Essex; ATLAS starting to ramp up to a similar size

Other HEP sites moving to private cloud: Brookhaven, IN2P3, FutureGrid, NeCTAR, IHEP, …

Next Steps (I)

Move to Grizzly: target end May 2013

Enable Kerberos and X.509 authentication: avoids users having to enter passwords

Recycle existing hardware and scale using cells: can recycle around 100 batch machines to hypervisors/week
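
As a back-of-envelope illustration of the recycling rate, the sketch below estimates how long it would take to reach a hypervisor target at a constant weekly rate. It assumes the figures quoted in the talk (around 500 hypervisors today, around 100 recycled per week, an aim of 15,000) and ignores any newly purchased hardware; the function name `weeks_to_target` is hypothetical, and the result is an illustration, not a CERN plan.

```python
def weeks_to_target(current: int, target: int, per_week: int) -> int:
    """Whole weeks needed to grow from `current` to `target` at a constant rate."""
    if per_week <= 0:
        raise ValueError("per_week must be positive")
    remaining = max(target - current, 0)
    # Integer ceiling division: a partly used week still counts as a week.
    return -(-remaining // per_week)


if __name__ == "__main__":
    # Figures from the talk; the constant-rate assumption is ours.
    print(weeks_to_target(current=500, target=15_000, per_week=100))
```

Under these assumptions recycling alone would take well over two years, which suggests that the additional capacity mentioned earlier in the talk also matters for reaching the target.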

Cells

We’re not alone …

Already 6 sites running more than 10,000 hypervisors according to the latest OpenStack user survey

Next Steps (II)

Block storage for Hippos and Pets: Cinder with Ceph, NetApp or GlusterFS

Heat for orchestration and auto-scaling

Load Balancing as a Service

Bare-metal to bring all servers under OpenStack

Move Ceilometer into production: accounting by project; move to wall-clock, vCPU metering
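
One plausible reading of "wall-clock, vCPU metering" with accounting by project is to charge each VM its vCPU count multiplied by the wall-clock time it existed, then sum per project. The sketch below illustrates that reading only; `VmRecord` and `vcpu_hours_by_project` are hypothetical names, and nothing here uses or represents the Ceilometer API.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, NamedTuple


class VmRecord(NamedTuple):
    project: str
    vcpus: int
    started: datetime
    stopped: datetime


def vcpu_hours_by_project(records: Iterable[VmRecord]) -> Dict[str, float]:
    """Sum vCPU count times wall-clock hours for each project."""
    usage: Dict[str, float] = defaultdict(float)
    for r in records:
        hours = (r.stopped - r.started).total_seconds() / 3600
        usage[r.project] += r.vcpus * hours
    return dict(usage)


if __name__ == "__main__":
    day = datetime(2013, 4, 29)
    records = [
        VmRecord("atlas", 4, day.replace(hour=8), day.replace(hour=10)),
        VmRecord("cms", 2, day.replace(hour=9), day.replace(hour=12)),
    ]
    print(vcpu_hours_by_project(records))
```

Wall-clock metering charges for the time a VM holds its resources, whether or not the CPUs are busy, which is what makes it suitable for quota accounting.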

Cost Model

CERN computing is funded from CERN central budgets: no billing, but quotas

IT resource manager

Experiment resource managers

Project Management

Quota Management

What to do when quota is exceeded? No credit card

If capacity is not used? Spot market on low SLA conditions

Fair share across the cloud? Worked for supercomputers but heavy for clouds at scale

Bursting to public clouds an option? IT provisioned or experiment decision
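
The "no credit card" point means a request that would exceed the quota cannot be paid for and has to be refused. A minimal sketch of such a hard quota check follows; the `ProjectQuota` class and its fields are hypothetical and do not reflect how OpenStack or CERN implement quotas.

```python
from dataclasses import dataclass


@dataclass
class ProjectQuota:
    name: str
    core_limit: int
    cores_used: int = 0

    def request(self, cores: int) -> bool:
        """Grant the request only if it fits within the quota; there is no overdraft."""
        if cores <= 0:
            raise ValueError("cores must be positive")
        if self.cores_used + cores > self.core_limit:
            return False  # nothing to bill, so the request is refused
        self.cores_used += cores
        return True


if __name__ == "__main__":
    quota = ProjectQuota("atlas", core_limit=100)
    print(quota.request(80))  # fits within the limit
    print(quota.request(30))  # would exceed the limit
```

The open questions on the slide (spot market, fair share, bursting) are all ways of softening this hard refusal when spare capacity exists elsewhere.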

Cloud of clouds: the next big step

What is required to get to a cloud of clouds?

Federated identity

Image conversion and sharing

API standardisation

SLAs

Security models

Many initiatives investigating this at different levels

Public/Private bursting

Private/Private sharing (as the grid)

Homogeneous and Heterogeneous

We will see intensive efforts in this area over the coming year

Conclusions

Clouds provide a framework for re-engineering how IT is delivering responsive services to the physicists

OpenStack and the ecosystem provide a suitable solution with flexibility and opportunity to contribute as well as benefit from work of others

Migration via recycling bare-metal machines into hypervisors provides a smooth transition

Cloud of clouds has potential to replace grid computing models in the future

Questions?

BACKUP SLIDES

Job Opportunities

Science is getting more and more global

CERN: x staff, x fellows
