Tier 1 (Grid) Services. Ian Collier, GridPP Review, June 20th 2012


Page 1: Tier 1 (Grid) Services

Tier 1 (Grid) Services

Ian Collier

GridPP Review

June 20th 2012

Page 2: Tier 1 (Grid) Services

Past Year

• EMI Updates
  – Migration off gLite to EMI(2)
  – Formally engaged with Staged Rollout & Early Adopters process
• Virtualisation
  – (Nearly) all services on (Hyper-V) virtualised platform
  – Much easier to set up & manage than collection of bare metal
  – Quick recovery after power events notable
• CVMFS Stratum 0 for non-LHC VOs
  – Actively used now
  – Responses have been very positive
  – WeNMR are the latest, enthusiastic, users

Page 3: Tier 1 (Grid) Services

Operational Issues

• Batch start rates
  – Limited
  – Have been testing alternatives to torque/maui
  – Condor & SLURM frontrunners
    • Condor looking very good
    • Have been hitting scaling limits with SLURM
  – As side effect also looking at ARC CE
  – Next step: test with half of the retiring 2007 WNs on SL6 with new Condor & ARC CE
• (cvmfs) Job timeout failures
  – Low but persistent rate (~5% varying)
  – Have been testing 2.1.x client
    • Found much worse problems
  – Investigation continuing
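As a rough illustration of how a timeout failure rate like the ~5% quoted above could be tracked, the sketch below counts the fraction of jobs in a "timeout" state. The state names and the sample list are hypothetical and synthetic; they are not taken from the Tier 1 batch system or from any real job records.

```python
from collections import Counter


def timeout_failure_rate(job_states):
    """Return the fraction of jobs whose state is 'timeout'.

    `job_states` is any iterable of state strings. The state names
    are hypothetical, not those of a real batch system.
    """
    counts = Counter(job_states)
    total = sum(counts.values())
    if total == 0:
        # No jobs seen: report a rate of zero rather than dividing by zero.
        return 0.0
    return counts["timeout"] / total


# Synthetic example only: 95 successful jobs and 5 timeouts.
sample = ["done"] * 95 + ["timeout"] * 5
print(f"timeout failure rate: {timeout_failure_rate(sample):.1%}")
```

Computing the rate per day or per client version, rather than over all jobs at once, would make a "varying" rate and the effect of a client upgrade easier to see.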

Page 4: Tier 1 (Grid) Services

Coming Year

• Continue Updates
  – Starting on EMI-3
  – Further Staged Rollout & Early adoption
  – Complete SL6 migrations
• Virtualisation
  – Shared storage just coming on-line
    • Investigations to make full use of that
    • Replication between buildings, etc.
• Distribute services
  – Between R89 & Atlas ‘outpost’ as it develops
  – i.e. BDIIs, FTSs, CEs, etc., spread between 2 buildings
• CVMFS Stratum 0
  – Erasmus project to build web interface for SW upload
  – Negotiating for sites to replicate
    • Reference architecture may be different from WLCG
    • EGI have picked coordinating network of repositories & replicas
    • Nikhef & OSG, maybe CERN

Page 5: Tier 1 (Grid) Services

Configuration Management

• Quattor working well
  – Although we benefit from QWG, we could do so more
  – Made some ‘expedient’ choices early on, ready to revisit now
• Quattor community more active recently
  – No longer held back by backward compatibility for CERN
• Migration to Aquilon
  – Opportunity to refactor
  – Will allow more automation
  – Will improve workflows
• Of course track other activities & developments

Page 6: Tier 1 (Grid) Services

Cloud

• SCD Cloud
  – Concept well proven
  – ~300 cores, 90-95% use
  – Adding half of 2007 WNs
  – Member of staff (not rotating graduate) in plan
• Storage
  – Have small Ceph cluster to deploy
    • Image store
    • Object (S3) store - service
• Active use cases:
  – Internal (Tier1 & SCT) development & testbeds
    • High level of user trust
• Developing Use cases
  – Other users in STFC (ISIS, RAL Space)
  – EGI, GridPP & WLCG Cloud work
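To put the SCD Cloud utilisation figures in concrete terms, the short calculation below converts "~300 cores, 90-95% use" into approximate busy and idle core counts. Both inputs are the rounded figures from the slide, so the results are indicative only; the function name is illustrative.

```python
# Approximate figures quoted on the slide: "~300 cores, 90-95% use".
TOTAL_CORES = 300
UTILISATION_RANGE = (0.90, 0.95)


def busy_and_idle(total_cores, utilisation):
    """Return (busy, idle) core counts for a given utilisation fraction."""
    busy = round(total_cores * utilisation)
    return busy, total_cores - busy


for utilisation in UTILISATION_RANGE:
    busy, idle = busy_and_idle(TOTAL_CORES, utilisation)
    print(f"{utilisation:.0%} use: ~{busy} cores busy, ~{idle} idle")
```

At these levels only around 15 to 30 cores are free at any time, which is consistent with the plan to add half of the 2007 WNs.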

Page 7: Tier 1 (Grid) Services

Looking to Future

Starting to think about:

• Post GridPP 4
• Cloud is great for ‘disposable’ resources
  – What would it take for us to consider it to be solid enough for services now on Hyper-V?
  – What about layer (& interface) in batch farm?