Tier 1 (Grid) Services
Ian Collier
GridPP Review
June 20th 2012

Past Year
• EMI Updates
  – Migration off gLite to EMI(2)
  – Formally engaged with the Staged Rollout & Early Adopters process
• Virtualisation
  – (Nearly) all services on (Hyper-V) virtualised platform
  – Much easier to set up & manage than a collection of bare metal
  – Notably quick recovery after power events
• CVMFS Stratum 0 for non-LHC VOs
  – Actively used now
  – Responses have been very positive
  – WeNMR the latest, enthusiastic users

Operational Issues
• Batch start rates
  – Limited
  – Have been testing alternatives to Torque/Maui
  – Condor & SLURM frontrunners
    • Condor looking very good
    • Have been hitting scaling limits with SLURM
  – As a side effect, also looking at ARC CE
  – Next step: test with half of the retiring 2007 WNs on SL6 with new Condor & ARC CE
• (CVMFS) Job timeout failures
  – Low but persistent rate (~5%, varying)
  – Have been testing the 2.1.x client
    • Found much worse problems
  – Investigation continuing
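Comparing Torque/Maui with Condor and SLURM comes down to measuring how quickly each scheduler actually starts jobs. The sketch below is one hypothetical way to quantify that: given job start timestamps (assumed to be already extracted from the scheduler's accounting logs; parsing is not shown), it reports the largest number of starts falling in any sliding window. The function name and the one-minute default window are illustrative choices, not part of the testing described on the slide.

```python
from datetime import datetime, timedelta


def peak_start_rate(start_times, window=timedelta(minutes=1)):
    """Return the largest number of job starts inside any sliding window.

    start_times: iterable of datetime objects, one per job start
    (hypothetical input, e.g. parsed from batch accounting logs).
    The window is half-open: a start exactly `window` earlier is excluded.
    """
    times = sorted(start_times)
    best = 0
    left = 0
    for right, t in enumerate(times):
        # Advance the left edge until the window spans less than `window`.
        while t - times[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best


# Usage with made-up start offsets (seconds after an arbitrary base time).
base = datetime(2012, 6, 1, 12, 0, 0)
starts = [base + timedelta(seconds=s) for s in (0, 10, 20, 30, 70, 75, 200)]
print(peak_start_rate(starts))  # 4
```

Running the same measurement against logs from each candidate scheduler, under a comparable submission load, would give a like-for-like figure for the start-rate limitation.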

Coming Year
• Continue Updates
  – Starting on EMI-3
  – Further Staged Rollout & Early Adoption
  – Complete SL6 migrations
• Virtualisation
  – Shared storage just coming on-line
    • Investigations to make full use of that
    • Replication between buildings, etc.
• Distribute services
  – Between R89 & Atlas ‘outpost’ as it develops
  – i.e. BDIIs, FTS, CEs, etc., spread between 2 buildings
• CVMFS Stratum 0
  – Erasmus project to build web interface for SW upload
  – Negotiating for sites to replicate
    • Reference architecture may be different from WLCG
    • EGI have picked a coordinating network of repositories & replicas
    • Nikhef & OSG, maybe CERN
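Sites agreeing to replicate the Stratum 0 would run Stratum 1 replicas, which CVMFS clients can fail over between. As a minimal sketch, assuming the client convention of a semicolon-separated CVMFS_SERVER_URL in which @fqrn@ stands for the fully qualified repository name and @org@ for its first component, the function below expands such a setting into concrete replica URLs. The function name, the hostnames and the repository name myvo.example.org are all hypothetical.

```python
def expand_server_urls(server_url_setting, fqrn):
    """Expand a CVMFS_SERVER_URL-style string into concrete replica URLs.

    Assumes a semicolon-separated list of URL templates containing the
    @fqrn@ (full repository name) and @org@ (first name component)
    placeholders. Hostnames used with it here are hypothetical.
    """
    org = fqrn.split(".", 1)[0]
    urls = []
    for template in server_url_setting.split(";"):
        template = template.strip()
        if not template:
            continue  # tolerate stray or trailing separators
        urls.append(template.replace("@fqrn@", fqrn).replace("@org@", org))
    return urls


# Two hypothetical Stratum 1 replicas at different sites.
setting = ("http://stratum1.site-a.example.org/cvmfs/@fqrn@;"
           "http://stratum1.site-b.example.org/cvmfs/@fqrn@")
for url in expand_server_urls(setting, "myvo.example.org"):
    print(url)
```

Each additional replicating site simply adds one more entry to the list, which is why recruiting sites to replicate directly improves resilience for the non-LHC VOs.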

Configuration Management
• Quattor working well
  – Although we benefit from QWG, we could do so more
  – Made some ‘expedient’ choices early on; ready to revisit them now
• Quattor community more active recently
  – No longer held back by backward compatibility for CERN
• Migration to Aquilon
  – Opportunity to refactor
  – Will allow more automation
  – Will improve workflows
• Of course, track other activities & developments

Cloud
• SCD Cloud
  – Concept well proven
  – ~300 cores, 90–95% use
  – Adding half of the 2007 WNs
  – Member of staff (not a rotating graduate) in plan
• Storage
  – Have a small Ceph cluster to deploy
    • Image store
    • Object (S3) store service
• Active use cases:
  – Internal (Tier 1 & SCT) development & testbeds
    • High level of user trust
• Developing use cases
  – Other users in STFC (ISIS, RAL Space)
  – EGI, GridPP & WLCG cloud work

Looking to the Future
Starting to think about:
• Post GridPP 4
• Cloud is great for ‘disposable’ resources
  – What would it take for us to consider it solid enough for the services now on Hyper-V?
  – What about a cloud layer (& interface) in the batch farm?