
HEPiX Autumn Meeting 2014

University of Nebraska, Lincoln

http://indico.cern.ch/event/320819/

Arne Wiebalck

Liviu Valsan

Borja Aparicio

Wiebalck, Valsan, Aparicio: HEPiX Autumn 2014 Summary


HEPiX

• Global organization of service managers and support staff providing computing facilities for the HEP community

• Participating sites include BNL, CERN, DESY, FNAL, IN2P3, INFN, NIKHEF, RAL, TRIUMF …

• Meetings are held twice per year

- Spring: Europe, Autumn: U.S./Asia

• Reports on status and recent work, work in progress & future plans

- Usually no showing-off, honest exchange of experiences


Outline

Arne:

• 2014 Autumn Meeting & HEPiX News

• Site Reports

• End User Services & OS

• Grids, Clouds, and Virtualization

Liviu:

• Storage and File systems

• Computing and Batch

• IT Facilities

Borja:

• Networking and Security

• Basic IT Services

HEPiX Autumn 2014

• Oct 13 – 17, 2014 at the University of Nebraska Lincoln

- Well organized, rich program

- Eduroam, Indico (intervention, incident, power cut)

• 93 registered participants

- Many first timers again

- 6/8 US-CMS Tier-2 sites, 2/5 US-ATLAS Tier-2 sites

- 45 sites represented

• 60 contributions

- 96 slides (in 25 minutes!)

- 300 words per slide …


Lincoln, Nebraska

About 22 hours door to door …


HEPiX News

• Tony Wong (BNL) new HEPiX co-chair

- 3-year term

• Next meetings

- Spring 2015: Oxford (UK) March 23 – 27

- Autumn 2015: BNL (US) Oct 12 – 16

- Spring 2016: DESY Zeuthen (DE), Berlin/Potsdam (TBC)


HEPiX Working Groups

• IPv6

- Deployment/readiness following Tier structure

https://www.gridpp.ac.uk/wiki/2014_IPv6_WLCG_Site_Survey

- Experiments pushing for services at T1/T2

• Benchmarking

- Awaiting SPEC CPUv6

- Suggestion of a “fast” benchmark (minutes)


Site Reports

• 15 site reports: T0, 7x T1s, 7x T2s

• (Move to) HTCondor still very visible

- Talk from HTCondor team

- INFN (on LSF now) will start evaluation

• KIT’s “Dropbox”: bwSync&Share

- 8’000 users

- Based on PowerFolder

• Ganeti used at multiple sites

- VM cluster management tool from Google

- Overall positive experience


Site Reports

• Ceph

- Still gaining momentum: many PoCs (RAL: 1 PB, BNL: 3 PB)

- Vivid mail exchange, BoF Session in Oxford?

• Energy efficiency

- No WG, but many activities (refurbishments)

- “Energy accounting” discussions

• INFN still investigating micro-server options

- Moonshot and other Avoton based solutions

- Experiments seem fine with performance/power ratio

• During “dark data” cleanup NDGF deleted all ALICE tape data due to misunderstanding of what “NDGF data” means

- ALICE::NDGF vs. ALICE::NDGF_tape

- 200 TB of data now being backfilled …


CERN Site Report

• “What about Ceph @ CERN?”

• “Are there ever power cuts at CERN?”


End User Services & OS

• Six talks in total, three from CERN

- Thomas: CC7

- Borja: Issue tracking and VCS

- Michail: FTS3

• Scientific Linux / CentOS

- FNAL SL team continues to provide Scientific Linux

- No competition with other rebuilds

- Rebuild from git.centos.org: difficult (as not supported)

So, after the initial discussions at the Annecy meeting, the community seems to part ways …


Virtualization

• Six talks in total, five from CERN

- Laurence: Experiment’s Cloud Computing Adoption

- Andrea: WLCG Monitoring

- Helge: Volunteer Computing

- Arne: Cloud Report, VM IO Performance

• RAL starting batch virtualization

- “Burst batch into the cloud”

- Successful PoC: Vacuum model integration with HTCondor

• Virtualization @ GSI: MS Windows on KVM

- Windows domain restructuring: all on VMs, all on KVM

- Partly in prod (CA, TS), partly in testing (DC, Exchange)

- No support issue


Storage and Filesystems

• Ten talks in total, five from CERN

- Luca: EOS across 1000 km; CERNbox + EOS: Cloud Storage for Science

- Andrea: DPM performance tuning hints for HTTP/WebDAV and Xrootd

- Ruben: Experience in running relational databases on clustered storage

- Liviu: SSD Benchmarking at CERN

https://lvalsan.web.cern.ch/lvalsan/ssd_benchmarking


OpenZFS on Linux

• OpenZFS

- Large set of features

- Independent of the Linux kernel

• LLNL

- Three Lustre filesystems, ~100 PB, OpenZFS backend

- Moving to commodity JBODs

- Work ongoing for improving Linux boot time with large number of drives


Ceph Based Storage Systems for RACF

• Deployment of same scale as at CERN

• Lots of performance and stability tests

- Object storage, block storage and file system (Ceph FS)

- On several platforms (including HP Moonshot)

- Different networking solutions


Using XRootD to Minimize Hadoop Replication

• Hadoop replication via XRootD

• Reduced local Hadoop replication to 1

• In case of corrupt local blocks:

- Request blocks via XRootD

- Cache locally

- Repair broken blocks locally in Hadoop
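The fallback path described on this slide (detect a corrupt local block, fetch it remotely, cache it, repair the local copy) can be sketched roughly as follows. This is only an illustration under stated assumptions: the local store is a plain dictionary, `fetch_remote` is a hypothetical stand-in for a read over XRootD, and SHA-256 checksums stand in for whatever integrity check the real setup uses. It does not use the Hadoop or XRootD APIs.

```python
import hashlib
from typing import Callable, Dict


def checksum(data: bytes) -> str:
    """Return a hex digest used to detect corrupt blocks (assumed: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def read_block(
    block_id: str,
    local_store: Dict[str, bytes],
    expected: Dict[str, str],
    fetch_remote: Callable[[str], bytes],
) -> bytes:
    """Read a block from the single local replica, falling back to a remote copy.

    local_store  -- hypothetical stand-in for the local replica (replication = 1)
    expected     -- known-good checksum per block id
    fetch_remote -- hypothetical stand-in for fetching the block from another site
    """
    data = local_store.get(block_id)
    if data is not None and checksum(data) == expected[block_id]:
        return data  # local replica is healthy, no remote traffic needed

    # Local block is missing or corrupt: request it remotely.
    remote = fetch_remote(block_id)
    if checksum(remote) != expected[block_id]:
        raise IOError(f"remote copy of {block_id} is corrupt as well")

    # Cache the good copy locally, which also repairs the broken block.
    local_store[block_id] = remote
    return remote
```

After a successful fallback the repaired block is served locally again, so the remote source is only contacted once per corrupt block.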


Computing and Batch Systems

• Six talks in total, one from CERN

- Two presentations on benchmarking

- Four presentations on batch systems


Benchmarking activities

• Intel Xeon E5-2600 v3 (Haswell)

- Showing good performance

• Intel Avoton: very good HS06 / Watt ratio

• ARM 32-bit: HS06 / Watt in between Xeon & Avoton


Fast Benchmark

• Some requirements are clear:

- Open source

- Easy to run

- Small

• Other requirements not so clear:

- How fast?

- Reproducible?

- Reliable?

- Single core or multicore?


Fast Benchmark Proposals

• Geant4 based

- Linux x86-64 & ARM

- Realistic detector geometry

- Footprint: 1/4 to 1/3 of real experiment

- CPU bound, no I/O

• LHCb fast benchmark

- Small Python script, single threaded


Next generation HEP-SPEC06

Next SPEC CPU benchmark (CPUv6) in beta

Should be released before the end of the year

Will probably not run with the default SLC 6 compiler

GCC on CentOS 7 should be fine; config file will be provided by GridKa


Batch Systems

• All four talks about HTCondor:

- Two talks from developers

- Jérôme’s talk: HTCondor pilot @ CERN

- Open Science Grid adopting HTCondor


IT Facilities and Business Continuity

• Three talks, two from CERN

- First Experience with the Wigner Data Centre

- Joint procurement of IT equipment and services

• UPS Monitoring with Sensaphone

- Multi-level email / SMS alerting

- Gradual shutdown of servers in case of power cut or cooling failure

- Wireless temperature sensors used to build 3D heatmap
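As an illustration of what a gradual shutdown policy might look like, the sketch below stops the least critical servers first as the remaining UPS runtime shrinks. Everything here is an assumption made for the example: the tier numbering, the thresholds in `SHUTDOWN_STEPS`, and the server names are hypothetical, only the power-cut case is modelled, and no Sensaphone interface is used.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Server:
    name: str
    tier: int  # 1 = least critical, shut down first (assumed convention)


# Hypothetical policy: (UPS minutes left at or below which the step fires,
# highest tier that is shut down at that step).
SHUTDOWN_STEPS = [(30, 1), (15, 2), (5, 3)]


def highest_tier_to_stop(ups_minutes_left: float) -> int:
    """Return the highest tier that should be down for the given UPS runtime."""
    highest = 0
    for minutes, tier in SHUTDOWN_STEPS:
        if ups_minutes_left <= minutes:
            highest = max(highest, tier)
    return highest


def servers_to_stop(servers: List[Server], ups_minutes_left: float) -> List[Server]:
    """List the servers to stop, least critical first."""
    limit = highest_tier_to_stop(ups_minutes_left)
    return sorted((s for s in servers if s.tier <= limit), key=lambda s: s.tier)
```

A cooling failure could be handled the same way by replacing the UPS runtime with a temperature reading and inverting the comparison.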


NERSC

• New Computational Research and Theory (CRT) Building

- Year-round free air and water cooling

- PUE < 1.1

- 42 MW to building

- 12.5 MW provisioned
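For readers unfamiliar with the metric: PUE (power usage effectiveness) is the total facility power divided by the power delivered to IT equipment, so a PUE below 1.1 means less than 10% overhead for cooling and power distribution. A minimal sketch of the arithmetic follows; the figures in the usage note are hypothetical and are not taken from the NERSC talk.

```python
def pue(total_facility_mw: float, it_load_mw: float) -> float:
    """Power usage effectiveness: total facility power over IT equipment power."""
    if it_load_mw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_mw / it_load_mw


def overhead_mw(it_load_mw: float, pue_value: float) -> float:
    """Power spent on cooling, distribution losses, etc. for a given PUE."""
    return it_load_mw * (pue_value - 1.0)
```

With a hypothetical IT load of 10 MW, `overhead_mw(10.0, 1.1)` comes to roughly 1 MW of non-IT power.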


Networking and Security


• Four networking talks, two security, one from CERN

- Stefan: Situational Awareness: Computer Security

• IPv6 Deployment

- HEPiX IPv6 Working Group: WLCG dual-stack services deployment. Testing

- Open Science Grid: Client/Server are dual-stack? Server is but not the client?

• Infiniband Based Networking evaluation

- Brookhaven National Laboratory (USA)

https://indico.cern.ch/event/320819/session/4/contribution/46/material/slides/0.pdf

• ESnet: Extension to Europe

- US Department of Energy

- “Scientific progress will be completely unconstrained by the physical location of instruments, people, computational resources or data”
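The dual-stack question raised in the IPv6 deployment item can be checked, at least at the DNS level, with a few lines of Python. The sketch below is an assumption about how one might do a first check, not the working group's test procedure: it only reports which address families a host name resolves to, which says nothing about whether the service actually answers over both. The default port and the category labels are choices made for this example.

```python
import socket
from typing import Iterable


def classify_families(families: Iterable[int]) -> str:
    """Classify a host by the address families its name resolves to."""
    seen = set(families)
    has_v4 = socket.AF_INET in seen
    has_v6 = socket.AF_INET6 in seen
    if has_v4 and has_v6:
        return "dual-stack"
    if has_v6:
        return "IPv6-only"
    if has_v4:
        return "IPv4-only"
    return "unresolved"


def host_stack(host: str, port: int = 443) -> str:
    """Resolve a host name and report whether it publishes IPv4, IPv6 or both."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return "unresolved"
    # Each result is (family, type, proto, canonname, sockaddr).
    return classify_families(info[0] for info in infos)
```

Running `host_stack` from both a server and a client node gives a quick view of the "server is but not the client" situation, since the result also depends on the resolver configuration of the machine it runs on.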


Basic IT Services 1/2

• Seven talks, three from CERN

- Ben: Configuration Services at CERN: Update

- Rubén: Database on Demand: insight how to build your DBaaS

- Aris: Ermis service for DNS Load Balancer configuration

• Monitoring with Nagios

- NERSC – US Department of Energy

- Monitoring clusters of 1000s of compute nodes


Basic IT Services 2/2

• CFEngine

- ATLAS Great Lakes Tier 2 (AGLT2)

- Change management: SVN → Push to production

• Puppet at USCMS-T1 – Fermilab

- Modules + data in Hiera approach. Puppet Dashboard instead of The Foreman

- Change management: Git branches → Push to production

- Continuous Integration? Not yet, but Beaker is the main candidate

- Secrets? “hiera-eyaml”: not a good solution

• Puppet at BNL

- RHIC and ATLAS Computing Facility

- Emphasis on Change Management and Cultural Management

- Test environments + self-approve delay

- Looking for automatic testing
