Archiving data from Durham to RAL using the File Transfer Service (FTS)

Lydia Heck, Campus network engineering workshop, 19/10/2016


Page 1: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Lydia Heck, Campus network engineering workshop, 19/10/2016

Archiving data from Durham to RAL using the File Transfer Service (FTS)

Page 2: Archiving data from Durham to RAL using the File Transfer Service (FTS)

19 October 2016 Campus Network Engineering for Data Intensive Science Workshop


Archiving data from Durham to RAL using the File Transfer Service (FTS)

Lydia Heck

Institute for Computational Cosmology

Manager of the DiRAC-2/2.5 Data Centric Facility COSMA

Page 3: Archiving data from Durham to RAL using the File Transfer Service (FTS)


Introduction to DiRAC

• DiRAC (Distributed Research utilising Advanced Computing) was established in 2009 with DiRAC-1

• Support of research in theoretical astronomy, particle physics and nuclear physics

• Funded by STFC with infrastructure money allocated from the Department for Business, Innovation and Skills (BIS)

• The running costs, such as staff costs and electricity are funded by STFC

• DiRAC is classed as a major research facility by STFC on a par with the big telescopes

Page 4: Archiving data from Durham to RAL using the File Transfer Service (FTS)

What is DiRAC?

• A national service run, managed and allocated by the scientists who do the science funded by BIS and STFC

• The systems are built around and for the applications with which the science is done.

• We do not rival a facility like ARCHER, as we do not aspire to run a general national service.

Page 5: Archiving data from Durham to RAL using the File Transfer Service (FTS)

What is DiRAC – cont’d?

• For the highlights of science carried out on the DiRAC facility please see: http://www.dirac.ac.uk/science.html

• Specific example: large-scale structure calculations with the Eagle run
  • 4096 cores
  • ~8 GB RAM/core
  • 47 days = 4,620,288 CPU hours
  • 200 TB of data
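The CPU-hour figure for the Eagle run follows directly from the core count and the run length. The short sketch below reproduces that arithmetic; the aggregate memory figure is derived here from the approximate "~8 GB RAM/core" value and is not stated on the slide, so treat it as an estimate.

```python
# Reproduce the Eagle run figures quoted on the slide.
CORES = 4096
DAYS = 47
RAM_PER_CORE_GB = 8  # the slide gives "~8 GB", so this is approximate


def cpu_hours(cores, days):
    """CPU hours consumed when all cores run for the whole period."""
    return cores * days * 24


def total_ram_gb(cores, ram_per_core_gb):
    """Aggregate memory across all cores (derived, not quoted on the slide)."""
    return cores * ram_per_core_gb


if __name__ == "__main__":
    print(f"CPU hours: {cpu_hours(CORES, DAYS):,}")
    print(f"Aggregate RAM: ~{total_ram_gb(CORES, RAM_PER_CORE_GB):,} GB")
```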


Page 6: Archiving data from Durham to RAL using the File Transfer Service (FTS)

The DiRAC computing systems

• Blue Gene: Edinburgh

• Cosmos: Cambridge

• Complexity: Leicester

• Data Centric: Durham

• Data Analytic: Cambridge

Page 7: Archiving data from Durham to RAL using the File Transfer Service (FTS)

COSMA @ DiRAC (Data Centric)

Durham Data Centric system: IBM iDataPlex

• 6720 Intel Sandy Bridge cores

• 53.8 TB of RAM

• FDR10 InfiniBand, 2:1 blocking

• 2.5 Pbyte of GPFS storage (2.2 Pbyte used!)

Page 8: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Resources of DiRAC

• Long projects with a significant amount of CPU hours, typically allocated for 3 years on one or more of the 5 available systems. Resources available:

System                                         CPU hours   Storage                                          Location
Bluegene, 98,304 cores                         861 M       1 PB (GPFS)                                      Edinburgh
Data Centric, 6720 Xeon cores                  59 M        2.5 PB (GPFS)                                    Durham (DiRAC2)
Data Centric, 8000 Xeon cores                  > 71 M      2.5 PB data (Lustre), 1.8 PB scratch (Lustre)    Durham (DiRAC2.5)
Complexity, 4352 Xeon cores                    38 M        0.8 PB (Panasas)                                 Leicester
Data Analytic, 4800 Xeon cores                 42 M        0.75 PB (Lustre)                                 Cambridge
SMP, 1784 Xeon cores, shared memory            15.6 M      146 TB (EXT)                                     Cambridge

Page 9: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Why do we need to copy data?

• During a project and when it is completed, copy data to home institutions
  • requires additional storage resource at researchers' home institutions
  • not enough provision: will require additional funds

• Make backup copies
  • if disaster struck, many CPU hours of calculations would be lost

• Copy data to other sites to leverage compute resources for post-processing

• Storage on the HPC facility runs out of capacity: data creation considerably above expectation?

Page 10: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Why do we copy data to RAL?

• Research data must now be available to interested parties for a specified period of time

• We could install DiRAC's own archive
  • requires funds and there is (currently) no budget

• We needed to get started:
  • to gain experience
  • to get a valid backup
  • to remove data as the resources run out
  • to identify bottlenecks and technical challenges

• Jeremy Yates (Director of DiRAC) negotiated access to the RAL archiving systems

• Set up collaborations, make use of previous experience and pool resources

• AND: copy data!

Page 11: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Network connectivity of Durham University

• 2012 – upgrade to 4x1 Gbit to Janet

• Janet advised to investigate optimal utilisation of available bandwidth before applying for further upgrade

• 2014 – upgrade to 6 Gbit to Janet

• currently: 8 Gbit to Janet; should be a full 10 Gbit by the end of the year (delayed by technical issues)

Page 12: Archiving data from Durham to RAL using the File Transfer Service (FTS)

network bandwidth – situation for Durham

• 2014: Measured throughput? (plot not reproduced in the transcript)

Page 13: Archiving data from Durham to RAL using the File Transfer Service (FTS)

2014: Measured limits? (plot not reproduced in the transcript)

Page 14: Archiving data from Durham to RAL using the File Transfer Service (FTS)

September 2014 – Measured limits (plot not reproduced in the transcript)

Page 15: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Making optimal use of available bandwidth

• Planning and investment to bypass the external campus firewall:

• Preparatory work started in October/November 2014: two new routers (~£80k), configured for throughput with a minimal ACL, enough to safeguard the site.

• Deploying internal firewalls: part of the new security infrastructure anyhow, but essential for such a venture

• Security now relies on the front-end systems of Durham DiRAC and Durham GridPP

• IPPP was moved outside the firewall in April 2015 with a clear mandate to manage security for their installation.

• The DiRAC Data Transfer system was moved outside about 1 month later.


Page 16: Archiving data from Durham to RAL using the File Transfer Service (FTS)

GridPP Site FW config for endpoint node

(Diagram not reproduced in the transcript. It shows GridFTP traffic paths for the endpoint node, labelled: port blocking, pass-through, monitoring with firewall, and bypass of the site firewall.)

Page 17: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Result for DiRAC and GridPP in Durham

• Guaranteed 3 Gbit/sec in/out

• Consequences:

• pushed the network performance for Durham GridPP from bottom 3 in the country to top 5 of the UK GridPP sites

• Now they experience different bottlenecks, but these are under their control

• DiRAC data transfers achieve up to 300 – 400 Mbyte/sec throughput to RAL on archiving depending on file sizes.

• faster data sharing with other collaboration sites

• recently (October 2016) offered service to Earth Sciences with 70-80 MByte/sec from site in Switzerland
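To put the quoted rates in context, the sketch below converts the guaranteed 3 Gbit/sec into Mbyte/sec and estimates how long a sustained transfer would take. It assumes decimal units (1 Pbyte = 10^15 bytes, 1 Mbyte = 10^6 bytes) and, as an illustrative data volume, the 2.2 Pbyte quoted earlier as used on the COSMA GPFS storage; real transfers would also be affected by file sizes and archive-side limits.

```python
# Rough transfer-time estimate under the stated assumptions (decimal units,
# sustained rate, no protocol or archive-side overheads).
SECONDS_PER_DAY = 86400
USED_STORAGE_BYTES = 2.2e15  # 2.2 Pbyte, the "used" figure quoted for COSMA GPFS


def gbit_to_mbyte_per_sec(gbit_per_sec):
    """Convert a line rate in Gbit/sec to Mbyte/sec (8 bits per byte)."""
    return gbit_per_sec * 1000 / 8


def transfer_days(data_bytes, mbyte_per_sec):
    """Days needed to move data_bytes at a sustained rate in Mbyte/sec."""
    seconds = data_bytes / (mbyte_per_sec * 1e6)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"3 Gbit/sec = {gbit_to_mbyte_per_sec(3):.0f} Mbyte/sec")
    for rate in (300, 400):
        days = transfer_days(USED_STORAGE_BYTES, rate)
        print(f"2.2 Pbyte at {rate} Mbyte/sec: about {days:.1f} days")
```

The 300 to 400 Mbyte/sec range observed for archiving sits around the 375 Mbyte/sec that a 3 Gbit/sec guarantee corresponds to.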


Page 18: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Collaboration between DiRAC and GridPP/RAL

• Durham Institute for Computational Cosmology (ICC) volunteered to be the prototype installation

• Huge thanks to Jens Jensen and Brian Davies: there were many emails exchanged, many questions asked and many answers given.

• Resulting document “Setting up a system for data archiving using FTS3” by Lydia Heck, Jens Jensen and Brian Davies: https://www.cosma.dur.ac.uk/documentation

Page 19: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Setting up the archiving tools

• Identify appropriate hardware (could mean extra expense):
  • need freedom to modify and experiment with the system
  • cannot have HPC users logged in and working when you need to reboot the system!
  • free to apply the very latest security updates
  • this might not always be possible on an HPC system

• Requires optimal connection to storage
  • for the transfer system this meant an InfiniBand card

Page 20: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Setting up the archiving tools

• Create an interface to access the file/archiving service at RAL using the GridPP tools

• gridftp – Globus Toolkit – also provides Globus Connect

• Trust anchors (egi-trustanchors)

• voms tools (emi3-xxx)

• fts3 (cern)


Page 21: Archiving data from Durham to RAL using the File Transfer Service (FTS)


Chose to use FTS3 with GridFTP

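(Diagram not reproduced in the transcript; its labels are listed below.)

• User submits transfer lists (and credentials) to FTS3

• Source: GPFS, served by data.cosma.dur.ac.uk (GridFTP)

• Destination: CASTOR-GEN, served by srm-dirac.gridpp.rl.ac.uk (SRM)

• Transfer protocol between the two endpoints: GridFTP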
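A transfer list pairs each source file with its destination. The sketch below builds such pairs for the two endpoints named on this slide. The directory paths are hypothetical placeholders, and the `gsiftp://` and `srm://` URL schemes are assumed as the usual schemes for GridFTP and SRM endpoints; the exact form the RAL service expects may differ.

```python
# Build (source, destination) URL pairs for a transfer list.
SOURCE_HOST = "data.cosma.dur.ac.uk"        # GridFTP endpoint named on the slide
DEST_HOST = "srm-dirac.gridpp.rl.ac.uk"     # SRM endpoint named on the slide
SOURCE_ROOT = "/example/archive/staging"    # hypothetical path
DEST_ROOT = "/example/dirac/durham"         # hypothetical path


def transfer_pairs(filenames):
    """Return a list of (source URL, destination URL) tuples, one per file."""
    pairs = []
    for name in filenames:
        source = f"gsiftp://{SOURCE_HOST}{SOURCE_ROOT}/{name}"
        destination = f"srm://{DEST_HOST}{DEST_ROOT}/{name}"
        pairs.append((source, destination))
    return pairs


if __name__ == "__main__":
    # One "source destination" pair per line.
    for source, destination in transfer_pairs(["chunk_0000.tar", "chunk_0001.tar"]):
        print(source, destination)
```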

Page 22: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Learning to use certificates and proxies

• Long-lived voms proxy?

• myproxy-init; myproxy-logon; voms-proxy-init; fts-transfer-delegation

• How to create a proxy and delegation that lasts weeks, even months?

• This is still an issue for a voms proxy, but we circumvented it by using a normal proxy.

• grid-proxy-init; fts-transfer-delegation
  • grid-proxy-init -valid HH:MM
  • fts-transfer-delegation -e time-in-seconds
  • creates a proxy that lasts up to the certificate lifetime
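The two commands take the lifetime in different units: HH:MM for `grid-proxy-init -valid` and seconds for `fts-transfer-delegation -e`. The helper below is a small sketch that derives both values from a lifetime given in days. It only builds the argument lists and does not run anything; other options a real invocation would need (for example the FTS3 service endpoint) are deliberately left out, and the function name is illustrative.

```python
# Derive consistent lifetime arguments for the proxy and the delegation.
def proxy_arguments(days):
    """Return argument lists for both commands for a lifetime of `days` days.

    The lifetime cannot exceed the remaining lifetime of the certificate;
    that check is not performed here.
    """
    hours = days * 24
    valid = f"{hours}:00"          # HH:MM form expected by -valid
    seconds = hours * 3600         # seconds form expected by -e
    proxy_cmd = ["grid-proxy-init", "-valid", valid]
    delegation_cmd = ["fts-transfer-delegation", "-e", str(seconds)]
    return proxy_cmd, delegation_cmd


if __name__ == "__main__":
    for command in proxy_arguments(30):
        print(" ".join(command))
```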


Page 23: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Experiences

1. Large files – optimal throughput limited by network bandwidth

2. Many small files – limited by latency

3. Many parallel sessions – impede the proper functioning of the archive server

4. Ownership, creation dates not preserved – one grid owner

5. Simple approach of “just” pushing files will not work!


Page 24: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Actions to overcome issues

• tar files up in chunks of ~256 Gbyte

• exclude checked-out versioning subdirectories

• preserve ownership and time stamps in the tar archive

• keep a record of archived files

• Files to transfer are large – limited by bandwidth, not by latency
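The chunking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the scripts used at Durham: files are grouped greedily in listing order, the set of version-control directory names is a guess, and the manifest is a simple tab-separated text file. Python's `tarfile` module records ownership and modification times in the archive headers, which matches the goal of preserving them.

```python
import tarfile

VCS_DIRS = {".git", ".svn", "CVS"}   # assumed names of versioning subdirectories
CHUNK_LIMIT = 256 * 10**9            # ~256 Gbyte per tar file


def is_excluded(path):
    """True if any component of the path is a versioning subdirectory."""
    return any(part in VCS_DIRS for part in path.split("/"))


def plan_chunks(files, limit=CHUNK_LIMIT):
    """Group (path, size) pairs into chunks whose total size stays under limit.

    A single file larger than the limit still gets a chunk of its own.
    """
    chunks, current, used = [], [], 0
    for path, size in files:
        if is_excluded(path):
            continue
        if current and used + size > limit:
            chunks.append(current)
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        chunks.append(current)
    return chunks


def write_chunk(tar_path, paths, manifest_path):
    """Write one tar file and append a record of its contents to the manifest."""
    with tarfile.open(tar_path, "w") as tar, open(manifest_path, "a") as manifest:
        for path in paths:
            tar.add(path)  # stores owner and mtime in the tar header
            manifest.write(f"{tar_path}\t{path}\n")
```

The manifest is what later makes it possible to find which tar file holds a given path without fetching every chunk back.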


Page 25: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Open issues

• Depends on a single admin to carry out. Not automatic.

• What happens when the content of directories changes? Complete new archive sessions?

• Create a tool more like rsync – requires extensive scripting

• When retrieving data, a whole archived subset has to be fetched back to find a single file or a group of files
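One building block of an rsync-like tool is detecting what changed since the last archive session. The sketch below compares a recorded state (path mapped to size and modification time) with a fresh scan of the directory. The record format and function names are illustrative assumptions; comparing size and mtime only is the same shortcut rsync uses by default and will miss changes that preserve both.

```python
import os


def scan(root):
    """Map each file's path (relative to root) to (size, mtime in whole seconds)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            state[os.path.relpath(path, root)] = (info.st_size, int(info.st_mtime))
    return state


def changed_files(recorded, current):
    """Compare two states; return (new or modified paths, deleted paths)."""
    new_or_modified = sorted(
        path for path, meta in current.items() if recorded.get(path) != meta
    )
    deleted = sorted(path for path in recorded if path not in current)
    return new_or_modified, deleted
```

Only the paths in the first list would need to go into a new archive session; the second list identifies archived files that no longer exist on disk.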


Page 26: Archiving data from Durham to RAL using the File Transfer Service (FTS)

Conclusions

• With the right network speed we can archive the DiRAC data to RAL or anywhere else with the right tools and connectivity.

• Documenting the procedure is very important to transfer the knowledge and avoid duplicating effort. The documentation is online: https://www.cosma.dur.ac.uk/documentation

• Each DiRAC site should have their own dirac0X account

• Start with and keep on archiving: this is more difficult as it is not completely automatic yet and more development is required.

• Collaboration between DiRAC and GridPP/RAL DOES work!

• The work has been of benefit to other transfer actions, which significantly helps research and reflects well on the service we can deliver.

• Can we aspire to more?