
Copyright AARNet 2005

Massive Data Transfers

George McLaughlin, Mark Prior, AARNet


“Big Science” projects driving networks

• Large Hadron Collider
  – Coming on-stream in 2007
  – Particle collisions generating terabytes/second of “raw” data at a single, central, well-connected site
  – Need to transfer data to global “tier 1” sites; a tier 1 site must have a 10Gbps path to CERN
  – Tier 1 sites need to ensure gigabit capacity to the Tier 2 sites they serve

• Square Kilometre Array
  – Coming on-stream in 2010?
  – Greater data generator than the LHC
  – Up to 125 sites at remote locations; data need to be brought together for correlation
  – Can’t determine “noise” prior to correlation
  – Many logistic issues to be addressed

Billion dollar globally funded projects
Massive data transfer needs
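To give a feel for the scale these requirements imply, here is a minimal back-of-the-envelope sketch. It is not from the original slides; the data volumes and the 80% efficiency factor are illustrative assumptions, chosen only to show how quickly transfer times grow when the path is 1 Gbps rather than 10 Gbps.

```python
# How long does a bulk science data set take to move at a given line rate?
# Data volumes and the 80% efficiency factor are illustrative assumptions.

def transfer_time_hours(data_terabytes, link_gbps, efficiency=0.8):
    """Hours to move data_terabytes over a link_gbps path at the given
    fraction of the nominal line rate (decimal units throughout)."""
    bits = data_terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

for volume_tb in (1, 10, 100):
    for rate_gbps in (1, 10):
        print(f"{volume_tb:>4} TB over {rate_gbps:>2} Gbps: "
              f"{transfer_time_hours(volume_tb, rate_gbps):7.1f} h")
```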


From very small to very big


Scientists and Network Engineers coming together

• HEP community and R&E network community have figured out mechanisms for interaction – probably because HEP is pushing network boundaries

• e.g. the ICFA workshops on HEP, Grid and the Global Digital Divide bring together scientists, network engineers and decision makers – and achieve results

• http://agenda.cern.ch/List.php


What’s been achieved so far

• A new generation of real-time Grid systems is emerging to support worldwide data analysis by the physics community
• Leading role of HEP in developing new systems and paradigms for data-intensive science
• Transformed view and theoretical understanding of TCP as an efficient, scalable protocol with a wide field of use
• Efficient standalone and shared use of 10 Gbps paths of virtually unlimited length; progress towards 100 Gbps networking
• Emergence of a new generation of “hybrid” packet- and circuit-switched networks


LHC data (simplified)

Per experiment:
• 40 million collisions per second
• After filtering, 100 collisions of interest per second
• A Megabyte of digitised information for each collision = recording rate of 100 Megabytes/sec
• 1 billion collisions recorded = 1 Petabyte/year
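The figures above can be cross-checked with a few lines of arithmetic; the sketch below simply reproduces the slide’s per-experiment numbers in Python.

```python
# Reproduce the slide's per-experiment arithmetic.
collisions_of_interest_per_sec = 100       # after filtering
bytes_per_collision = 1e6                  # ~1 Megabyte per collision

recording_rate = collisions_of_interest_per_sec * bytes_per_collision
print(f"Recording rate: {recording_rate / 1e6:.0f} MB/s")      # 100 MB/s

collisions_recorded_per_year = 1e9         # 1 billion collisions recorded
yearly_volume = collisions_recorded_per_year * bytes_per_collision
print(f"Yearly volume : {yearly_volume / 1e15:.0f} PB/year")   # 1 PB/year
```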

(The four LHC experiments: CMS, LHCb, ATLAS, ALICE)

1 Megabyte (1MB)            A digital photo
1 Gigabyte (1GB) = 1000MB   A DVD movie
1 Terabyte (1TB) = 1000GB   World annual book production
1 Petabyte (1PB) = 1000TB   10% of the annual production by LHC experiments
1 Exabyte (1EB)  = 1000PB   World annual information production


LHC Computing Hierarchy

[Tiered computing model diagram, reconstructed from the slide labels:]
• Experiments generate ~PByte/sec into their online systems; ~100-1500 MBytes/sec flows into the CERN Centre (Tier 0+1), which holds PBs of disk and a tape robot – tens of Petabytes by 2007-8, an Exabyte ~5-7 years later
• CERN connects to Tier 1 centres (FNAL, IN2P3, INFN, RAL) at ~2.5-10 Gbps
• Tier 1 centres connect to Tier 2 centres at ~2.5-10 Gbps
• Tier 3 institutes and Tier 4 workstations draw on a physics data cache at 0.1 to 10 Gbps
• CERN/Outside Resource Ratio ~1:2; Tier0 : (sum of Tier1s) : (sum of Tier2s) ~1:1:1
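As a rough sanity check of the rates quoted in the diagram, the sketch below (illustrative only, decimal units throughout) compares the 2.5-10 Gbps CERN-to-Tier-1 paths with the ~100-1500 MBytes/sec coming out of an experiment’s online system.

```python
# Rough sanity check of the rates in the hierarchy diagram above.
# Decimal units throughout; purely illustrative.

def gbps_to_mbytes_per_sec(gbps):
    return gbps * 1e9 / 8 / 1e6

for link_gbps in (2.5, 10):
    link_capacity = gbps_to_mbytes_per_sec(link_gbps)  # MB/s the path can carry
    for online_rate in (100, 1500):                    # MBytes/sec from the online system
        verdict = "keeps up" if link_capacity >= online_rate else "falls behind"
        print(f"{link_gbps:>4} Gbps path ({link_capacity:6.1f} MB/s) "
              f"vs {online_rate:>4} MB/s: {verdict}")
```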


Lightpaths for Massive data transfers

• From CANARIE

[Chart: lightpath traffic plotted against IP peak and IP average traffic, on a 0-30 scale]

A small number of users with large data transfer needs can use more bandwidth than all other users
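The point is easy to see with rough numbers. The sketch below is purely illustrative: the single 1 Gbps lightpath user and the ~50 kbps daily average for a lightweight user are assumptions, not figures taken from the CANARIE chart.

```python
# Illustrative only: compare the daily volume moved by one sustained
# lightpath user with the aggregate of many lightweight users.

SECONDS_PER_DAY = 86_400

def daily_terabytes(rate_gbps):
    """TB/day moved at a sustained rate_gbps (decimal units)."""
    return rate_gbps * 1e9 / 8 * SECONDS_PER_DAY / 1e12

one_lightpath = daily_terabytes(1.0)              # one user, 1 Gbps, 24x7
light_users   = 10_000 * daily_terabytes(50e-6)   # 10,000 users at ~50 kbps average

print(f"One 1 Gbps lightpath user: {one_lightpath:6.1f} TB/day")
print(f"10,000 lightweight users : {light_users:6.1f} TB/day")
```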


Why?

• Cees de Laat classifies network users into 3 broad groups:
  1. Lightweight users: browsing, mailing, home use. Need full Internet routing, one to many.
  2. Business applications: multicast, streaming, VPNs, mostly LAN. Need VPN services and full Internet routing, several to several + uplink.
  3. Scientific applications: distributed data processing, all sorts of grids. Need very fat pipes, limited multiple Virtual Organizations, few to few, peer to peer.

Type 3 users:
• High Energy Physics
• Astronomers, eVLBI
• High Definition multimedia over IP
• Massive data transfers from experiments running 24x7


What is the GLIF?

• Global Lambda Integrated Facility - www.glif.is
• International virtual organization that supports persistent data-intensive scientific research and middleware development
• Provides the ability to create dedicated international point-to-point Gigabit Ethernet circuits for “fixed term” experiments


Huygens Space Probe – a practical example

• Cassini spacecraft left Earth in October 1997 to travel to Saturn
• On Christmas Day 2004, the Huygens probe separated from Cassini
• Started its descent through the dense atmosphere of Titan on 14 Jan 2005
• Using this technique, 17 telescopes in Australia, China, Japan and the US were able to accurately position the probe to within a kilometre (Titan is ~1.5 billion kilometres from Earth)
• Need to transfer Terabytes of data between Australia and the Netherlands

Very Long Baseline Interferometry (VLBI) is a technique where widely separated radio-telescopes observe the same region of the sky simultaneously to generate images of cosmic radio sources


AARNet - CSIRO ATNF contribution

• Created “dedicated” circuit
• The data from two of the Australian telescopes (Parkes [The Dish] & Mopra) was transferred via light plane to CSIRO Marsfield (Sydney)
• CeNTIE-based fibre from CSIRO Marsfield to AARNet3 GigaPOP
• SXTransPORT 10G to Seattle
• “Lightpath” to the Joint Institute for VLBI in Europe (JIVE) across CA*net4 and SURFnet optical infrastructure


But…

• 9 organisations in 4 countries involved in “making it happen”
• Required extensive human-human interaction (mainly emails… lots of them)
• Although a 1Gbps path was available, maximum throughput was around 400Mbps
• Issues with protocols, stack tuning, disk-to-disk transfer, firewalls, different formats, etc
• Currently scientists and engineers need to test thoroughly before important experiments, not yet “turn up and use”
• Ultimate goal is for the control plane issues to be transparent to the end-user, who simply presses the “make it happen” icon

Although the time from concept to undertaking the scientific experiment was only 3 weeks…
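The “stack tuning” issue in particular comes down to TCP’s bandwidth-delay product: a connection can carry at most window size divided by round-trip time. The sketch below assumes a ~300 ms Sydney-to-Netherlands round trip and a 64 KB default window; both figures are illustrative assumptions, not taken from the slides.

```python
# Why "stack tuning" matters on a long path: TCP throughput is capped at
# (window size / round-trip time). RTT and window size are assumptions.

rtt_seconds = 0.300                  # assumed ~300 ms Sydney-Netherlands round trip
default_window_bytes = 64 * 1024     # a common untuned TCP window of the era

max_throughput_bps = default_window_bytes * 8 / rtt_seconds
print(f"Throughput with default window : {max_throughput_bps / 1e6:.1f} Mbps")  # ~1.7 Mbps

# Window needed to fill a 1 Gbps path at that RTT (bandwidth-delay product):
target_bps = 1e9
window_needed_bytes = target_bps * rtt_seconds / 8
print(f"Window needed for 1 Gbps       : {window_needed_bytes / 1e6:.1f} MB")   # ~37.5 MB
```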


International path for Huygens transfer


EXPReS and the Square Kilometre Array

• SKA a bigger data generator than the LHC
• But in a remote location

Australia is one of the countries bidding for the SKA – significant infrastructure challenges

Also, the EU Commission-funded EXPReS project will link 16 radio telescopes around the world at gigabit speeds


In Conclusion

• Scientists and network engineers working together can exploit the new opportunities that high-capacity networking opens up for “big science”

• Need to solve issues associated with scalability, control plane, ease of use

• QUESTIONS?