Torrent-based Software Distribution in ALICE

Costin.Grigoras@cern.ch

Outline

GDB, Annecy 10.10.2012

- Motivation
- How it works
- Site requirements
- History
- Migration status

Motivation

- ALICE was using site shared areas for installing the pre-compiled experiment software packages
- Large sites suffered from AFS/NFS/… scalability issues and from the shared area being a single point of failure
- Large space was needed for the many active versions
- The old model needed a site-local service to manage the installation, unpacking and deletion of the packages
- Strict site configuration was required to support operation, which excludes the use of ‘opportunistic’ resources/centres
- From the very beginning, the shared SW area and its access from the VO-box were considered a security risk
- All of the above, and more, are solved by using the torrent protocol to distribute the software packages

Torrent terminology

[Diagram: torrent terminology]
- package.tar.gz: the original file, split into chunks of equal size
- package.tar.gz.torrent: metadata of the original file (SHA1 of the chunks, SHA1 of the entire file, tracker location)
- Tracker
- Clients: initial seeder, seeders, leeches
  - Get the file info
  - Advertise the hashes of their complete chunks
  - Exchange chunks, preferring high-speed peers
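The chunk and hash bookkeeping in the terminology slide can be illustrated in a few lines of Python. This is a minimal sketch, assuming the caller picks the chunk size (4 bytes in the demo, purely for readability); it follows the slide's field list rather than the bencoded metainfo format a real client such as aria2c reads, and the helpers `make_metadata`, `verify_chunk` and `reassemble` are hypothetical.

```python
import hashlib


def make_metadata(data: bytes, chunk_size: int, tracker: str) -> dict:
    """Cut data into equal-size chunks (only the last may be shorter) and record
    the SHA1 of every chunk, the SHA1 of the whole file and the tracker location."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return {
        "chunk_size": chunk_size,
        "chunk_sha1": [hashlib.sha1(c).hexdigest() for c in chunks],
        "file_sha1": hashlib.sha1(data).hexdigest(),
        "tracker": tracker,
    }


def verify_chunk(meta: dict, index: int, chunk: bytes) -> bool:
    """Accept a chunk received from a peer only if its SHA1 matches the metadata."""
    return hashlib.sha1(chunk).hexdigest() == meta["chunk_sha1"][index]


def reassemble(meta: dict, chunks: list[bytes]) -> bytes:
    """Join the chunks in order and check the whole-file SHA1 before use."""
    data = b"".join(chunks)
    if hashlib.sha1(data).hexdigest() != meta["file_sha1"]:
        raise ValueError("whole-file SHA1 mismatch")
    return data


if __name__ == "__main__":
    meta = make_metadata(b"0123456789", chunk_size=4, tracker="alitorrent.cern.ch:8088")
    print(len(meta["chunk_sha1"]))         # 3 chunks: b"0123", b"4567", b"89"
    print(verify_chunk(meta, 1, b"4567"))  # True
    print(verify_chunk(meta, 1, b"XXXX"))  # False
```

Per-chunk hashes are what let a leech accept pieces from any peer, trusted or not: a corrupted chunk is rejected and simply fetched again elsewhere.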

How it works

[Diagram: distribution architecture]
- Build servers feed the software repository (one tar.gz / version)
- The packages appear in the AliEn file catalogue as torrent://alitorrent.cern.ch/…
- Torrent tracker: alitorrent.cern.ch:8088
- Torrent seeder: alitorrent.cern.ch:8092
- Site X and Site Y, each with worker nodes WN 1 … WN n
- No seeding between sites
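The distribution layout on this slide can be modelled to show which peers a worker node may exchange chunks with: the central seeder, plus any peer at its own site, but never a WN at another site. This is a sketch under stated assumptions; the `Peer` type, its `site` field and the `eligible_peers` helper are hypothetical, and in the real deployment the site boundary follows implicitly from the setup rather than from an explicit filter like this one.

```python
from dataclasses import dataclass

# Central seeder endpoint as given on the slide.
CENTRAL_SEEDER = "alitorrent.cern.ch:8092"


@dataclass(frozen=True)
class Peer:
    address: str  # hypothetical "host:port" of the peer
    site: str     # hypothetical site label, e.g. "SiteX"


def eligible_peers(me: Peer, known: list[Peer]) -> list[str]:
    """Addresses this WN may exchange chunks with: the central seeder plus
    other peers at the same site (no seeding between sites)."""
    local = [p.address for p in known if p.site == me.site and p != me]
    return [CENTRAL_SEEDER] + local


if __name__ == "__main__":
    wn1 = Peer("10.0.0.1:6881", "SiteX")
    wn2 = Peer("10.0.0.2:6881", "SiteX")
    other = Peer("10.1.0.1:6881", "SiteY")
    print(eligible_peers(wn1, [wn1, wn2, other]))
```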

How it works (2)

- Build servers for SLC5 (32b, 64b), SLC6 (32b, 64b), Mac OS X, several Ubuntu releases …
- Software repository: 150 GB in 600 archives
  - The total size of a compressed (4x factor) software ‘set’ per job is ~300 MB; this is what is downloaded to the WN
- One central tracker and seeder, limited to 50 MB/s to the world
- Fallback to other download methods (wget, xrdcp) if the torrent download fails for any reason
  - Packages obtained this way are seeded nevertheless
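The fallback behaviour can be sketched as an ordered list of download methods tried in turn, with whatever succeeded handed to the seeder. The callables are hypothetical stand-ins for torrent, wget and xrdcp transfers; this does not reproduce AliEn's actual code or command lines.

```python
from typing import Callable, Optional

# A download method takes a package name and returns a local path, or None on failure.
Downloader = Callable[[str], Optional[str]]


def fetch_package(package: str,
                  methods: list[tuple[str, Downloader]],
                  seed: Callable[[str], None]) -> tuple[str, str]:
    """Try each (name, method) in order; seed the result whichever way it arrived."""
    for name, download in methods:
        try:
            path = download(package)
        except Exception:
            path = None  # a failure "for any reason" just moves on to the next method
        if path is not None:
            seed(path)  # seed it nevertheless, even if torrent was not the source
            return name, path
    raise RuntimeError(f"all download methods failed for {package}")
```

Seeding the fallback copy presumably keeps it available to other local agents over the LAN, sparing the central seeder and its 50 MB/s cap.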

How it works (3)

- Bootstrap: the pilot job script fetches the latest AliEn build (20 MB) by torrent and installs it on the local node (`pwd`)
- The AliEn JobAgent gets a real job from the central queue and downloads the required software packages
  - It continues to seed them in the background, so other local agents can quickly get them over the LAN
- The JobAgent (JA) will run more jobs of the same type (user and SW requirements) within the TTL of the job
- Everything is downloaded into the sandbox of the job, so it is wiped at the end of its execution
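The JobAgent life cycle on this slide can be sketched as a loop that keeps asking for jobs of the same type while the TTL allows, downloading packages once into a temporary sandbox that is deleted when the agent exits. The `get_job`, `fetch` and `run` callables, the `user`/`packages` job fields and the clock-based TTL check are hypothetical; only the overall flow comes from the slide.

```python
import tempfile
import time
from typing import Callable, Optional


def job_agent(ttl_s: float,
              get_job: Callable[[Optional[tuple]], Optional[dict]],
              fetch: Callable[[str, str], None],
              run: Callable[[dict, str], None],
              clock: Callable[[], float] = time.monotonic) -> int:
    """Run jobs until the TTL expires or no matching job is left; return the count."""
    deadline = clock() + ttl_s
    done = 0
    fetched = set()
    # The sandbox and everything downloaded into it is removed when the block exits.
    with tempfile.TemporaryDirectory(prefix="sandbox-") as sandbox:
        job_type = None  # (user, packages) of the first job; later jobs must match it
        while clock() < deadline:
            job = get_job(job_type)  # hypothetical: queue returns a matching job or None
            if job is None:
                break
            job_type = (job["user"], tuple(job["packages"]))
            for pkg in job["packages"]:
                if pkg not in fetched:  # same-type jobs reuse what is already there
                    fetch(pkg, sandbox)  # hypothetical: download (and keep seeding)
                    fetched.add(pkg)
            run(job, sandbox)
            done += 1
    return done
```

Running several same-type jobs per agent is what makes the per-node download pay off: the ~300 MB software set is fetched once and reused.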

Torrent features we use

- Clients explicitly publish their private IP in the central tracker
  - This allows the discovery of LAN peers via this common service, even behind NAT
- Local Peer Discovery
  - Multicast to discover peers on the same network
- Peer exchange
  - Peer lists are distributed between the local peers
- Distributed Hash Tables
  - Decentralized seeder lookup: seeders are trackers
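To show how the features listed above could map onto client settings, the sketch below assembles (but does not run) an aria2c command line that enables local peer discovery, peer exchange and DHT, and advertises the node's private address. The option names are real aria2c options; using `--bt-external-ip` to publish the private IP is an assumption about how this could be done, not the documented ALICE configuration, and the torrent path, download directory and seed time are hypothetical.

```python
import ipaddress
import shlex

# Port range quoted on the site-requirements slide as aria2c's default listening ports.
BT_PORTS = "6881-6999"


def aria2c_command(torrent: str, private_ip: str, download_dir: str,
                   seed_minutes: int = 60) -> list[str]:
    """Build an aria2c invocation with the peer-discovery features switched on."""
    if not ipaddress.ip_address(private_ip).is_private:
        raise ValueError(f"{private_ip} is not a private address")
    return [
        "aria2c",
        f"--dir={download_dir}",
        f"--bt-external-ip={private_ip}",  # assumption: advertise the LAN IP to the tracker
        "--bt-enable-lpd=true",            # Local Peer Discovery (multicast on the LAN)
        "--enable-peer-exchange=true",     # peer lists shared between local peers
        "--enable-dht=true",               # DHT: decentralized seeder lookup
        f"--listen-port={BT_PORTS}",
        f"--dht-listen-port={BT_PORTS}",
        f"--seed-time={seed_minutes}",     # keep seeding afterwards (minutes)
        torrent,
    ]


if __name__ == "__main__":
    print(shlex.join(aria2c_command("package.tar.gz.torrent", "10.0.0.12", "/tmp/sandbox")))
```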

Site requirements

- How to allow this to happen: iptables rules accepting
  - Outgoing traffic to alitorrent.cern.ch on TCP/8088,8092
  - WN-to-WN traffic on TCP and UDP ports 6881:6999 (aria2c default listening ports)
  - UDP and IGMP to 224.0.0.0/4 (local peer discovery)
- Typically this is already the case; in some cases the ports had to be whitelisted (very smart firewalls)
- Implicitly, sites do not exchange any torrent traffic between them
- No service to run on the site or on the machines, no shared area any more, no SPF (single point of failure), essentially no local support needed for this
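The firewall requirements listed above can be written out as iptables rules. The sketch only generates the rule strings; the WN subnet `10.0.0.0/16` is a hypothetical placeholder, and the choice of INPUT/OUTPUT chains with `-A` is an assumption. Sites with a different firewall layout, or without a stateful rule for return traffic, will need to adapt them.

```python
# Hypothetical WN subnet; replace with the site's real worker-node network.
WN_SUBNET = "10.0.0.0/16"
TRACKER_HOST = "alitorrent.cern.ch"
BT_PORTS = "6881:6999"     # aria2c default listening ports, iptables range syntax
MULTICAST = "224.0.0.0/4"  # local peer discovery


def torrent_rules(wn_subnet: str = WN_SUBNET) -> list[str]:
    """Return iptables commands for the three requirements on the slide."""
    rules = [
        # Outgoing to the central tracker (8088) and seeder (8092).
        f"iptables -A OUTPUT -p tcp -d {TRACKER_HOST} -m multiport --dports 8088,8092 -j ACCEPT",
    ]
    # WN-to-WN peer traffic in both directions, TCP and UDP.
    for proto in ("tcp", "udp"):
        rules.append(f"iptables -A INPUT -p {proto} -s {wn_subnet} --dport {BT_PORTS} -j ACCEPT")
        rules.append(f"iptables -A OUTPUT -p {proto} -d {wn_subnet} --dport {BT_PORTS} -j ACCEPT")
    # Multicast UDP and IGMP for local peer discovery.
    for proto in ("udp", "igmp"):
        for chain in ("INPUT", "OUTPUT"):
            rules.append(f"iptables -A {chain} -p {proto} -d {MULTICAST} -j ACCEPT")
    return rules


if __name__ == "__main__":
    print("\n".join(torrent_rules()))
```

Note that iptables resolves a host name such as alitorrent.cern.ch once, when the rule is inserted.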

History

- The deployment has faced only policy difficulties
  - Eventually accepted after understanding the technology
  - There is no evil technology, only evil use…
- First tests at CERN in 02.2009
- Site deployments starting 06.2009, as the shared areas were proving insufficient
  - First at the large sites, in operation for 2 years
- Presented in various forums within the collaboration and at CHEPs
- Large awareness call in 01.2012 at the ALICE T1/T2 Workshop in Karlsruhe

Migration status

- First transitions were done in close collaboration with the sites
  - Debugging on the WNs, following up the consequences on the local network, firewalls and such
- One month ago we asked all sites for permission to enable torrent
  - Most have confirmed that their policy allows the torrent protocol, have checked their firewall policies, and now run torrent
- Working with the rest to solve the (mostly) non-technical issues
  - Some mails went to unread mailboxes…

Migration status

- T0: in operation for 3 years
- T1s: 5 / 6 migrated
- T2s: 36 / 78 migrated
- Currently covering 2/3 of the resources, so on average more than 20K concurrent jobs are using torrent
- Rock solid, very efficient technology
  - No incidents reported
- Aiming for full migration by the time the next AliEn version is deployed, to completely drop the PackMan VoBox service and the need for shared SW areas and caches

Conclusion

- Torrents have enabled us to
  - Simplify site operations by removing a VoBox service and the shared SW areas
  - Significantly reduce problems associated with SW deployment, relieving the sites' support staff
  - Have quick software release cycles (both experiment and Grid middleware)
- The migration process was carefully staged
  - Policy limitations clarified: discussion with security experts
  - Discussions and deployment at T0/T1s and selected T2s (regional coverage)
  - Presently: towards complete site coverage
- Lifts some of the requirements for a site VoBox, specific configurations and services
- Forward-looking system: towards opportunistic use of resources and clouds!