costin.grigoras@cern.ch torrent-based software distribution in alice
Post on 23-Dec-2015
Torrent-based software distribution in ALICE
Outline
GDB, Annecy 10.10.2012
Motivation
How it works
Site requirements
History
Migration status
Motivation
ALICE was using site shared areas to install the pre-compiled experiment software packages
  Large sites suffered from AFS/NFS/… scalability issues, and the shared area was a single point of failure
  Large space needed for the many active versions
The old model needed a site-local service to manage the installation, unpacking and deletion of the packages
A strict site configuration was required to support operation, which excludes the use of ‘opportunistic’ resources/centres
From the very beginning, the shared SW area and its access from the VO-box were considered a security risk
All of the above and more are solved by using the Torrent protocol to distribute the software packages
Torrent terminology
[Diagram: torrent terminology]
package.tar.gz: split into chunks of equal size
package.tar.gz.torrent: metadata of the original file
  SHA1 of chunks
  SHA1 of entire file
  Tracker location
Tracker
Clients: initial seeder, seeders, leeches
  Get file info
  Advertise hashes of complete chunks
  Exchange chunks
  Prefer high-speed peers
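The metadata listed on this slide (per-chunk SHA1, whole-file SHA1, tracker location) can be illustrated with a minimal Python sketch. It is not the real .torrent encoding (which is bencoded) and not ALICE's tooling; the function names `describe_package` and `verify_chunk` and the dictionary keys are hypothetical, chosen only for illustration. Note that the last chunk may be shorter than the others when the file size is not a multiple of the chunk size.

```python
import hashlib


def describe_package(data: bytes, chunk_size: int, tracker: str) -> dict:
    """Build torrent-like metadata for an in-memory package (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Split the payload into fixed-size chunks; the final one may be shorter.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return {
        "length": len(data),
        "chunk_size": chunk_size,
        "chunk_sha1": [hashlib.sha1(c).hexdigest() for c in chunks],
        "file_sha1": hashlib.sha1(data).hexdigest(),
        "tracker": tracker,
    }


def verify_chunk(meta: dict, index: int, chunk: bytes) -> bool:
    """Check a chunk received from a peer against the published hash."""
    return hashlib.sha1(chunk).hexdigest() == meta["chunk_sha1"][index]
```

Because every chunk is checked independently, a client can accept chunks from any peer and start serving the verified ones before the whole file has arrived.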
How it works
[Diagram: distribution flow]
Build servers
Software repository (one tar.gz / version)
AliEn file catalogue: torrent://alitorrent.cern.ch/…
Torrent tracker: alitorrent.cern.ch:8088
Torrent seeder: alitorrent.cern.ch:8092
Site X: WN 1, WN 2, …, WN n
Site Y: WN 1, WN 2, …, WN n
No seeding between sites
How it works (2)
Build servers for SLC5 (32b, 64b), SLC6 (32b, 64b), Mac OS X, Ubuntus …
Software repository: 150GB in 600 archives
  Total size of a compressed (4x factor) software ‘set’ per job is ~300MB (this is what is downloaded to the WN)
One central tracker and seeder
  Limited to 50MB/s to the world
Fallback to other download methods if torrent download fails for any reason
  wget, xrdcp
  But seed them nevertheless
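The fallback behaviour can be sketched as an ordered list of download methods, followed by seeding whichever way the file arrived. This is a minimal Python illustration, not the AliEn implementation: the helper names (`run_command`, `fetch_with_fallback`, `default_methods`) and all URL and path parameters are hypothetical, and the exact command lines ALICE used are not given in the slides.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Method = Tuple[str, Callable[[], bool]]


def run_command(argv: Sequence[str]) -> bool:
    """Run an external command; True when it exits with status 0."""
    try:
        return subprocess.run(list(argv), check=False).returncode == 0
    except OSError:
        return False  # e.g. the tool is not installed on this node


def fetch_with_fallback(methods: List[Method], seed: Callable[[], None]) -> str:
    """Try each method in order, then seed the result regardless of how it arrived."""
    for name, fetch in methods:
        if fetch():
            seed()  # "seed them nevertheless": share even after a wget/xrdcp download
            return name
    raise RuntimeError("all download methods failed")


def default_methods(torrent_file: str, http_url: str, xrootd_url: str, dest: str) -> List[Method]:
    """Hypothetical ordering: torrent first, then wget, then xrdcp."""
    return [
        # --seed-time=0 makes aria2c exit once the download completes
        ("torrent", lambda: run_command(["aria2c", "--seed-time=0", torrent_file])),
        ("wget", lambda: run_command(["wget", "-O", dest, http_url])),
        ("xrdcp", lambda: run_command(["xrdcp", xrootd_url, dest])),
    ]
```

Passing the methods as callables keeps the ordering explicit and lets the fallback logic be exercised without any network access.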
How it works (3)
Bootstrap
  Pilot job script fetches and installs on the local node (`pwd`) the latest AliEn build by Torrent (20MB)
AliEn JobAgent gets a real job from the central queue and downloads the required software packages
  Continuing to seed them in background for other local agents to quickly get them by LAN
The JA will run more jobs of the same type (user and SW requirements) within the TTL of the job
Everything is downloaded in the sandbox of the job, so is wiped at the end of its execution
Torrent features we use
Clients explicitly publish their private IP in the central tracker
  Allowing the discovery of LAN peers via this common service even behind NAT
Local Peer Discovery
  Multicast to discover peers on same network
Peer exchange
  Peer lists are distributed between the local peers
Distributed Hash Tables
  Decentralized seeder lookup: seeders are trackers
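As a rough illustration, the four features above map onto existing aria2c command-line options (aria2c is the client named on the site requirements slide). The Python sketch below only assembles an argument list; the function name `aria2c_args` and its parameters are hypothetical, and the options ALICE actually passes are not stated in the slides.

```python
import ipaddress
from typing import List


def aria2c_args(torrent_file: str, private_ip: str, download_dir: str) -> List[str]:
    """Assemble an aria2c command line enabling the features listed above (a sketch)."""
    if not ipaddress.ip_address(private_ip).is_private:
        raise ValueError("expected the worker node's private (LAN) address")
    return [
        "aria2c",
        f"--dir={download_dir}",
        f"--bt-external-ip={private_ip}",  # address reported to the tracker
        "--bt-enable-lpd=true",            # Local Peer Discovery (multicast)
        "--enable-peer-exchange=true",     # peer lists shared between peers
        "--enable-dht=true",               # Distributed Hash Table lookup
        "--listen-port=6881-6999",         # aria2c default range, stated explicitly
        torrent_file,
    ]
```

Reporting the private address is what lets two worker nodes behind the same NAT find each other through the central tracker and then exchange chunks over the LAN.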
Site requirements
How to allow this to happen
  iptables rules accepting:
    Outgoing to alitorrent.cern.ch TCP/8088,8092
    WN-to-WN on TCP, UDP / 6881:6999 (aria2c default listening ports)
    UDP, IGMP -> 224.0.0.0/4 (local peer discovery)
  Typically this is already the case, in some cases the ports had to be whitelisted (very smart firewalls)
Implicitly sites do not exchange any torrent traffic between them
No service to run on the site or on the machines, no shared area any more, no SPOF, essentially no local support for this
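The three rule groups above can be written out as concrete iptables commands. The Python sketch below only generates the command strings and does not apply them; the worker-node subnet is a site-specific assumption supplied by the caller, the function name `torrent_rules` is hypothetical, and the choice of chains (INPUT/OUTPUT on each WN) is one possible reading of the slide rather than a recommended site policy.

```python
import ipaddress
from typing import List

TRACKER_HOST = "alitorrent.cern.ch"  # tracker (8088) and seeder (8092), from the slides
TRACKER_PORTS = "8088,8092"
PEER_PORTS = "6881:6999"             # aria2c default listening ports
MULTICAST_NET = "224.0.0.0/4"        # local peer discovery


def torrent_rules(wn_subnet: str) -> List[str]:
    """Return iptables commands matching the three rule groups (not executed here)."""
    ipaddress.ip_network(wn_subnet)  # raises ValueError on a malformed subnet
    rules = [
        f"iptables -A OUTPUT -p tcp -d {TRACKER_HOST} "
        f"-m multiport --dports {TRACKER_PORTS} -j ACCEPT"
    ]
    # WN-to-WN chunk exchange, both directions, TCP and UDP
    for proto in ("tcp", "udp"):
        rules.append(f"iptables -A INPUT -p {proto} -s {wn_subnet} --dport {PEER_PORTS} -j ACCEPT")
        rules.append(f"iptables -A OUTPUT -p {proto} -d {wn_subnet} --dport {PEER_PORTS} -j ACCEPT")
    # Multicast traffic used for local peer discovery
    for proto in ("udp", "igmp"):
        for chain in ("INPUT", "OUTPUT"):
            rules.append(f"iptables -A {chain} -p {proto} -d {MULTICAST_NET} -j ACCEPT")
    return rules
```

For example, `torrent_rules("10.1.0.0/16")` yields nine commands that a site administrator could review and adapt before applying. Because the peer ports are opened only within the worker-node subnet, no torrent traffic is exchanged between sites.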
History
The deployment has faced only policy difficulties
  Eventually accepted after understanding the technology
  There is no evil technology, only evil use…
First tests at CERN in 02.2009
Site deployments starting 06.2009
  As the shared areas were proving insufficient
  First at the large sites, in operation for 2 years
Presented in various forums within the collaboration and at CHEPs
Large awareness call in 01.2012 at ALICE T1/T2 Workshop in Karlsruhe
Migration status
First transitions done in close collaboration with the sites
  Debugging on the WNs, following up the consequences on the local network, firewalls and such
One month ago we asked all sites for permission to enable torrent
  Most have confirmed that the policy allows the torrent protocol, checked the firewall policies, and now run torrent
  Working with the rest to solve the (mostly) non-technical issues
  Some mails went to unread mailboxes …
Migration status
T0: in operation for 3 years
T1s: 5 / 6 migrated
T2s: 36 / 78 migrated
Currently covering 2/3 of the resources, so on average more than 20K concurrent jobs are using torrent
  Rock solid, very efficient technology
  No incidents reported
Aiming for full migration by the time the next AliEn version is deployed, to completely drop the PackMan VoBox service and the need for shared SW area and caches
Conclusion
Torrents have enabled us to
  Simplify site operations by removing a VoBox service and the shared SW areas
  Significantly reduce problems associated with SW deployment, relieving the sites' support staff
  Have quick software release cycles (both experiment and Grid middleware)
The migration process was carefully staged
  Policy limitation clarified through discussion with security experts
  Discussions and deployment at T0/T1s and selected T2s (regional coverage)
  Presently: towards complete site coverage
Lifts some of the requirements for a site VoBox, specific configurations and services
  Forward-looking system, towards opportunistic use of resources and clouds!