www.eu-etics.org
ETICS All Hands meeting, Bologna, October 23-25, 2006
NMI and Condor: Status + Future Plans
Andy PAVLO
Peter COUVARES
Becky GIETZEL
Bologna -- All Hands Meeting 2
Overview
• Introduction
• Cross-site Job Migration
• Improving Documentation
• Virtual Machines
• Generic Connection Broker
• Future Plans
• Q & A
Introduction
• University of Wisconsin team is dedicated to improving Condor technologies and the NMI framework.
• Condor user base continues to grow.
• Expecting upcoming surge of NSF users for NMI.
Cross-site Job Migration
• Pools of ETICS computing resources installed at INFN, CERN, and University of Wisconsin.
• Jobs automatically routed to remote sites when local resources are unavailable to satisfy requirements.
• Transparent to users.
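The routing idea above can be illustrated with a minimal sketch. It is not the NMI or Condor implementation: the `Site` class, the `route_job` function, the slot counts, and the platform strings are all hypothetical names chosen for illustration. It only shows the decision described in the bullets: try the local pool first, and fall back to a remote site when the local pool cannot satisfy the job's requirements.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical stand-in for one pool of computing resources."""
    name: str
    free_slots: dict = field(default_factory=dict)  # platform -> free slot count

    def can_run(self, platform: str) -> bool:
        # A site can take the job if it has at least one free slot
        # for the requested platform.
        return self.free_slots.get(platform, 0) > 0


def route_job(platform, local, remotes):
    """Return the name of the site chosen for the job, or None if no site fits."""
    if local.can_run(platform):
        return local.name
    # Local resources cannot satisfy the requirement: try remote sites in order.
    for site in remotes:
        if site.can_run(platform):
            return site.name
    return None


# Platform names and slot counts below are made up for the example.
wisc = Site("University of Wisconsin", {"x86_rhas_4": 0, "ppc_macos_10.4": 2})
cern = Site("CERN", {"x86_slc_3": 4})
infn = Site("INFN", {"x86_rhas_4": 1})

local_choice = route_job("ppc_macos_10.4", wisc, [cern, infn])
remote_choice = route_job("x86_rhas_4", wisc, [cern, infn])
no_choice = route_job("sparc_sol_9", wisc, [cern, infn])
```

Because the caller only receives the name of the chosen site, the submission looks the same whether the job runs locally or remotely, which is the sense in which routing is transparent to users.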
Cross-site Job Migration
[Diagram: architecture of cross-site job migration between a Local Site and a Remote Site. Labels: NMI Build/Test Submission; Condor Schedd; Condor Schedd-on-the-Side; Grid Resource Routing Table; Condor Job; Condor-C Job; Resource Advertiser; Condor Matchmaker.]
Cross-site Job Migration
[Diagram: NMI Universe. Available Resources: a Resource Advertiser at each of CERN, INFN, and University of Wisconsin. Beyond ETICS: OMII-UK, OMII-Europe.]
Cross-site Job Migration
• Current status:
– Explicit job routing is available in NMI framework 2.1.7
• Future plans:
– Initial deployment (without prereq information): November 2006
– Improved matchmaking: December 2006
• Still to be determined:
– Authorization/Authentication method(s)
– Scalable distributed data dissemination
Documentation
• Emphasis on creating complete documentation and user tutorials for NMI framework.
• Additional contributions from Michael Bletzinger (NCSA)
• Target deadline: December 2006 to January 2007
• New website: http://nmi.cs.wisc.edu
Virtual Machines
• Jobs are sandboxed inside a virtual machine
– Changes to the system are isolated to the local VM.
• Allows for more robust build and test scenarios
• Current status in Condor:
– Preliminary support for VMware is in Condor 6.9
– Users must create the VM image beforehand.
– Future plan is to create the VM dynamically and insert jobs
– Plan to support Xen and VirtualPC virtual machines
• Condor's current VM support is not directly usable by the NMI framework.
Virtual Machines: Future Plans
• NMI and ETICS could provide a standard image per OS, configured with prerequisite software.
• Images are stored in a cache and dynamically deployed with builds and tests.
• Users only need to add a single line to their submission file.
• NMI framework enhancements:
– Maintain cache of available OS VM images.
– Inject build and test scripts inside of VM image.
– Extract appropriate status, logs, and job artifacts.
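The three planned enhancements (cache, inject, extract) can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: `VMImageCache`, `fetch_image`, and `run_in_vm` are hypothetical names, no real virtual machine is started, and nothing here reflects an actual NMI or Condor interface.

```python
class VMImageCache:
    """Hypothetical cache of one prepared VM image per operating system."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that produces an image on a cache miss
        self._images = {}     # os name -> cached image
        self.misses = 0       # number of times an image had to be fetched

    def get(self, os_name):
        if os_name not in self._images:
            self.misses += 1
            self._images[os_name] = self._fetch(os_name)
        return self._images[os_name]


def fetch_image(os_name):
    # Stand-in for copying a prepared image from storage.
    return f"{os_name}.img"


def run_in_vm(cache, os_name, scripts):
    """Simulate one build or test run inside a fresh copy of a cached image."""
    image = cache.get(os_name)
    # Inject: the scripts are copied into this run's private VM state,
    # so the cached base image itself is never modified.
    vm_files = dict(scripts)
    # Execution is only pretended here: one log line per injected script.
    logs = {name: f"ran {name} on {image}" for name in sorted(vm_files)}
    # Extract: hand status and logs back to the caller.
    return {"status": "ok", "logs": logs}


cache = VMImageCache(fetch_image)
first = run_in_vm(cache, "linux", {"build.sh": "make"})
second = run_in_vm(cache, "linux", {"test.sh": "make check"})
```

In this sketch the second run reuses the image fetched by the first, which is the point of keeping a cache of standard per-OS images.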
Generic Connection Broker
• One way for Condor jobs to traverse firewalls.
• Daemon that acts as a proxy at the edge of firewalls.
• Acts as a broker, then steps out of the way.
• Low “maintenance”:
– Works with NATs and multiple private networks.
– No changes to firewall configuration.
[Diagram: Matchmaker, Executor, Submitter, and GCB, with arrows numbered 1 to 5 corresponding to the steps below.]
1) Executor registers with GCB
2) Executor advertises to matchmaker
3) After match, submitter contacts executor via GCB
4) GCB tells executor to open connection
5) Executor opens connection to submitter
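The five steps can be traced with a small simulation. This is a sketch of the message order only, not GCB's actual code or wire protocol: the `Broker`, `Executor`, and `Submitter` classes, the `gcb://` address prefix, and the host names are all hypothetical. The point it illustrates is that the broker only relays the request; the final connection is opened outbound by the executor, so no inbound firewall rule is needed.

```python
SCHEME = "gcb://"  # hypothetical address prefix, for illustration only


class Executor:
    def __init__(self, name, events):
        self.name = name
        self.events = events

    def open_connection(self, submitter):
        # Step 5: the connection is opened outbound, from behind the firewall.
        self.events.append("5: executor opens connection to submitter")
        submitter.peers.append(self.name)


class Submitter:
    def __init__(self, name):
        self.name = name
        self.peers = []  # executors that have connected to this submitter


class Broker:
    def __init__(self, events):
        self.events = events
        self.registered = {}  # executor name -> executor

    def register(self, executor):
        # Step 1: the executor registers and receives a brokered address.
        self.events.append("1: executor registers with GCB")
        self.registered[executor.name] = executor
        return SCHEME + executor.name

    def request_connection(self, address, submitter):
        # Step 3: the submitter reaches the executor through the broker.
        self.events.append("3: submitter contacts executor via GCB")
        executor = self.registered[address[len(SCHEME):]]
        # Step 4: the broker asks the executor to connect, then steps aside.
        self.events.append("4: GCB tells executor to open connection")
        executor.open_connection(submitter)


def run_handshake():
    events = []
    broker = Broker(events)
    executor = Executor("exec1", events)
    submitter = Submitter("submit1")
    matchmaker = {}  # advertised executors: name -> brokered address

    address = broker.register(executor)
    # Step 2: the executor advertises its brokered address to the matchmaker.
    matchmaker[executor.name] = address
    events.append("2: executor advertises to matchmaker")
    # After the match, the submitter learns the brokered address.
    matched_address = matchmaker["exec1"]
    broker.request_connection(matched_address, submitter)
    return events, submitter.peers


events, peers = run_handshake()
```

After `run_handshake()` returns, the broker holds no connection state between the two parties, mirroring "acts as a broker, then steps out of the way".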
Generic Connection Broker
• Currently only supported in Condor 6.8 for Linux
• Wisconsin team is working to improve GCB:
– Clean up code base and remove testing logic
– Port to other operating systems
– Improve scalability and network performance
Other Future Plans: NMI
• Parallel scheduling enhancements:
– Task synchronization
– Primitives today, high-level dependency spec/mgmt tomorrow?
– Scalability testing: 10^1, 10^2, 10^3, 10^4 nodes?
• Re-factored database schema:
– Improved DB scalability and performance
– Improved build/test artifact provenance
– Project hierarchy
– Users and groups
– Builds and tests are coupled to projects
– Task-level metrics
• Fuzz testing mechanisms
• Website enhancements (maybe):
– Consolidate "old" and "new" web interface
– May focus more on debugging info than status info
Other Future Plans: Condor
• New Development Series: Condor 6.9
• Improved scalability:
– Modularize schedd tasks
– Non-blocking I/O
• Privilege separation:
– Daemons no longer need to start with setuid permissions
– Integration with glexec/sudo
• Enhanced security:
– Continue with source code audits
– Signed ClassAds
• Parallel scheduling:
– Document & understand current issues in a pool doing both independent & parallel work
– Improve incrementally based on production experiences