VMs at a Tier-1 site
EGEE’09, 21-09-2009
Sander Klous, Nikhef
Contents
• Introduction
– Who are we?
• Motivation
– Why are we interested in VMs?
– What are we going to do with VMs?
• Status
– How do we approach this issue?
– Where do we stand?
• Challenges
03-09-2009 BIG Grid - Virtualization working group 2
Introduction
• Collaboration between
– NCF: national computing facilities
– Nikhef: national institute for subatomic physics
– NBIC: national bioinformatics center
• Participation from Philips, SARA, etc.
Goal: "Enable access to grid infrastructures for scientific research in the Netherlands"
Motivation: Why Virtual Machines?
• Site perspective
– Resource flexibility (e.g. SL4 / SL5)
– Resource management
• Scheduling / multi-core / sandboxing
• User perspective
– Isolation from the environment
• Identical environment on multiple sites
• Identical environment on a local machine
Different VM classes
• Class 1: Site-generated Virtual Machines
– No additional trust issues
– Benefits for system administration
• Class 2: Certified Virtual Machines
– Inspection and certification to establish trust
– Requirements for monitoring / integration
• Class 3: User-generated Virtual Machines
– No trust relation
– Requires appropriate security measures
Typical use case: Class 1 VM
[Diagram: site infrastructure and resource management. Torque/PBS serves a job queue and a VM queue; a Virtual Machine Manager places VMs on the boxes. Box 1 runs as a normal WN, Box 2 hosts "8 virtual SL4 WNs", Box 3 hosts "8 virtual SL5 WNs".]
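The split between a job queue and a VM queue in the diagram could be realized as an ordinary extra Torque/PBS execution queue. A sketch of the corresponding qmgr input, where the queue name, the walltime limit and the `vmusers` group are invented for illustration:

```
create queue vmqueue
set queue vmqueue queue_type = Execution
set queue vmqueue resources_max.walltime = 36:00:00
set queue vmqueue acl_group_enable = True
set queue vmqueue acl_groups = vmusers
set queue vmqueue enabled = True
set queue vmqueue started = True
```

Such a script is fed to the batch server with `qmgr < vmqueue.qmgr`; the VM manager then only ever pulls work from this queue.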
Typical use case: Class 2 VM
Analysis on Virtual Machines
• Run minimal analysis on desktop/laptop
– Access to grid services
• Run full analysis on the grid
– Identical environment
– Identical access to grid services
• No interest in becoming a system administrator
– Standard experiment software is sufficient
Typical use case: Class 3 VM
Identification and classification of GPCRs
• Requires a very specific software set
– BLAST 2.2.16
– HMMER 2.3.2
– BioPython 1.50
• Even non-x86 (binary) applications!
• Specific software for this user
• No common experiment software
Project status
• Working group: virtualization of worker nodes
https://wiki.nbic.nl/index.php/BigGrid_virtualisatie
• Kick-off meeting July 6th, 2009
– System administrators, user support, management
• Phase 1 (3 months)
– Collect site and user requirements
– Identify other ongoing efforts in Europe
– First design
• Phase 2 (3 months)
– Design and implement a proof of concept
Active working group topics
• Policies/security issues for Class 2/3 VMs
• Technology study
– Managing Virtual Machines
– Distributing VM images
– Interfacing the VM infrastructure with 'the grid'
• Identify missing functionality and alternatives
– Accounting and fair share, image management, authentication/authorization, etc.
The Amazon identity crisis
• The three most confronting questions:
1. What is the difference between a job and a VM?
2. Why can I do it at Amazon, but not on the grid?
3. What is the added value of grids over clouds?
"We don't want to compete with Amazon!"
Policy and security issues
E-science services and functionality
• Data integrity, confidentiality and privacy
• Non-repudiation of user actions
System administrator point of view
• Trust user intentions, not their implementations
• Incident response is more costly than certification
• Forensics is time consuming
Security 101 = attack surface
A compromised user space is often already enough trouble.
Available policies
• Grid Security Policy, version 5.7a
• VO Portal Policy, version 1.0 (draft)
• Big Grid Security Policy, version 2009-025
– Grid Acceptable Use Policy, version 3.1
– Grid Site Operations Policy, version 1.4a
– LCG/EGEE Incident Handling and Response Guide, version 2.1
– Grid Security Traceability and Logging Policy, version 2.0
• VO-Box Security Recommendations and Questionnaire, version 0.6 (draft, not ratified)
Relevant policy statements
• Network security is covered by site-local security policies and practices
• A VO Box is part of the trusted network fabric; privileged access is limited to resource administrators
• Software deployed on the grid must include sufficient and relevant site-central logging
First compromise
• Certified package repository
– Base templates
– Certified packages
• Separate user disk
– User-specific stuff
– Permanent storage
• At run time
– No privileged access
– Comparable to a VO box
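On RHEL/Scientific Linux based images, the certified package repository could simply be the only yum repository a base template knows about. A sketch of such a repo file; the repository name and URLs are hypothetical:

```
# /etc/yum.repos.d/certified.repo -- hypothetical certified repository
[biggrid-certified]
name=BIG Grid certified packages
baseurl=http://repo.example.org/certified/sl5/$basearch
enabled=1
gpgcheck=1
gpgkey=http://repo.example.org/certified/RPM-GPG-KEY
```

With `gpgcheck=1` and no other repositories configured, a template can only ever install packages that passed certification.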
Licenses?
Second compromise
• Make a separate grid DMZ for Class 3 VMs
• Comparable to "guest networks"
– Only outbound connectivity
• Detection of compromised guests
– Extended security monitoring
• Packet inspection, netflows (SNORT, nfsen)
• Honeypots, etc.
• Simple policy: one warning, you're out.
• Needs approval (network policy) from the OST (Operations Steering Team)
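The "only outbound connectivity" rule for the Class 3 DMZ could be expressed, for example, as a netfilter ruleset on the gateway between the guest network and the outside. A sketch in iptables-restore format; the interface names are assumptions, not the actual site configuration:

```
# Sketch: guests on br-guest may only initiate connections outward (eth0);
# nothing may initiate a connection into the guest network.
*filter
:FORWARD DROP [0:0]
-A FORWARD -i br-guest -o eth0 -j ACCEPT
-A FORWARD -i eth0 -o br-guest -m state --state ESTABLISHED,RELATED -j ACCEPT
COMMIT
```

The default-DROP forward policy means anything not matching the two rules, including inbound connection attempts, is silently discarded, which is exactly the "guest network" behaviour on the slide.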
TECHNOLOGY STUDY
Managing VMs
[Diagram: site resource management. Torque/PBS serves the job queue; OpenNebula, with the Haizea lease scheduler, serves the VM queue. Box 1 runs as a normal WN, Box 2 hosts "8 virtual WNs", Box 3 hosts "8 Class 2/3 VMs".]
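For the OpenNebula-managed boxes, a virtual worker node is described by a VM template. A minimal sketch in OpenNebula's template syntax; the name, image path and network name are invented for illustration:

```
NAME   = sl5-virtual-wn
CPU    = 1
MEMORY = 1024
DISK   = [ source = "/srv/images/sl5-wn.img", target = "sda", readonly = "no" ]
NIC    = [ network = "wn-net" ]
```

A template like this is instantiated with `onevm create`, which is the point where a "VM queue" entry turns into a running virtual WN on one of the boxes.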
Distributing VM images
[Diagram: a SAN-based repository holds the VM images; Class 2/3 images enter through an upload solution and are distributed over iSCSI/LVM to Box 1 (normal WN), Box 2 ("8 virtual WNs") and Box 3 ("8 Class 2/3 VMs").]
Cached copy-on-write
[Diagram: each box keeps one cached copy of the repository image; per-VM copy-on-write (COW) overlays are layered on top of it, so the several VMs on Box 1 and Box 2 all share a single cached base image.]
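The cached copy-on-write scheme above can be illustrated with a toy block-level model: every VM gets its own thin writable layer over one shared, locally cached base image, so the base is fetched from the repository only once per box. All names in this sketch are illustrative, not from the talk:

```python
class CowOverlay:
    """Per-VM writable layer over a shared, read-only base image."""

    def __init__(self, base_blocks):
        self.base = base_blocks      # shared cached image (read-only)
        self.delta = {}              # only this VM's modified blocks

    def read(self, block_no):
        # Reads fall through to the cached base unless this VM wrote the block.
        return self.delta.get(block_no, self.base[block_no])

    def write(self, block_no, data):
        # Writes never touch the base, so all overlays can share it safely.
        self.delta[block_no] = data


base_image = ["b0", "b1", "b2", "b3"]      # cached once per box
vm_a = CowOverlay(base_image)
vm_b = CowOverlay(base_image)

vm_a.write(1, "patched")
print(vm_a.read(1))                        # patched  (from vm_a's delta)
print(vm_b.read(1))                        # b1       (still the shared base)
print(len(vm_a.delta) + len(vm_b.delta))   # 1 modified block stored in total
```

This is also why the COW overhead shows up as a file-I/O question later in the talk: every block read has to check the overlay before falling back to the cached base.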
Interfacing VMs with 'the grid'
[Diagram: grid middleware (globus-job-run, globus-gatekeeper, globus-job-manager with a contact string) submits through job managers such as jm-pbs-long and jm-opennebula, i.e. qsub for Torque/PBS and the OpenNebula interface; Class 2/3 images reach the SAN repository through the upload solution. Nimbus/OCCI as an alternative interface is under discussion.]
VM contact-string
• User management mapping
– Mapping to OpenNebula users
• Authentication / authorization
– Access to different VM images
• Grid middleware components involved:
– CREAM-CE, BLAHp, glexec
– Execution Environment Service
https://edms.cern.ch/document/1018216/1
– Authorization Service Design
https://edms.cern.ch/document/944192/1
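The user-mapping step behind a VM contact-string can be sketched in the style of a grid-mapfile: a certificate DN is mapped to an OpenNebula account before any VM image can be touched, and unknown DNs are rejected. The file contents and account names below are illustrative assumptions, not the working group's actual configuration:

```python
GRID_MAPFILE = '''
"/DC=org/DC=example/O=Nikhef/CN=Alice" one_atlas
"/DC=org/DC=example/O=NBIC/CN=Bob" one_bioinf
'''

def parse_grid_mapfile(text):
    """Return {certificate DN: OpenNebula user} from grid-mapfile-style lines."""
    mapping = {}
    for line in text.strip().splitlines():
        dn, _, account = line.rpartition(" ")
        mapping[dn.strip('"')] = account
    return mapping

def map_user(dn, mapping):
    # Authorization decision: an unmapped DN gets no OpenNebula account at all.
    try:
        return mapping[dn]
    except KeyError:
        raise PermissionError("no mapping for " + dn) from None

users = parse_grid_mapfile(GRID_MAPFILE)
print(map_user("/DC=org/DC=example/O=Nikhef/CN=Alice", users))  # one_atlas
```

In the real chain this decision would sit behind glexec/the authorization service rather than a flat file, but the mapping step itself is the same.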
Coffee table discussion
• Parameter passing issue
Monitoring/Performance testing
[Screenshot: Ganglia web frontend (version 3.1.1) network report for the experimental cluster at http://ploeg.nikhef.nl/ganglia/, nodes colored by 1-minute load.]
Performance
• Small cluster
– 4 dual-CPU quad-core machines
– Image server with 2 TB storage
• Integration with the experimental testbed
– Existing CREAM-CE / Torque
• Testing
– Network I/O: is NAT feasible?
– File I/O: what is the COW overhead?
– Realistic jobs
Other challenges
• Accounting, scheduling based on fair share
• Scalability!
• Rapidly changing landscape
– New projects every week
– New versions every month
• So many alternatives
– VMware, SGE, Eucalyptus, Enomaly
– iSCSI, NFS, GFS, Hadoop
– Monitoring and security tools
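The fair-share challenge boils down to: a group that has recently used more than its allocated fraction of CPU time should be scheduled with a lower priority. A minimal sketch of such a priority function, in the spirit of classic batch-system fair-share formulas; the exponential form, share fractions and hour counts are made-up illustration values, not the BIG Grid configuration:

```python
def fair_share_priority(used_hours, share_fraction, total_hours):
    """Priority in (0, 1]: 1.0 = group used nothing, lower = over-used."""
    if total_hours == 0:
        return 1.0
    usage_fraction = used_hours / total_hours
    # 2^-(usage/share): exactly on target share -> 0.5, far over -> near 0.
    return 2.0 ** (-usage_fraction / share_fraction)

# Two groups with equal 50% shares of 1000 recent CPU hours:
light = fair_share_priority(used_hours=100, share_fraction=0.5, total_hours=1000)
heavy = fair_share_priority(used_hours=900, share_fraction=0.5, total_hours=1000)
print(light > heavy)   # True: the lighter group is scheduled first
```

The open question from the slide remains: whether a VM lease counts as "usage" the same way a batch job does, and how a VM manager like OpenNebula feeds such numbers back into the site's accounting.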
Conclusions
• Maintainability: no home-grown scripting
– Each solution should be part of a product
– Validation procedure with each upgrade
• Deployment
– Gradually move VM functionality into production
1. Introduce VM worker nodes
2. Virtual machine endpoint in grid middleware
3. Test with a few specific Class 2/3 VMs
4. Scaling and performance tuning