5-Jul-05 Deployment Board 2
Introduction
• Not a formal DB meeting
  – DB last met on 1st/2nd June 2005 (Glasgow)
  – Will next meet on 14th/15th Sep 2005 (Dublin)
• BUT… chance to discuss some of the current issues
• Many of the topics also covered elsewhere on GridPP13 agenda
  – Try not to duplicate discussions there
Agenda
• Introductions
• Technical Documentation
• Security Policy and Procedures
• LCG and gLite releases and deployment
• Deployment Metrics – Get fit actions
• Storage issues
• Tier 2 deployment and operations
• Other issues
Documentation
• Strong requirement for good documentation
  – User guides
  – Sys Admin guides
  – Web pages
  – Etc
• Progress to date not bad (but slow)
• We need someone to drive this
• Oversight Committee agrees
• Is anyone interested/able to do this?
• Or should we recruit?
Security Policy and Procedures
• User and VO AUP
• Incident Response
• Lessons from recent ssh key incident
• Security Vulnerability policy
INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
Update on LCG/EGEE Security Policy and Procedures
David Kelsey, CCLRC/RAL
LCG GDB Meeting, CERN, 22 June 2005
Overview
Work of the Joint (LCG/EGEE) Security Policy Group
  – In collaboration with US Open Science Grid (OSG)
  – JSPG meeting – CERN – 13/14 June 2005
• User Acceptable Use Policy
  – Not yet in EDMS
• VO Security Policy (and AUP)
  – https://edms.cern.ch/document/573348/
• Incident Handling and Response Guide
  – https://edms.cern.ch/document/428035/
User AUP (Version: The Taipei Accord 29 April 2005)
USER AGREEMENT (accepted during registration with a VO)
1) You may only perform work, or transmit or store data consistent with the activities and policies of the Virtual Organizations of which you are a member, and only on resources authorized for use by those Virtual Organizations.
2) You will not attempt to circumvent administrative or security controls on the use of resources. If you are informed that some aspect of your grid usage is creating a problem, you will adjust your usage and investigate ways to resolve the complaint.
User AUP (2)
3) You will immediately report any suspected compromise of your grid credentials or suspected misuse of grid resources to incident reporting locations specified by the Virtual Organization(s) affected and credential issuing authorities as specified in their agreements and policy statements.
4) You are aware that resource providers have the right to regulate access as they deem necessary for either operational or security-related reasons and that your use of the Grid is also bound by the rules and policies of the organizations through which you obtain access, e.g. your home institute, your national network and/or your internet service provider(s).
User AUP – Discussion since 18 May
• Sent to GDB and ROC Managers for comment
• Approved by OSG Council on 31 May 2005
• Comments received (mainly on bullet 4)
  – What about resource provider policy?
  – What about Grid/Infrastructure policy?
  – Do we have the right to cut off users?
      • Legal status of “Service” providers?
  – What about Data Protection laws?
  – Style: I am/will versus You are/will
• Discussed issues at JSPG meeting 14 June
  – Decided to consult some legal experts
  – Feedback received from one site and one network
  – Expecting feedback from another site soon
AUP feedback to date
• Site legal advice
  – Current draft text not sufficient
  – Rules not binding unless users aware of them
      • Must be pointers to all rules
      • Doesn’t matter if too long to read
          • As long as they have the opportunity
  – Bullet 4 does not give us the right to control access
  – Data protection needs to be addressed if personal info shared
  – Must state that users register every 12 months
• NREN response
  – Looks a good approach
  – Similar to work on location independent networking project
  – Perhaps move towards single AUP for common “visiting user” policy?
  – Bound by home site and home network rules
      • Must respect others and cease activity when requested
  – Need to make clear what is allowed and what not
      • Then can control access
      • Access can be limited to one application (tested in law)
AUP conclusion
• Not yet ready for GDB approval
  – BUT further comments very welcome
• Awaiting feedback from another site lawyer
• JSPG needs to discuss the way forward
  – Remembering that OSG has already approved
• Will come back to GDB and ROC managers asap
VO Security Policy
• Draft document – presented at last GDB
  – Author: Ian Neilson
• https://edms.cern.ch/document/573348/
• No comments received
  – Except for internal JSPG discussion
      • Made clear that security contact point must be a single e-mail address
• Recent discussion (not concluded) on VO AUP text
  – Binding users to Grid/Infrastructure Policy or not?
  – What do users need to read, be able to read, be aware of?
• Depends on final decision on User AUP
• So again, not ready for approval yet
• BUT… comments welcome!
New Incident Response
• Based on work by Open Science Grid
• We use the OSG document “as is”
  – But with a covering document explaining differences
• https://edms.cern.ch/document/428035
• Presented to GDB for first time today
  – Then period of discussion
  – Ask for feedback from ROC Managers and OSCT
• Aim for approval at next GDB
OSG Document
• Describes policy and procedures
• Sites MUST report security incidents
  – In addition to normal reporting to CERT, CSIRT
• Handling of sensitive data
  – Public disclosure via PR offices
  – National/International coordination done by law enforcement
• Security Contacts must be registered for each site
  – Maintained by GOC
• Mail list is also group of experts to provide advice
• Mail lists: Report and Discuss
• Defines the Incident Reporting process
  – Discovery, Analysis, Classification, Containment, Notification, Escalation, Response, Post-incident analysis
• Volunteer response team created if needed
LCG/EGEE covering document
• Intended audience
  – Site Security Contacts and System Administrators
• Defines mail lists for LCG/EGEE
• Warns that Incident Response info may be shared with other Grids (where agreements exist)
• Team leader to coordinate response
  – Initially organised by site reporting and its local ROC security contact
  – ROC contact responsible to make sure that process happens
Sharing of Incident Info
• In many cases it will be important to share Incident information between Grids
• May happen informally via sites which belong to more than one Grid
• Formal agreements will be needed
  – Where Grids follow the same/similar policy and procedures
  – But only where reciprocal agreement
• JSPG keen to arrange reciprocal agreement with OSG
• Also need to consider national Grids
  – ROC responsibility so job here for OSCT?
Summary
• More work needed on AUP and VO Security Policy
  – Will come back to GDB when ready
• Inviting discussion on Incident Response document
  – Approval at next GDB?
Useful Links
• Meetings - Agenda, presentations, minutes etc
    http://agenda.cern.ch/displayLevel.php?fid=68
• JSPG Web site
    http://proj-lcg-security.web.cern.ch/
• Policy documents at
    http://cern.ch/proj-lcg-security/documents.html
LCG, gLite releases
• Deployment planning
• Ian Bird’s talk on Monday
  – More frequent releases
• Conflicts with SC3, SC4 etc
• UK feedback to CERN deployment?
Lessons from recent ssh incident
• Dteam user trying to run MPI jobs copied ssh keys
• Lots of discussion on LCG-Rollout and TB-Support
  – How do I blacklist a user certificate?
  – How do we get the user’s certificate revoked?
  – Is this an incident?
  – Why doesn’t LCG security team urgently investigate?
  – Will we see a full report on lessons learned?
  – Need good advice for sys admins to avoid incidents?
  – There is no DTEAM AUP – why running MPI jobs?
  – How does sys admin contact the user?
  – There is no infrastructure for dealing with real incidents?
Incident response lessons
• How do I blacklist a user certificate?
  – Use Local Authorization
  – LCAS configuration
  – Or remove from Grid mapfile
      • But update will reinsert until removed by VO
• How do we get the user’s certificate revoked?
  – Contact the CA or RA
  – But in general they will not revoke unless allowed by their policy (e.g. proof of compromised private key)
  – Instead contact VO (via ROC) to remove Authorization
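The "remove from Grid mapfile" option above can be sketched roughly as follows. This is only an illustration: it assumes the common mapfile layout of one entry per line, a double-quoted certificate DN followed by a local or pool account, and the function names and DNs are hypothetical. As the slide notes, the next automatic mapfile update will reinsert the entry until the VO removes the user.

```python
from typing import Optional


def entry_dn(line: str) -> Optional[str]:
    """Extract the certificate DN from one mapfile line (assumed layout)."""
    s = line.strip()
    if not s or s.startswith("#"):
        return None  # blank line or comment: no DN
    if s.startswith('"'):
        end = s.find('"', 1)
        return s[1:end] if end != -1 else None
    return s.split()[0]  # unquoted DN without spaces


def remove_dn(mapfile_text: str, banned_dn: str) -> str:
    """Return the mapfile text with every entry for banned_dn dropped."""
    kept = [
        line
        for line in mapfile_text.splitlines()
        if entry_dn(line) != banned_dn
    ]
    return "\n".join(kept) + ("\n" if kept else "")


# Hypothetical DNs, for illustration only.
sample = (
    '"/C=UK/O=eScience/CN=alice example" .dteam\n'
    '"/C=UK/O=eScience/CN=bob example" .dteam\n'
)
cleaned = remove_dn(sample, "/C=UK/O=eScience/CN=bob example")
```

Comments and blank lines are preserved because only lines whose DN matches are dropped.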
Lessons (2)
• Is this an incident?
• Current Incident Response Policy definition
  – Any security investigation that causes a site to interrupt service
      • i.e. disconnect a machine or bar a user
  – Any instance of suspected misuse of grid resources beyond the local site
  – There is a reasonable possibility that credentials have been stolen and those credentials will not expire or be revoked within 3 days of the possible theft.
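The three criteria of the current definition can be read as a simple "any of" test. The sketch below is one possible reading, not an official encoding of the policy: the field names are hypothetical, and treating an unknown credential lifetime as "will not expire within 3 days" is an assumption.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Report:
    # Hypothetical field names chosen for this illustration.
    service_interrupted: bool          # machine disconnected or user barred
    misuse_beyond_local_site: bool     # suspected misuse reaching other sites
    credentials_possibly_stolen: bool
    # Time until the credentials expire or are revoked; None means unknown.
    time_to_expiry_or_revocation: Optional[timedelta] = None


def is_incident(report: Report) -> bool:
    """Apply the three criteria of the current definition (any one suffices)."""
    if report.service_interrupted or report.misuse_beyond_local_site:
        return True
    if report.credentials_possibly_stolen:
        remaining = report.time_to_expiry_or_revocation
        # Assumption: an unknown lifetime is treated as exceeding 3 days.
        return remaining is None or remaining > timedelta(days=3)
    return False
```

Under this reading, a possibly stolen credential that expires within 12 hours is not an incident on the third criterion alone, while a long-lived one is.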
Lessons (3)
• New Incident Response Policy definition
  – An incident is any real or suspected event that poses a real or potential threat to the integrity of services, resources, infrastructure, or identities.
• Grid participants MUST report incidents that have known or potential impact or relationship to Grid resources, services, or identities.
• Grid participants MUST respond to incidents involving locally managed or operated resources, services, or identities.
Lessons (4)
Incident Classification (new policy)
High: (team leader required)
• The incident could lead to exploitation of the trust fabric, i.e. user and host identities, or the incident could lead to instability of the overall Grid, or a denial-of-service is in progress against all replicas of a given Grid service.
Medium: (team leader required if widespread)
• The incident affects an instance of a Grid service, but Grid stability is not at risk, or a denial-of-service affects one replica of a given Grid service, or a local attack compromised a privileged user account.
Low: (team leader probably not required)
• A local attack compromised individual user, non-privileged credentials, or a denial-of-service attack or compromise affects only local grid resources.
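As a rough illustration, the three classes above can be expressed as a top-down check, highest severity first. The flag names are hypothetical, and treating Low as "anything that is neither High nor Medium" is an assumption made for this sketch rather than something the policy states.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass
class Incident:
    # Hypothetical flags mirroring the wording of the classification.
    trust_fabric_at_risk: bool = False        # user and host identities
    grid_stability_at_risk: bool = False
    dos_on_all_replicas: bool = False
    service_instance_affected: bool = False
    dos_on_one_replica: bool = False
    privileged_account_compromised: bool = False


def classify(incident: Incident) -> Severity:
    """Check High criteria first, then Medium; everything else is Low."""
    if (incident.trust_fabric_at_risk
            or incident.grid_stability_at_risk
            or incident.dos_on_all_replicas):
        return Severity.HIGH
    if (incident.service_instance_affected
            or incident.dos_on_one_replica
            or incident.privileged_account_compromised):
        return Severity.MEDIUM
    return Severity.LOW


def team_leader_required(severity: Severity, widespread: bool = False) -> bool:
    """High: always. Medium: only if widespread. Low: 'probably not' (False here)."""
    if severity is Severity.HIGH:
        return True
    if severity is Severity.MEDIUM:
        return widespread
    return False
```

Checking High before Medium means an incident matching criteria from both classes is reported at the higher level.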
Lessons (5)
• Why doesn’t LCG security team urgently investigate?
  – There is no such LCG team
  – Security is responsibility of the ROCs
      • Coordinated by OSCT
• Will we see a full report on lessons learned?
  – Ian Neilson sent one to the list 3 days later
  – He concluded it should have been discussed on “security contacts”
• Need good advice for sys admins to avoid incidents?
  – Agreed
  – Romain W is leading activity (RSS feeds)
  – Volunteers needed!
Lessons (6)
• There is no DTEAM AUP – why running MPI jobs?
  – Agreed
  – This is a requirement of the new policy
• How does sys admin contact the user?
  – Via ROC to VO
  – E-mail notification of all user registrations
  – Requirement for read-only access to VO database
      • Must respect privacy
Lessons (7)
• There is no infrastructure for dealing with real incidents?
  – May not be perfect – but policy requires reporting to “csirt” list (as well as local reporting)
  – New policy requires creation of team to deal with incident
      • Responsibility of ROC (OSCT)
  – Not appropriate to discuss on LCG-rollout or TB-Support
• OSCT needs to decide how best to communicate with Sys Admins
  – Current approach is via “Security Contacts” list
  – My personal view: we need an emergency sysadmin mail list
Security Vulnerability
• Linda Cornwall will present tomorrow
• There has been lots of discussion re policy
  – Discuss these now
• Everyone agrees on the aim
  – To protect our sites, resources and data
  – Improve quality of middleware and deployment
• Concerns about
  – Legal liability
  – Do we “publish” vulnerabilities (if so, when?)
  – Developers will not fix unless we publish
  – How do we keep information private before publishing?
Current model
• An internal LCG/EGEE/GridPP activity
  – No responsibility to outside customers
• Legal responsibility (if any) is SA1 and JRA1
• Risk assessment done quickly
• Inform OSCT and developers quickly
  – OSCT/Deployment team can inform all sites if necessary
      • Or should the group inform all sites quickly (on closed list)?
• Allow time for fix (45 days?)
• When problem fixed or on timeout
  – Inform all sites but never fully publish
      • JRA1 and/or SA1 can publish if they wish
• Status reports (stats) available on web (with access control)
  – Report regularly to Management (JRA1 and SA1)
• Mirror entries in JRA1 Savannah
• Any site admin can join group
  – But must abide by policy
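The "fixed or on timeout" rule in this model amounts to a small date calculation. The sketch below assumes the tentative 45-day window from the slide (which is marked with a question mark there) and uses hypothetical names; it only shows when sites would be informed, not how.

```python
from datetime import date, timedelta
from typing import Optional

# Tentative figure from the slide ("45 days?"), not an agreed value.
FIX_WINDOW = timedelta(days=45)


def sites_informed_on(reported: date, fixed: Optional[date] = None) -> date:
    """Date on which all sites are informed: the fix date or the timeout,
    whichever comes first."""
    deadline = reported + FIX_WINDOW
    if fixed is not None and fixed <= deadline:
        return fixed
    return deadline
```

For example, a problem reported on 5 July 2005 and not fixed in time would be announced to sites on 19 August 2005 under this assumption.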
Alternative model
• Proposed by Romain W (see slides)
  – Support from some sys admins
• More like an external activity
  – Similar to CERT/CC
• After quick risk analysis
  – Inform developers and/or deployment team
  – Do not inform sites (info will leak out)
• After timeout
  – Full publication (responsible disclosure)
Decision?
• LCG, EGEE, GridPP management will decide the approach
  – PEB, PMB etc
• General EGEE agreement – Athens and Brno meetings
  – Internal activity so cannot fully publish
  – Responsibility for informing internal and external customers rests on SA1 (OSCT/ROCs) and JRA1
• Feedback welcome to inform this
  – When/how do sites need to be informed?
• If this approach does not produce results we should retain right to change to
• Important to get this right
• BUT even more important to get on with fixing problems
  – Volunteers needed for Risk Analysis
Storage issues
• dCache and DPM
  – (Short) discussion later on agenda
• Open Source policy
  – dCache unlikely to be Open Source
  – Lots of discussion on TB-Support
  – GridPP policy is to write Open Source s/w
  – Arguments why we should only use Open Source
  – UK evaluation/review of dCache?