Page 1: [IEEE Third International Symposium on Information Assurance and Security - Manchester, UK (2007.08.29-2007.08.31)] Third International Symposium on Information Assurance and Security

A Secure Storage Service for the gLite Middleware

Diego Scardaci, Giordano Scuderi INFN Catania, UNICO S.R.L.


Abstract

The Secure Storage service for the gLite middleware provides users with a set of tools to store confidential data (e.g. medical or financial data) securely, in encrypted form, on the grid storage elements. Data stored through the provided tools is accessible and readable by authorized users only. Moreover, the service solves the insider abuse problem by preventing even the administrators of the storage elements from accessing the confidential data in clear form. The service has been designed and developed for gLite, the grid middleware of the EGEE Project, in the context of the TriGrid VL Project.

1. Introduction

The term “information security” [7] means protecting information and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction in order to provide integrity, confidentiality and availability. Integrity is the protection of information against improper modification or destruction, and includes authenticity. Confidentiality preserves authorized restrictions on information access and disclosure. Availability ensures timely and reliable access to and use of information.

The Secure Storage service allows users to manage confidential data in a Grid Computing environment while guaranteeing the properties described above: data integrity, confidentiality, and availability. The service has been designed and developed for gLite [19], the grid middleware of the EGEE infrastructure [18].

2. Grid Computing

Grid computing [1] [2] [3] [5] is an emerging computing model that distributes processing across a parallel infrastructure. Throughput is increased by networking many heterogeneous resources across administrative boundaries to model a virtual computer architecture. For a computing problem to benefit from a grid, it must require either large amounts of computation time or large amounts of data, and it must be reducible to parallel processes that do not require intensive inter-communication.

Grid computing offers a model for solving massive computational problems by making use of distributed resources (CPU cycles and/or disk storage) of large numbers of different organizations.

Grids offer a way to solve Grand Challenge problems such as protein folding, financial modeling, earthquake simulation, and climate/weather modeling. Grids also offer a way of using information technology resources optimally inside an organization. They provide a means of offering information technology as a utility to commercial and non-commercial clients, with those clients paying only for what they use, as with electricity or water.

2.1. EGEE Infrastructure

The Enabling Grids for E-sciencE project [18] brings together scientists and engineers from more than 90 institutions in 32 countries world-wide to provide a seamless Grid infrastructure for e-Science that is available to scientists 24 hours-a-day.

Expanding from originally two scientific fields, high energy physics and life sciences, EGEE now integrates applications from many other scientific fields, ranging from geology to computational chemistry. Generally, the EGEE Grid infrastructure is ideal for any scientific research especially where the time and resources needed for running the applications are considered impractical when traditional IT infrastructures are used.

The EGEE Grid consists of over 30,000 CPUs available to users 24 hours per day, 7 days per week, in addition to about 5 Petabytes (5 million Gigabytes) of storage, and maintains 50,000 concurrent jobs on average. Having such resources available completely changes the way scientific research is carried out. The end use depends on the users' needs: large storage capacity, the bandwidth that the infrastructure provides, or the sheer computing power available.

0-7695-2876-7/07 $25.00 © 2007 IEEE. DOI 10.1109/IAS.2007.33

2.2. gLite middleware

A Grid middleware organizes and integrates the disparate computational facilities on a grid to present them as a homogeneous resource to the user.

gLite [19] is the next generation middleware for grid computing. Born from the collaborative efforts of more than 80 people in 12 different academic and industrial research centres, as part of the EGEE Project, gLite provides a bleeding-edge, best-of-breed framework for building grid applications tapping into the power of distributed computing and storage resources across the Internet.

Services in gLite comprise security, monitoring, job and data management, and were developed to follow a service-oriented architecture:

• Security services [20] encompass the Authentication, Authorization, and Auditing services, which enable the identification of entities (users, systems, and services) in order to allow or deny access to services and resources. These services are based on the Grid Security Infrastructure (GSI) [17] described below. Every gLite user/host/service is identified by an X.509 certificate [12]. These certificates are signed by trusted Certification Authorities. All transactions between gLite services are mutually authenticated and encrypted, as required by this infrastructure.

• Information and Monitoring Services provide a mechanism to publish and consume information and to use it for monitoring purposes. The information and monitoring system can be used directly to publish, for example, information concerning the resources on the Grid.

• Job Management Services are related to job management/monitoring/execution. The Computing Element (CE) is the service representing a computing resource, and its main functionality is job management (job submission, job control, etc.). The Workload Management System (WMS) comprises a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources, in such a way that applications are conveniently, efficiently, and effectively executed. The Logging and Bookkeeping service (LB) tracks jobs in terms of events gathered from various WMS components as well as CEs.

• The Data Management System (DMS) is the subsystem of the Grid infrastructure which takes care of file manipulation for both the other Grid services and user applications. The main capabilities provided by the DMS are locating, accessing and moving files. The data are kept and/or replicated in the Grid Storage Elements. These services are detailed later on in this section.

• The access point to the Grid is the User Interface (UI). From a UI, a user can be authenticated and authorized to use the gLite resources, and can access the functionalities offered by the Information, Workload and Data management systems.

2.2.1. The Grid Security Infrastructure. The Grid Security Infrastructure [17], formerly called the Globus Security Infrastructure, is a specification for secret, tamper-proof, delegated communication between software in a grid computing environment. Secure and authenticated communication is enabled using asymmetric encryption (X.509 Public Key Infrastructure [12]).

The GSI provides a delegation capability, an extension of the standard SSL protocol [13] that reduces the number of times the user must enter his pass-phrase. If a Grid computation requires that several Grid resources be used (each requiring mutual authentication), or if agents (local or remote) need to request services on behalf of a user, creating a proxy avoids re-entering the user's pass-phrase.

A proxy consists of a new certificate (with a new public key in it) and a new private key. The new certificate contains the owner's identity, modified slightly to indicate that it is a proxy. The new certificate is signed by the owner, rather than a CA (see the figure below). The certificate also includes a time notation after which the proxy should no longer be accepted by others. Proxies have limited lifetimes.

Fig. 1. Delegation and Single Sign-On
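The delegation rules described above can be illustrated with a small model. This is only a sketch under stated assumptions: real proxies carry key pairs and signatures, which are omitted here; the `Cert` class, the `make_proxy` and `chain_is_valid` helpers, and the `/CN=proxy` subject suffix (modelled on the naming convention of legacy Globus proxies) are illustrative choices, not part of gLite.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Cert:
    subject: str         # distinguished name of the holder
    issuer: str          # distinguished name of the signer
    not_after: datetime  # time after which the certificate must be rejected


def make_proxy(owner: Cert, now: datetime, hours: int = 12) -> Cert:
    """Derive a short-lived proxy issued by its owner rather than by a CA."""
    # A proxy never outlives the certificate it was derived from.
    expiry = min(owner.not_after, now + timedelta(hours=hours))
    # Assumption: the owner's identity is "modified slightly" by a suffix.
    return Cert(owner.subject + "/CN=proxy", owner.subject, expiry)


def chain_is_valid(chain, trusted_cas, now) -> bool:
    """chain[0] is the user certificate; each later entry proxies the previous one."""
    if not chain or chain[0].issuer not in trusted_cas:
        return False
    for parent, child in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False  # not issued by the previous link
        if child.subject != parent.subject + "/CN=proxy":
            return False  # identity was not derived from the owner
    # Every link, including the proxies, must still be within its lifetime.
    return all(now <= cert.not_after for cert in chain)
```

In this model a remote agent holding `[user, proxy]` can act for the user until the proxy expires, without the user's long-term private key or pass-phrase ever leaving the user's machine.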

2.2.2. Virtual Organization Membership Service. A Virtual Organization (VO) [15] is a dynamic collection of individuals, institutions, and resources engaged in flexible, secure, coordinated resource sharing.

The Virtual Organization Membership Service (VOMS) [14] provides information on the user's relationship with his Virtual Organization. It extends the proxy with information on VO membership, groups, roles and capabilities. The service sends a signed attribute certificate to the client, which includes it in a proxy certificate (fully compatible with the Grid Security Infrastructure). Attribute certificates bind a set of attributes (such as membership, roles and authorization information) to an identity.

Each VO has a database containing group membership, roles and capabilities information for each user.

2.2.3. The gLite Data Management Services. The Data Management System is the subsystem of the gLite middleware which takes care of file manipulation. For Grid data management, the granularity is the file, as in traditional computing. In a Grid environment, files can have replicas at many different sites, and the middleware provides the capabilities for replica management. Ideally, users do not need to know where a file is located, as they use logical names for the files, which the Data Management services use to locate and access them.

The abstraction presented to the users of the gLite data services is that of a global file system, with very similar semantics. A client user application may look like a Unix shell which can seamlessly navigate in this virtual file system, listing files, changing directories, etc.

Files in the Grid can be referred to by different names: Grid Unique IDentifier (GUID), Logical File Name (LFN), Storage URL (SURL) and Transport URL (TURL). While the GUIDs and LFNs identify a file irrespective of its location, the SURLs and TURLs contain information about where a physical replica is located, and how it can be accessed.

A file can be unambiguously identified by its GUID; this is assigned the first time the file is registered in the Grid, and is based on the UUID standard to guarantee its uniqueness.

Users and applications need to locate files (or replicas) on the Grid. The File Catalogue is the service which maintains mappings between LFN(s), GUID and SURL(s). The LCG File Catalogue (LFC) [22] is the File Catalogue adopted by gLite.
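The naming scheme above can be pictured as two lookup tables. The following is a minimal in-memory sketch, not the LFC implementation: the `FileCatalogue` class, its method names, and the example LFN and SURLs are all hypothetical, and the GUID is simply a random UUID from Python's `uuid` module.

```python
import uuid


class FileCatalogue:
    """Toy catalogue mapping LFN(s) -> GUID -> SURL(s)."""

    def __init__(self):
        self._guid_by_lfn = {}    # logical name -> GUID
        self._surls_by_guid = {}  # GUID -> list of replica locations

    def register(self, lfn: str, surl: str) -> str:
        """Register a new file; the GUID is assigned once, at first registration."""
        if lfn in self._guid_by_lfn:
            raise ValueError("LFN already registered: " + lfn)
        guid = str(uuid.uuid4())
        self._guid_by_lfn[lfn] = guid
        self._surls_by_guid[guid] = [surl]
        return guid

    def add_alias(self, existing_lfn: str, new_lfn: str) -> None:
        """Give the same file a second logical name."""
        if new_lfn in self._guid_by_lfn:
            raise ValueError("LFN already registered: " + new_lfn)
        self._guid_by_lfn[new_lfn] = self._guid_by_lfn[existing_lfn]

    def add_replica(self, lfn: str, surl: str) -> None:
        """Record another physical copy of an already registered file."""
        self._surls_by_guid[self._guid_by_lfn[lfn]].append(surl)

    def replicas(self, lfn: str) -> list:
        """Resolve a location-independent name to its physical replicas."""
        return list(self._surls_by_guid[self._guid_by_lfn[lfn]])
```

A client that only knows the LFN can call `replicas()` and then pick whichever SURL is closest or least loaded, which is what lets users ignore where a file physically lives.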

The Storage Element (SE) is the service which allows a user or an application to store data for future retrieval and provides uniform access to data storage resources. It may control simple disk servers, large disk arrays or tape-based Mass Storage Systems (MSS).

3. Confidential Data in the Grid Storage Elements

One of the main benefits of the Grid infrastructure is the possibility of using distributed storage space [6]. A community might want to use storage elements owned by an external organization, delegating the management of these machines and avoiding the need to buy specialized hardware and hire specialized personnel. In this way the community could rent storage space as needed and minimize personnel and hardware costs.

For confidential data this scenario is not feasible as it stands. The community must satisfy strong privacy requirements, for example when it manages medical or financial data. To store confidential data in a storage element managed by an external organization, a mechanism is required that prevents the administrator of the machine from accessing the data.

At the present time the EGEE infrastructure grid Middleware, gLite, provides the same security infrastructure [20] for all the grid services. The authentication is performed using the X.509 infrastructure [12] and the VOMS [14] attributes are used to authorize the users. Moreover, an authorization method based on Access Control Lists (ACL) ensures data access only by their owners.

However, data are stored in clear format. The storage element administrator can access them bypassing the grid security infrastructure. This is known as the insider abuse problem [11].

The following requirements must be satisfied to guarantee confidentiality and integrity and to solve the insider abuse problem:

• the data must be stored in an encrypted format;

• the encryption operation must be performed on a trusted machine;

• the information (e.g. the keys) required to decrypt the data must be accessible only by their owners (or authorized users) and stored on a trusted machine.

The complexity of the operations needed to satisfy these requirements must be hidden from the end user. A new grid security service is therefore required. This service has to provide users with suitable and simple tools to save confidential data in storage elements owned by an external organization in a transparent and secure way.

4. The Secure Storage Service

The Secure Storage Service [27] has been designed to be integrated in the gLite middleware.

It is made up of the following components:

• Command Line Applications: commands integrated in the gLite User Interface to encrypt and upload files to the storage elements, and to download and decrypt them.

• An Application Program Interface: the API allows developers to write programs that manage confidential data using the Secure Storage service.

• The Keystore: a new grid element used to store and retrieve the users' keys in a secure way.

• The Secure Storage Framework: a component of the service used internally by the other components. It provides encryption/decryption functions and other utility functions, and takes care of the interaction with the Grid Data Management System.

Fig. 2. The Secure Storage Service architecture

4.1. Encryption Algorithms

Encryption and decryption operations are performed using symmetric-key algorithms [8]. These algorithms are a class of algorithms for cryptography that use the same cryptographic key for both decryption and encryption.

We chose this kind of algorithm because it is generally much less computationally intensive than asymmetric-key algorithms. In practice, a quality asymmetric-key algorithm is hundreds or thousands of times slower than a quality symmetric-key algorithm.

The Advanced Encryption Standard (AES) [10] with a 256-bit key is the default encryption algorithm of the Secure Storage Service. Thanks to its modular architecture, however, the service can be extended to support other symmetric algorithms.

4.2. Command Line Applications

The Secure Storage service provides a set of new Command Line Applications on the gLite User Interface. These applications allow users to manage confidential data in a secure way.

The main Command Line Applications of the service are listed below:

• lcg-scr: the input parameters of this command are a local file, a storage element, a Logical File Name (LFN) and a list of users authorized to access the file. The command generates an encryption key, encrypts the input file and uploads it to the storage element, registering its LFN in an LFC file catalogue. Moreover, it stores the key generated and used to encrypt the file in the keystore. An Access Control List (ACL) is created and associated with the encryption key on the keystore. This ACL contains all users authorized to access the file (a list of distinguished names [12], DNs).

Fig. 3. lcg-scr command: 1) A new random secret key is generated. 2) The key and the ACL are saved on the keystore. 3) The input file is encrypted inside the user's trusted environment. 4) The encrypted file is uploaded to the Grid Storage Element.

• lcg-scp: the input parameter of this command is an LFN. It downloads the encrypted file identified by the input LFN, gets the key needed to decrypt the file from the keystore, decrypts the file and then stores it on the local file-system. This command succeeds only if the user is authorized (his DN is on the ACL associated with the key needed to decrypt the file).

Fig. 4. lcg-scp command: 1) Get the secret key from the keystore. This operation fails if the user is not authorized. 2) Download the encrypted file to the local machine. 3) Decrypt the file and save it on the local file-system.

• lcg-sdel: the input parameter is an LFN. The command deletes one or all of the file replicas. If the last replica is deleted, it also deletes the key associated with this file on the keystore.

• lcg-add-dn: the input parameters are an LFN and a list of users' DNs. The command adds the input DNs to the ACL associated with the key needed to decrypt the file identified by the input LFN.

• lcg-del-dn: the input parameters are an LFN and a list of users' DNs. The command removes the input DNs from the ACL associated with the key needed to decrypt the file identified by the input LFN.

4.2.1. Data Integrity. The encryption operation guarantees only data privacy. Files saved on the Grid could still be maliciously modified by an attacker who breaks the Grid Security Infrastructure.

The Secure Storage Service also ensures data integrity. The lcg-scr command can digitally sign [9] the confidential data: by specifying a command option, users can request that the data be signed before the encryption operation. The data and the digital signature are then saved, encrypted, on the selected Grid Storage Element.
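The upload and download flows of lcg-scr and lcg-scp, including the optional integrity step, might be modelled as follows. This is a hedged sketch, not the service's code: the keystore and the storage element are plain dictionaries, the function names `scr` and `scp` are hypothetical, a SHA-256 keystream XOR stands in for AES-256 (the Python standard library has no AES), and a SHA-256 digest stands in for the digital signature.

```python
import hashlib
import hmac
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (NOT AES): the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def scr(data: bytes, lfn: str, acl, keystore: dict, storage_element: dict) -> None:
    """Model of lcg-scr: everything before the upload happens on the trusted side."""
    key = secrets.token_bytes(32)                 # 1) new random 256-bit key
    keystore[lfn] = (key, frozenset(acl))         # 2) key and ACL go to the keystore
    digest = hashlib.sha256(data).digest()        # stand-in for the signature
    blob = keystream_xor(key, digest + data)      # 3) encrypt digest and data together
    storage_element[lfn] = blob                   # 4) only ciphertext is uploaded


def scp(lfn: str, user_dn: str, keystore: dict, storage_element: dict) -> bytes:
    """Model of lcg-scp: fetch the key, download, decrypt, verify."""
    key, acl = keystore[lfn]
    if user_dn not in acl:                        # 1) fails for unauthorized users
        raise PermissionError("DN is not on the ACL of this key")
    blob = keystream_xor(key, storage_element[lfn])   # 2) download and 3) decrypt
    digest, data = blob[:32], blob[32:]
    if not hmac.compare_digest(digest, hashlib.sha256(data).digest()):
        raise ValueError("integrity check failed: file was modified")
    return data
```

The storage element in this model never sees the key or the plaintext, so its administrator holds only ciphertext; a modification of the stored bytes is detected at download time because the recomputed digest no longer matches.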

The lcg-scp command automatically verifies data integrity if the data has been signed.

4.3. Application Program Interface

Developers can use the Secure Storage API in their Grid Applications to manage confidential data in a secure way.

The API functions are divided into two groups according to their functionality.

The first group of functions behaves like the Command Line Applications. They are the secure version of the lcg-utils API [22]. These are the main functions of the Secure Storage API:

• int lcg_scr ( char *src_file, char *dest_file, char *guid, char *lfn, char *vo, char *relative_path, char *conf_file, int insecure, int verbose, char *actual_gid );

• int lcg_scp ( char *src_file, char *dest_file, char *vo, char *conf_file, int insecure, int verbose );

• int lcg_sdel ( char *src_file, int aflag, char *se, char *vo, char *conf_file, int insecure, int verbose, int timeout );

The second group of functions can be used to manage encrypted remote files as if they were local files. These functions allow developers to read or write blocks of encrypted files stored on remote storage elements in a simple way, hiding the complexity of working with encrypted remote files. In this way, users can manage remote encrypted files as if they were clear local files. They are the secure version of the LCG GFAL API [22]:

• int securestorage_open ( char *lfn, int flags, mode_t mode );

• int securestorage_write ( int fd, void *buffer, size_t size );

• int securestorage_read ( int fd, void *buffer, size_t size );

• int securestorage_close ( int fd );

• off_t securestorage_lseek ( int fd, off_t offset, int whence );

4.4. The Keystore

The Keystore is a new grid element used to store and retrieve the users' keys in a secure way. It is identified by a host X.509 digital certificate, and all its Grid transactions are mutually authenticated and encrypted as required by the Grid Security Infrastructure (GSI) model [17].

The Keystore is a critical node of the Secure Storage service because data security depends on its integrity. For this reason the Keystore should be placed in a trusted domain (e.g. inside the local network of the community using the Secure Storage service) and should be appropriately protected from undesired connections, which requires a well-configured firewall.

At the application level the Keystore is a black box with a single interface towards the external world. This interface accepts mutually authenticated connections only and, once the client is identified, decides through an authorization process whether to serve the request.

The operations performed during the authorization process are the following:

• the client request is processed only if the client is a member of a list of enabled users and/or belongs to an enabled Virtual Organization [14] or to a specific Virtual Organization group. The request is discarded in all other cases;

• if the client wants to retrieve a key (or to modify its access permissions), the keystore checks that the request comes from an authorized user listed in the ACL associated with the requested key. It knows the Distinguished Name of the user thanks to the GSI authentication.
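The two authorization stages listed above can be sketched as a small in-memory model. This is an illustration under assumptions, not the Keystore's real interface: the `Client` and `Keystore` classes and their method names are hypothetical, the client's DN and VO attributes are taken as already verified by GSI/VOMS, and "and/or" is read as "any one of the enabled criteria is sufficient".

```python
from typing import NamedTuple


class Client(NamedTuple):
    dn: str             # distinguished name, assumed authenticated via GSI
    vo: str             # Virtual Organization taken from the VOMS attributes
    groups: tuple = ()  # VO groups taken from the VOMS attributes


class Keystore:
    """Toy keystore applying both authorization stages to every request."""

    def __init__(self, enabled_dns=(), enabled_vos=(), enabled_groups=()):
        self.enabled_dns = set(enabled_dns)
        self.enabled_vos = set(enabled_vos)
        self.enabled_groups = set(enabled_groups)
        self._entries = {}  # key id -> (key bytes, set of authorized DNs)

    def _admit(self, client: Client) -> None:
        # Stage 1: is the client enabled to talk to this keystore at all?
        admitted = (client.dn in self.enabled_dns
                    or client.vo in self.enabled_vos
                    or bool(self.enabled_groups & set(client.groups)))
        if not admitted:
            raise PermissionError("request discarded: client not enabled")

    def _check_acl(self, client: Client, key_id: str) -> None:
        # Stage 2: is the client's DN on the ACL of the requested key?
        if client.dn not in self._entries[key_id][1]:
            raise PermissionError("DN is not on the ACL of this key")

    def store(self, client: Client, key_id: str, key: bytes, acl) -> None:
        self._admit(client)
        self._entries[key_id] = (key, set(acl))

    def retrieve(self, client: Client, key_id: str) -> bytes:
        self._admit(client)
        self._check_acl(client, key_id)
        return self._entries[key_id][0]

    def add_dn(self, client: Client, key_id: str, new_dn: str) -> None:
        self._admit(client)
        self._check_acl(client, key_id)
        self._entries[key_id][1].add(new_dn)
```

Separating the two checks means that membership of an enabled VO alone never yields a key: a fellow VO member still needs to be added to the ACL of that specific key, for example by a call equivalent to lcg-add-dn.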

The clients of the keystore are the command line applications and the API functions described in the previous sections. For example, the keystore saves in its repository the encryption keys and the associated ACLs received from the lcg-scr command, and provides the keys to the lcg-scp command.

5. Future Works

The Secure Storage Service will support Shamir's secret sharing scheme [16] in the next release. This algorithm splits the encryption key into N parts and allows the key to be rebuilt from any K=N-1 of them.
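A compact sketch of Shamir's (K, N) threshold scheme shows how such a split could work. The code below is illustrative only and does not reflect the planned implementation: the function names, the choice of the prime field 2^521 - 1, and the encoding of the key as an integer are assumptions made for the example.

```python
import secrets

# Assumption: a Mersenne prime comfortably larger than any 256-bit key.
PRIME = 2**521 - 1


def split_secret(secret: int, n: int, k: int):
    """Return n shares (x, y); any k of them determine the secret."""
    if not 1 <= k <= n or not 0 <= secret < PRIME:
        raise ValueError("need 1 <= k <= n and 0 <= secret < PRIME")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse through Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

With N = 5 and K = N - 1 = 4, a 256-bit key converted with `int.from_bytes(key, "big")` yields five shares, one per keystore, and any four of them passed to `recover_secret` return the original integer.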

Using Shamir's algorithm, the Secure Storage Service will store the N sub-keys on N different keystores. In this way data confidentiality will be preserved even if K-1 keystores are compromised, because fewer than K sub-keys reveal nothing about the key. Service availability is also preserved, because the key can still be rebuilt when one keystore is unreachable.

6. Conclusions

In recent years, interest in the Grid paradigm has grown rapidly in the international research community and in business.

Grid applications can use geographically distributed storage space managed by external organizations. This scenario implies serious security risks, such as the insider abuse problem. At present, the most common grid middleware does not provide integrated tools to ensure full data security.

The Secure Storage Service tools allow users to manage confidential data securely. In this way, grid users can use distributed storage elements with data privacy and integrity guaranteed, and can delegate the management of these machines, avoiding the need to buy specialized hardware.

The Secure Storage Service has been developed for gLite, the grid middleware of the EGEE infrastructure.

7. References

[1] I. Foster and C. Kesselman, “The Grid: Blueprint for a New Computing Infrastructure”, Morgan Kaufmann, 1997.
[2] I. Foster, C. Kesselman and S. Tuecke, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, Lecture Notes in Computer Science, Vol. 2150, 2001.
[3] I. Foster, “The Grid: A New Infrastructure for 21st Century Science”, Physics Today, Vol. 55, pp. 42-47, 2002.
[4] I. Foster and C. Kesselman, “Globus: A Metacomputing Infrastructure Toolkit”, International Journal of Supercomputer Applications, Vol. 11, pp. 115-128, 1997.
[5] Grid computing – Wikipedia, The Free Encyclopedia: http://en.wikipedia.org/wiki/Grid_computing
[6] A. Shoshani, A. Sim, J. Gu, “Storage Resource Managers: Middleware Components for Grid Storage”, Proceedings of the Nineteenth IEEE Symposium on Mass Storage Systems, 2002.
[7] National Institute of Standards and Technology (NIST) - Computer Security Resource Center, “Information Security”: http://csrc.nist.gov/publications/nistpubs/800-59/SP800-59.pdf
[8] Symmetric-key algorithm: http://en.wikipedia.org/wiki/Symmetric-key_algorithm
[9] National Institute of Standards and Technology (NIST) - Computer Security Resource Center, “FIPS PUB 186 - Digital Signature Standard”: http://www.itl.nist.gov/fipspubs/fip186.htm
[10] National Institute of Standards and Technology (NIST) - Computer Security Resource Center, “Advanced Encryption Standard”: http://csrc.nist.gov/CryptoToolkit/aes/rijndael/
[11] U.S.A. Department of the Treasury - Office of Thrift Supervision, “Fraud and Insider Abuse”: http://www.ots.treas.gov/docs/4/422134.pdf
[12] Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile: http://www.ietf.org/rfc/rfc3280.txt
[13] OpenSSL: http://www.openssl.org
[14] Virtual Organization Membership Service (VOMS): http://voms.forge.cnaf.infn.it/home.html
[15] Virtual Organization: http://gridcafe.web.cern.ch/gridcafe/grid&you/VO.html
[16] A. Shamir, “How to share a secret”, Communications of the ACM 22 (1979), 612-613.
[17] Grid Security Infrastructure (GSI): http://www.globus.org/toolkit/docs/4.0/security/key-index.html
[18] EGEE project web site: http://public.eu-egee.org/
[19] gLite project web site: http://www.glite.org
[20] gLite security infrastructure web site: http://glite.web.cern.ch/glite/security/
[21] gLite 3 User Guide: https://edms.cern.ch/file/722398//gLite-3-UserGuide.html
[22] LCG project web site: http://www.cern.ch/lcg/
[23] TRIGRID project web site: http://www.trigrid.it
[24] Grid INFN Laboratory for Dissemination Activities (GILDA) web site: https://gilda.ct.infn.it
[25] Grid Group of INFN Catania: http://grid.ct.infn.it/
[26] UNICO S.R.L.: http://www.unicosrl.it
[27] Secure Storage Project: http://securestorage.sourceforge.net
