OGF24 15 September 2008 Data Area Overview Erwin Laure <[email protected]> David E. Martin <[email protected]> Data Area Directors

Page 1: Data Area Overview

OGF24, 15 September 2008

Data Area Overview

Erwin Laure <[email protected]>
David E. Martin <[email protected]>
Data Area Directors

Page 2: Data Area Overview

Data Area Goals

• The Data Area groups explore different aspects of data handling on grids
  • Access
  • Transport
  • Management

• Overall Data Architecture developed by OGSA Data Architecture group:
  • http://www.ogf.org/documents/GFD.121.pdf

Page 3: Data Area Overview

Data Access

• Goals: locate and provide seamless access to data stored on Grids

• Data Access and Integration Services (DAIS-WG)
  • Base Specs Published for Database Access (GFD 74,75,76)
  • Implementation in OMII-UK
  • Now Working on Data Access Services for RDF Data Resources

• Grid File Systems (GFS-WG)
  • Naming Spec Published – Resource Namespace Service (GFD101)
  • Working on Resource Catalog
  • Prototypes from SDSC, UVA, Univ. of Tsukuba

• Data Format Description Language (DFDL-WG)
  • XML-based language for describing the structure of binary and textual files and data streams
  • Simplifying the Concepts and Trying to Remove Complexity to Shorten Draft Spec
  • Prototypes from LANL and IBM

• Byte IO (ByteIO-WG)
  • Web Service interface for providing "POSIX-like" file functionality (GFD 87,88)
  • Spec Finished Comment, Need to Make Small Changes
  • Production Version from UVA, Will Be in OMII

Page 4: Data Area Overview

Data Transport

• OGSA Data Movement Interface (OGSA-DMI-WG)
  • Discover and negotiate proper data transport protocols and manage data transport (GFD134)
  • Working on interoperability

• GridFTP WG (GridFTP-WG)
  • Grid enabled FTP protocol
  • Spec Published 3 Years Ago (GFD20)
  • Many Production Implementations
  • Need Experience Report for Full Standard
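
As a rough illustration of the "discover and negotiate" idea behind OGSA-DMI, the sketch below picks a transport protocol that both the source and the sink support. It is not the OGSA-DMI interface: the function name, the protocol labels, and the preference order are assumptions made only for this example.

```python
def negotiate_protocol(source_protocols, sink_protocols, preference):
    """Return the first protocol in `preference` that both ends support, or None."""
    common = set(source_protocols) & set(sink_protocols)
    for proto in preference:
        if proto in common:
            return proto
    return None


# Hypothetical endpoints; the protocol names are plain labels here.
source = ["gridftp", "https", "ftp"]
sink = ["https", "gridftp"]

chosen = negotiate_protocol(source, sink, preference=["gridftp", "https", "ftp"])
print(chosen)
```

Returning None when nothing matches leaves it to the caller to decide whether to fail the transfer or try another pair of endpoints.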

Page 5: Data Area Overview

Data Management

• Grid Storage Management (GSM-WG)
  • Storage Resource Manager (SRM) to provide common interface to storage resources (GFD129)
  • Several interoperating implementations in production use
  • Working on 3.0 Spec

• Information Dissemination (INFOD-WG)
  • Model for Information Dissemination; focus on query-like operations
  • Base specs published (GFD110)
  • Looking at candidates for follow-on Work

• Storage Networking Community Group (SN-CG)
  • Led by Vincent Franceschini, Chair of SNIA Board
  • Portal to SNIA Work
  • Follow-on to EGA Data Provisioning WG

Page 6: Data Area Overview

Data Grid Specifications and Use Cases

Material provided by Andrew Grimshaw ([email protected])

Page 7: Data Area Overview

Outline

• Background – The Rule of 3s
• Specifications
• Implementations

Page 8: Data Area Overview

Classic three layer view

• Access Layer: Interfaces, e.g. FUSE, SAGA, NFS, CIFS

• Grid Services Layer: Standard port types (RNS, ByteIO, WS-DAI, SRM)

• Resource Provisioning Layer: Files, databases, instruments

Page 9: Data Area Overview

Classic 3-layer name scheme

[Figure: three-layer naming diagram]

• Human names: RNS file name 1 … RNS file name n
• Abstract name: WS-name EPR (EPI, rebinding)
• Addresses: File replica 1, File replica 2, … File replica m

WS-Names are WS-Addresses with optional EPI and resolver EPR

This is essentially a table
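
Since the naming scheme is "essentially a table", it can be sketched as two lookup tables: human names to abstract names, and abstract names to replica addresses. This is only an illustration; the paths, the `epi:` identifiers, the example URLs, and the function names are invented for the sketch and do not come from the RNS or WS-Naming specifications.

```python
# Layer 1: human-readable RNS path -> abstract name (EPI).
rns_names = {
    "/home/alice/data.txt": "epi:file-001",
    "/projects/shared/data.txt": "epi:file-001",  # two names, same file
}

# Layer 2: abstract name (EPI) -> current replica addresses.
replicas = {
    "epi:file-001": ["https://site-a.example/f1", "https://site-b.example/f1"],
}


def resolve(path):
    """Resolve a human name to the list of replica addresses."""
    return replicas[rns_names[path]]


def rebind(epi, old, new):
    """Swap one replica address for another, e.g. after migration or failure."""
    replicas[epi] = [new if addr == old else addr for addr in replicas[epi]]


# The file moves from site A to site C; the human names are untouched.
rebind("epi:file-001", "https://site-a.example/f1", "https://site-c.example/f1")
print(resolve("/home/alice/data.txt"))
```

Because only the second table changes on rebinding, every human name that points at the abstract name sees the new address without being updated itself.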

Page 10: Data Area Overview

Outline

• Background – The Rule of 3s
• Specifications
• Implementations

Page 11: Data Area Overview

Six specs

• RNS – directory service that maps human names (strings) to abstract names or addresses (EPRs)
  • Insert, delete, list
  • Can build directed graphs, including trees
  • Leaves can be almost anything: web pages, ByteIO endpoints, DMI endpoints, BES resources
  • RNS 1.1 under development

• WS-Naming – A profile on WS-Addressing that supports identity, abstract name to address mapping, and rebinding of addresses – migration, failure, and replication transparency

• ByteIO – think POSIX file/stream: read, write, stat

• WS-DAI – query interface onto structured data, e.g., relational databases or XML databases

• SRM – Management of data stores

• BES – Accepts JSDL documents and executes them
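
To make the RNS bullet concrete, here is a minimal in-memory sketch of a directory offering insert, delete, and list, where an entry can be another directory or any leaf. The class, its method names, and the `epr:` string are assumptions for illustration, not the RNS port type itself.

```python
class RNSDirectory:
    """Toy directory: maps names to sub-directories or to arbitrary leaves."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, target):
        if name in self._entries:
            raise KeyError(f"entry already exists: {name}")
        self._entries[name] = target

    def delete(self, name):
        del self._entries[name]

    def list(self):
        return sorted(self._entries)

    def lookup(self, path):
        """Walk a slash-separated path starting from this directory."""
        node = self
        for part in (p for p in path.split("/") if p):
            node = node._entries[part]
        return node


root = RNSDirectory()
home = RNSDirectory()
root.insert("home", home)
home.insert("results.csv", "epr:byteio-42")  # hypothetical leaf address

print(root.lookup("/home/results.csv"))
```

Nothing prevents the same directory object from being inserted under several names, which is how a general directed graph rather than a strict tree can arise.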

Page 12: Data Area Overview

Outline

• Background – The Rule of 3s
• Specifications
• Implementations

Page 13: Data Area Overview

There are several implementations (not a complete list!)

Columns: RNS | ByteIO | WS-Naming | WS-DAI | SRM

Genesis II: Yes; Yes; Yes; Yes
gFarm: Yes; planned
EGEE/glite: Experimental Prototype; Planned?; Used by some user communities; yes
NeSC Edinburgh: yes; yes
Globus: yes (just rebinding); yes

There are over a dozen OGSA-BES/HPC-BP implementations.

Page 14: Data Area Overview

Let’s see what you can do with these specifications

• Imagine
  • an access layer that consists of a Grid-aware FUSE file system driver for Linux (both Genesis II and gFarm have these) or a Grid-aware Installable File System (IFS) for Windows (Genesis II has one – G-ICING).
  • a provisioning layer that proxies Windows/Unix files and directories into the Grid as RNS and ByteIO endpoints and relational databases as WS-DAI endpoints.
  • OGSA-BES endpoints that also support the RNS specification – allowing jobs to be started simply by copying a JSDL file “into” the directory.
  • a WS-Trust STS endpoint that also supports RNS

Page 15: Data Area Overview

• Users can access Grid resources simply by copying files, dragging and dropping, etc.

• Applications don’t need to be re-written to access the Grid

Page 16: Data Area Overview

You don’t have to imagine

Page 17: Data Area Overview

Windows Grid-aware IFS

Page 18: Data Area Overview

Linux Grid-aware FUSE

Page 19: Data Area Overview

Using RNS to name non-file-system components

• BES resources are also RNS directories

• We can schedule a job on a resource simply by “dropping” it into the directory
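
The "drop a job into the directory" idea can be sketched as a directory whose insert operation hands the document to an execution back end. Everything here is hypothetical: the class, the `fake_run` stand-in, and the status strings are assumptions, and the JSDL snippet is only a placeholder string that is never parsed.

```python
class BESDirectory:
    """Toy directory view of a job execution resource."""

    def __init__(self, run_job):
        self._run_job = run_job  # callable that accepts the document text
        self._jobs = {}          # entry name -> job status

    def insert(self, name, jsdl_text):
        # Adding an entry is what submits the job.
        self._jobs[name] = self._run_job(jsdl_text)

    def list(self):
        return sorted(self._jobs)

    def status(self, name):
        return self._jobs[name]


def fake_run(jsdl_text):
    # Stand-in for a real back end: it only checks for a marker substring.
    return "queued" if "JobDefinition" in jsdl_text else "rejected"


bes = BESDirectory(fake_run)
bes.insert("job1.jsdl", "<JobDefinition>...</JobDefinition>")
print(bes.list(), bes.status("job1.jsdl"))
```

With a file system driver mapping such a directory into the local namespace, an ordinary file copy would end up calling the insert operation.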

Page 20: Data Area Overview

Use SRM to abstract from Storage implementations

[Figure: Client, SRM, and Storage, with arrows numbered 1 to 5 for the steps below]

1. The client asks the SRM for the file, providing an SURL (Site URL)
2. The SRM asks the storage system to provide the file
3. The storage system notifies the availability of the file and its location
4. The SRM returns a TURL (Transfer URL), i.e. the location from where the file can be accessed
5. The client interacts with the storage using the protocol specified in the TURL

• could use RNS
• give back byte-I/O endpoint
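
The five steps above can be sketched as a small exchange between two objects. This is a toy model, not the SRM interface: the class and method names, the host names, and the `gsiftp` protocol label are assumptions chosen for the example.

```python
class Storage:
    """Toy storage system that knows where each file physically lives."""

    def __init__(self, files):
        self._files = files  # SURL -> (protocol, location)

    def stage(self, surl):
        # Steps 2-3: make the file available and report where it is.
        return self._files[surl]


class SRM:
    """Toy storage resource manager sitting in front of the storage."""

    def __init__(self, storage):
        self._storage = storage

    def request_file(self, surl):
        protocol, location = self._storage.stage(surl)
        return f"{protocol}://{location}"  # Step 4: hand back a TURL


storage = Storage({
    "srm://se.example.org/data/run1.dat": ("gsiftp", "disk01.example.org/pool/run1.dat"),
})
srm = SRM(storage)

turl = srm.request_file("srm://se.example.org/data/run1.dat")  # Step 1
protocol = turl.split("://", 1)[0]  # Step 5: the client reads the protocol from the TURL
print(turl, protocol)
```

The client never needs to know how the storage system is laid out; it only sees the SURL it asked for and the TURL it got back.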

Page 21: Data Area Overview

WS-DAI endpoints that support RNS

• To execute a query, copy a text file containing the SQL into the directory that represents the database. The results are then available as a file (which can be read, “cat’d”, or loaded into Excel as CSV) and can themselves be queried in turn.
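
A minimal sketch of this pattern, using an in-memory SQLite database as a stand-in for the WS-DAI resource: inserting an SQL file into the directory runs the query and exposes the result as a CSV entry. The class, the `.csv` naming rule, and the table contents are assumptions made for illustration only.

```python
import csv
import io
import sqlite3


class DatabaseDirectory:
    """Toy directory view of a database: SQL files in, CSV results out."""

    def __init__(self, connection):
        self._conn = connection
        self._results = {}  # result file name -> CSV text

    def insert(self, name, sql_text):
        cursor = self._conn.execute(sql_text)
        header = [col[0] for col in cursor.description]
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(cursor.fetchall())
        self._results[name + ".csv"] = out.getvalue()

    def read(self, name):
        return self._results[name]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER, site TEXT)")
conn.executemany("INSERT INTO runs VALUES (?, ?)", [(1, "UVA"), (2, "SDSC")])

db = DatabaseDirectory(conn)
db.insert("all_runs.sql", "SELECT id, site FROM runs ORDER BY id")
print(db.read("all_runs.sql.csv"))
```

The sketch stops at producing the CSV; querying a result set again would need the results kept in a queryable form rather than as text.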

Page 22: Data Area Overview

Mapping data into the Grid

[Figure: a data publisher serving data clients on Linux and Windows]

• Links directories and files from source location to data grid directory and user-specified name
• Presents unified view of the data across platforms, locations, domains, etc.
• Data publisher controls authorization policy.
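
One way to picture the publisher's role is a table that links a source location to a grid path under a name the publisher chooses, together with the users allowed to follow the link. The class, the paths, and the user names below are hypothetical and only illustrate the idea.

```python
class DataPublisher:
    """Toy publisher: exports local paths under grid names with an access list."""

    def __init__(self):
        self._links = {}  # grid path -> (source path, allowed users)

    def export(self, source_path, grid_path, allowed_users):
        self._links[grid_path] = (source_path, set(allowed_users))

    def resolve(self, grid_path, user):
        source_path, allowed = self._links[grid_path]
        if user not in allowed:
            raise PermissionError(f"{user} may not access {grid_path}")
        return source_path


pub = DataPublisher()
pub.export("/home/alice/results", "/grid/alice/results", allowed_users=["alice", "bob"])
print(pub.resolve("/grid/alice/results", "bob"))
```

Clients on any platform see only the grid path; the source location and the access list stay under the publisher's control.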

Page 23: Data Area Overview

Moral of the story

• RNS allows us to place arbitrary resources into a traditional directed graph/tree structure

• FUSE/IFS map RNS namespaces into the local file system

• Users can interact with the grid without knowing anything about grids

Page 24: Data Area Overview

Data Area Future

• From Data Area Gaps Analysis
  • High-level Data Movement
  • Caching and Replication
  • Integrated Data Management
  • Transactions in a Grid

• Recent Interest
  • Storage Provisioning
  • Virtualization
  • Provenance, Integrity, Policy
  • Link to Digital Libraries

• Dependencies
  • OGSA
  • Security: IETF, OASIS
  • Management: DMTF, WSDM/WS-Man Convergence
  • WS-*: OASIS and W3C, WS-RF/WS-T Convergence