Data Area Overview
OGF24, 15 September 2008
Erwin Laure and David E. Martin, Data Area Directors
Data Area Goals
• The Data Area groups explore different aspects of data handling on grids
  • Access
  • Transport
  • Management
• Overall Data Architecture developed by OGSA Data Architecture group:
  • http://www.ogf.org/documents/GFD.121.pdf
Data Access
• Goals: locate and provide seamless access to data stored on Grids
• Data Access and Integration Services (DAIS-WG)
  • Base Specs Published for Database Access (GFD 74, 75, 76)
  • Implementation in OMII-UK
  • Now Working on Data Access Services for RDF Data Resources
• Grid File Systems (GFS-WG)
  • Naming Spec Published: Resource Namespace Service (GFD 101)
  • Working on Resource Catalog
  • Prototypes from SDSC, UVA, Univ. of Tsukuba
• Data Format Description Language (DFDL-WG)
  • XML-based language for describing the structure of binary and textual files and data streams
  • Simplifying the Concepts and Trying to Remove Complexity to Shorten Draft Spec
  • Prototypes from LANL and IBM
• Byte IO (ByteIO-WG)
  • Web Service interface for providing "POSIX-like" file functionality (GFD 87, 88)
  • Spec Finished Comment Period, Need to Make Small Changes
  • Production Version from UVA, Will Be in OMII
Data Transport
• OGSA Data Movement Interface (OGSA-DMI-WG)
  • Discover and negotiate proper data transport protocols and manage data transport (GFD 134)
  • Working on interoperability
• GridFTP WG (GridFTP-WG)
  • Grid enabled FTP protocol
  • Spec Published 3 Years Ago (GFD 20)
  • Many Production Implementations
  • Need Experience Report for Full Standard
Data Management
• Grid Storage Management (GSM-WG)
  • Storage Resource Manager (SRM) to provide common interface to storage resources (GFD 129)
  • Several interoperating implementations in production use
  • Working on 3.0 Spec
• Information Dissemination (INFOD-WG)
  • Model for Information Dissemination; focus on query-like operations
  • Base specs published (GFD 110)
  • Looking at candidates for follow-on Work
• Storage Networking Community Group (SN-CG)
  • Led by Vincent Franceschini, Chair of SNIA Board
  • Portal to SNIA Work
  • Follow-on to EGA Data Provisioning WG
Outline
• Background: The Rule of 3s
• Specifications
• Implementations
Classic three layer view
• Access Layer: interfaces, e.g. FUSE, SAGA, NFS, CIFS
• Grid Services Layer: standard port types (RNS, ByteIO, WS-DAI, SRM)
• Resource Provisioning Layer: files, databases, instruments
Classic 3-layer name scheme
• Human names: RNS file name 1 … RNS file name n
• Abstract name: WS-name EPR (EPI, rebinding)
• Addresses: File replica 1, File replica 2, … File replica m
• WS-Names are WS-Addresses with optional EPI and resolver EPR
• This is essentially a table
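As a rough illustration of the three naming levels, the Python sketch below models them as two lookup tables: human name to abstract name, and abstract name to addresses. The class names, the `urn:example:` identifier, and the replica URLs are all hypothetical; none of them come from the RNS or WS-Naming specifications.

```python
from dataclasses import dataclass, field


@dataclass
class Resolver:
    """Toy resolver: maps an abstract name (EPI) to its current addresses."""
    bindings: dict = field(default_factory=dict)

    def bind(self, epi, address):
        self.bindings.setdefault(epi, []).append(address)

    def rebind(self, epi, old, new):
        # Replace a stale address, e.g. after migration or failure.
        addresses = self.bindings[epi]
        addresses[addresses.index(old)] = new

    def resolve(self, epi):
        return list(self.bindings.get(epi, []))


@dataclass
class Namespace:
    """Toy directory: maps human-readable names to abstract names."""
    entries: dict = field(default_factory=dict)

    def lookup(self, human_name):
        return self.entries[human_name]


# Hypothetical names and addresses, for illustration only.
resolver = Resolver()
names = Namespace({"/data/run1.dat": "urn:example:epi:42"})
resolver.bind("urn:example:epi:42", "https://site-a.example/replica1")
resolver.bind("urn:example:epi:42", "https://site-b.example/replica2")

epi = names.lookup("/data/run1.dat")
before = resolver.resolve(epi)
resolver.rebind(epi, "https://site-a.example/replica1",
                "https://site-c.example/replica1")
after = resolver.resolve(epi)
```

The human name and the abstract name stay the same across the rebinding; only the address table changes, which is the transparency the abstract-name layer is meant to give.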
Six specs
• RNS: directory service that maps human names (strings) to abstract names or addresses (EPRs)
  • Insert, delete, list
  • Can build directed graphs, including trees
  • Leaves can be almost anything: web pages, ByteIO endpoints, DMI endpoints, BES resources
  • RNS 1.1 under development
• WS-Naming: a profile on WS-Addressing that supports identity, abstract name to address mapping, and rebinding of addresses (migration, failure, and replication transparency)
• ByteIO: think POSIX file/stream; read, write, stat
• WS-DAI: query interface onto structured data, e.g., relational databases or XML databases
• SRM: management of data stores
• BES: accepts JSDL documents and executes them
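To make the RNS operations concrete, here is a minimal in-memory sketch of a directory that supports insert, delete, and list, and whose entries can point at other directories to form a graph. It is a toy model under assumed names (`RNSDirectory`, `walk`, and the `byteio://` and `bes://` strings are hypothetical), not the RNS port type itself.

```python
class RNSDirectory:
    """Toy stand-in for an RNS directory: names map to arbitrary targets."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, target):
        if name in self._entries:
            raise KeyError(f"entry already exists: {name}")
        self._entries[name] = target

    def delete(self, name):
        del self._entries[name]

    def list(self):
        return sorted(self._entries)

    def walk(self, path):
        """Follow a '/'-separated path through nested directories."""
        node = self
        for part in filter(None, path.split("/")):
            node = node._entries[part]
        return node


root = RNSDirectory()
home = RNSDirectory()
root.insert("home", home)
# Leaves can be anything; these endpoint strings are hypothetical.
home.insert("results.csv", "byteio://example/results")
root.insert("cluster", "bes://example/cluster")
# A second edge to the same directory makes this a graph, not just a tree.
root.insert("shortcut", home)
```

Calling `root.walk("/shortcut/results.csv")` reaches the same leaf as `root.walk("/home/results.csv")`, which is what the directed-graph bullet allows.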
There are several implementations (not a complete list!)
Specs covered: RNS, ByteIO, WS-Naming, WS-DAI, SRM
• Genesis II: Yes, Yes, Yes, Yes
• gFarm: Yes, planned
• EGEE/gLite: Experimental Prototype; Planned?; Used by some user communities; yes
• NeSC Edinburgh: yes, yes
• Globus: yes (just rebinding); yes
• There are over a dozen OGSA-BES/HPC-BP implementations.
Let’s see what you can do with these specifications
• Imagine
  • an access layer that consists of a Grid-aware FUSE file system driver for Linux (both Genesis II and gFarm have these) or a Grid-aware Installable File System (IFS) for Windows (Genesis II has one, G-ICING).
  • a provisioning layer that proxies Windows/Unix files and directories into the Grid as RNS and ByteIO endpoints and relational databases as WS-DAI endpoints.
  • OGSA-BES endpoints that also support the RNS specification, allowing jobs to be started simply by copying a JSDL file “into” the directory.
  • a WS-Trust STS endpoint that also supports RNS
• Users can access Grid resources simply by copying files, dragging and dropping, etc.
• Applications don’t need to be re-written to access the Grid
You don’t have to imagine
Windows Grid-aware IFS
Linux Grid-aware FUSE
Using RNS to name non-file-system components
• BES resources are also RNS directories
• We can schedule a job on a resource simply by “dropping” it into the directory
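From the user's side, "dropping" a job could look like an ordinary file copy. The sketch below assumes a grid namespace mounted through FUSE or IFS; because no such mount is available here, a temporary directory stands in for it, and the directory layout and file names are hypothetical. On a plain local directory the copy schedules nothing; only a grid-aware mount would treat it as a job submission.

```python
import shutil
import tempfile
from pathlib import Path


def submit_by_copy(jsdl_file: Path, bes_directory: Path) -> Path:
    """Copy a JSDL document into the directory that represents a BES resource."""
    return Path(shutil.copy(jsdl_file, bes_directory))


# Stand-in for a mounted grid namespace (hypothetical layout).
mount = Path(tempfile.mkdtemp())
bes_dir = mount / "bes-containers" / "cluster-a"
bes_dir.mkdir(parents=True)

# Placeholder content; a real submission would carry a full JSDL document.
job = mount / "hello.jsdl"
job.write_text("<!-- JSDL document goes here -->")

submitted = submit_by_copy(job, bes_dir)
```

No grid-specific client library appears in the code, which is the point of exposing BES resources as RNS directories.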
Use SRM to abstract from Storage implementations
(Diagram: Client, SRM, and Storage, connected by arrows numbered 1 to 5.)
1. The client asks the SRM for the file, providing an SURL (Site URL)
2. The SRM asks the storage system to provide the file
3. The storage system notifies the availability of the file and its location
4. The SRM returns a TURL (Transfer URL), i.e. the location from where the file can be accessed
5. The client interacts with the storage using the protocol specified in the TURL
• could use RNS
• give back byte-I/O endpoint
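The five-step exchange can be sketched as a small simulation. This is a toy model, not the SRM interface: the classes, method names, host names, and the choice of `gsiftp` as the transfer protocol are all assumptions made for illustration.

```python
class Storage:
    """Toy storage system that stages files and reports where they are."""

    def __init__(self, host):
        self.host = host
        self.staged = set()

    def provide(self, path):
        # Steps 2 and 3: stage the file and report its location.
        self.staged.add(path)
        return f"{self.host}{path}"


class SRM:
    """Toy SRM front end that turns SURLs into TURLs."""

    def __init__(self, storage, protocol="gsiftp"):
        self.storage = storage
        self.protocol = protocol

    def get(self, surl):
        # Step 1: the client presents an SURL such as srm://host/path.
        path = "/" + surl.split("://", 1)[1].split("/", 1)[1]
        location = self.storage.provide(path)
        # Step 4: return a TURL naming the transfer protocol and location.
        return f"{self.protocol}://{location}"


def fetch(turl):
    # Step 5: the client would now contact the storage with this protocol.
    protocol, location = turl.split("://", 1)
    return protocol, location


# Hypothetical hosts and file path.
srm = SRM(Storage("disk01.example.org"))
turl = srm.get("srm://srm.example.org/data/run1.dat")
protocol, location = fetch(turl)
```

The client only ever names the file by its SURL; which disk server and which protocol end up in the TURL is decided behind the SRM.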
WS-DAI endpoints that support RNS
• To execute a query, copy a text file with the SQL into the directory that represents the database. The results of the query are accessible as a file (they can be read, “cat’d”, or loaded into Excel as a CSV), or can themselves be queried in turn.
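The query-by-copy idea can be mimicked locally with a toy model. In the sketch below, an in-memory SQLite database stands in for the WS-DAI resource and a small class stands in for the directory; the class, its methods, and the convention of exposing results under a `.csv` suffix are assumptions for illustration, not behavior defined by WS-DAI or RNS.

```python
import csv
import io
import sqlite3


class QueryDirectory:
    """Toy directory backed by a database: writing SQL in yields a result file."""

    def __init__(self, connection):
        self.connection = connection
        self.files = {}

    def write(self, name, sql_text):
        # "Copying" a query file in runs it and stores the rows as CSV text.
        cursor = self.connection.execute(sql_text)
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow([column[0] for column in cursor.description])
        writer.writerows(cursor.fetchall())
        # Hypothetical naming convention for the result file.
        self.files[name + ".csv"] = out.getvalue()

    def read(self, name):
        return self.files[name]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE runs (id INTEGER, site TEXT)")
db.executemany("INSERT INTO runs VALUES (?, ?)", [(1, "UVA"), (2, "SDSC")])

directory = QueryDirectory(db)
directory.write("all-runs.sql", "SELECT id, site FROM runs ORDER BY id")
result = directory.read("all-runs.sql.csv")
```

The result is plain CSV text with a header row, so any tool that reads files can consume it.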
Mapping data into the Grid
(Diagram: data publishers and data clients on Linux and Windows machines.)
• Links directories and files from source location to data grid directory and user-specified name
• Presents unified view of the data across platforms, locations, domains, etc.
• Data publisher controls authorization policy.
Moral of the story
• RNS allows us to place arbitrary resources into a traditional directed graph/tree structure
• FUSE/IFS map RNS namespaces into the local file system
• Users can interact with the grid without knowing anything about grids
Data Area Future
• From Data Area Gaps Analysis
  • High-level Data Movement
  • Caching and Replication
  • Integrated Data Management
  • Transactions in a Grid
• Recent Interest
  • Storage Provisioning
  • Virtualization
  • Provenance, Integrity, Policy
  • Link to Digital Libraries
• Dependencies
  • OGSA
  • Security: IETF, OASIS
  • Management: DMTF, WSDM/WS-Man Convergence
  • WS-*: OASIS and W3C, WS-RF/WS-T Convergence