D4Science Scientific Data Infrastructure: promoting interoperability by embracing the value of the differences
Pasquale [email protected]
Networking session, September 2010
Brussels (Belgium)
www.d4science.eu
www.d4science.eu · D4Science Scientific Data Infrastructure · Brussels, 29 September 2010
Assumptions
Consolidated facts:
Very rich applications and data collections are currently maintained by a multitude of authoritative providers
Different problems require different execution paradigms: batch, map-reduce, synchronous call, message-queue, …
Key distributed computation technologies exist: grid (gLite and Globus), distributed resource management (Condor), clusters (Hadoop), …
Several standards are adopted in the same domain
Societal observations
• A rich variety of protocols, models, and formats
  • creates barriers to the use of resources
  • dramatically delays new exploitation patterns
Technical observations
Heterogeneity of protocols, models, and formats increases load; load increases failures
D4Science Vision
D4Science objectives:
hide heterogeneity, i.e. abstract over differences in location, protocol, and model;
embrace heterogeneity, i.e. allow for multiple locations, protocols, and models;
Technical goals
• no bottlenecks: scale at least as well as the interfaced resources
• no outages: keep failures partial and temporary
• autonomicity: the system reacts to failures and recovers from them
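The "no outages" and "autonomicity" goals can be made concrete with a small failover sketch. The `ReplicaSelector` class, its quarantine window, and the injected clock are hypothetical illustrations, not part of D4Science or gCube: a replica that fails is skipped for a while, so the failure stays partial (other replicas keep answering) and temporary (the replica is retried once the quarantine expires).

```python
import time
from typing import Callable, Dict, List, Optional


class ReplicaSelector:
    """Hypothetical failover helper: a failing replica is quarantined for a
    while, so its failure stays partial (other replicas keep serving) and
    temporary (it is retried once the quarantine expires)."""

    def __init__(self, replicas: List[Callable[[str], str]],
                 quarantine_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.replicas = replicas
        self.quarantine_s = quarantine_s
        self.clock = clock
        self.blocked_until: Dict[int, float] = {}

    def call(self, request: str) -> str:
        now = self.clock()
        last_error: Optional[Exception] = None
        for index, replica in enumerate(self.replicas):
            if self.blocked_until.get(index, 0.0) > now:
                continue  # still quarantined: skip it without calling
            try:
                result = replica(request)
            except Exception as exc:  # react: quarantine and try the next replica
                last_error = exc
                self.blocked_until[index] = now + self.quarantine_s
                continue
            self.blocked_until.pop(index, None)  # recovered: clear any quarantine
            return result
        raise RuntimeError("no replica available") from last_error
```

Injecting the clock keeps the behaviour testable: with two replicas where the first is down, requests are answered by the second until the quarantine expires, after which the first is tried again.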
Hiding Heterogeneity [1/2]
D4Science is an ecosystem of e-infrastructures where:
• various communities coexist while maintaining their own peculiarities and policies
• sharing resources and reusing services from other domains is feasible and affordable
Hiding Heterogeneity [2/2]
D4Science approach:
• Heterogeneous resources are virtually accessible in a common ecosystem of resources
  • regardless of their location, technology, and protocol
• Different communities have access to different views
  • according to the conditions under which the sharing can occur
• Each community can define its own virtual research environment to satisfy specific needs
  • for a limited timeframe and at no cost for the resource providers
• Several virtual research environments can coexist
  • without interfering with each other, even when competing for the same resources
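Here is a minimal sketch of how per-community views over a shared pool might work. It assumes that each resource carries the set of communities its provider's policy allows, and that each VRE is a time-bounded filter. `Resource`, `VirtualResearchEnvironment`, and their fields are hypothetical names for illustration and do not reproduce gCube's actual VRE model.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str                    # e.g. "dataset" or "service"
    shared_with: FrozenSet[str]  # communities the provider's policy allows


@dataclass(frozen=True)
class VirtualResearchEnvironment:
    """Hypothetical VRE: a community's time-bounded, read-only view."""
    community: str
    kinds: FrozenSet[str]
    start: date
    end: date

    def view(self, pool: List[Resource], today: date) -> List[str]:
        if not (self.start <= today <= self.end):
            return []  # outside its timeframe the VRE exposes nothing
        return sorted(r.name for r in pool
                      if self.community in r.shared_with and r.kind in self.kinds)
```

A view only reads the pool, so two VREs can see the same dataset at the same time without affecting each other. Contention for computing capacity is not modelled here.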
Embracing Heterogeneity: Interoperability Approaches
Approaches and solutions to achieve interoperability:
Blackboard-based
asynchronous communication between the components of a system, using one protocol to read/write and one language to specify messages
Wrapper/Mediator-based
translates the interface of a component into a compatible interface
Proxy-based
exposes the same interface but allows additional operations on received calls
Adaptor-based
provides a unified interface to the interfaces of a set of other components and encapsulates how this set of objects interacts
Broker-based
specialises an Adaptor by coordinating communication
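This sketch illustrates three of the patterns above, assuming a simple record-retrieval interface. `Repository`, `LegacyStore`, `LegacyStoreWrapper`, `CountingProxy`, and `Broker` are hypothetical names. The wrapper translates an incompatible interface. The proxy keeps the interface and adds work on each call (here, counting calls). The broker coordinates requests across registered components by routing on an identifier prefix. It shows the general patterns only and is not D4Science's implementation.

```python
from typing import Dict, Protocol, Tuple


class Repository(Protocol):
    """Common interface expected by clients (hypothetical)."""
    def get(self, record_id: str) -> Dict[str, str]: ...


class LegacyStore:
    """A provider exposing its own, incompatible interface."""
    def __init__(self, rows: Dict[int, Tuple[str, str]]) -> None:
        self.rows = rows

    def fetch_row(self, key: int) -> Tuple[str, str]:
        return self.rows[key]


class LegacyStoreWrapper:
    """Wrapper/mediator: translates LegacyStore's interface into Repository."""
    def __init__(self, store: LegacyStore) -> None:
        self.store = store

    def get(self, record_id: str) -> Dict[str, str]:
        title, creator = self.store.fetch_row(int(record_id))
        return {"title": title, "creator": creator}


class CountingProxy:
    """Proxy: same interface as its target, plus extra work on each call."""
    def __init__(self, target: Repository) -> None:
        self.target = target
        self.calls = 0

    def get(self, record_id: str) -> Dict[str, str]:
        self.calls += 1
        return self.target.get(record_id)


class Broker:
    """Broker: routes each request to the repository registered for its prefix."""
    def __init__(self) -> None:
        self.routes: Dict[str, Repository] = {}

    def register(self, prefix: str, repo: Repository) -> None:
        self.routes[prefix] = repo

    def get(self, qualified_id: str) -> Dict[str, str]:
        prefix, _, local_id = qualified_id.partition(":")
        return self.routes[prefix].get(local_id)
```

For example, registering the proxied wrapper under the prefix `fao` lets a client call `broker.get("fao:7")` without knowing anything about `LegacyStore.fetch_row`.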
Embracing Heterogeneity:Data Representation, Discovery, and Access
D4Science offers
Open transformation service framework
  • extensible with specific source-target mediators
  • to be used for metadata and data crosswalk transformations
  • tailored for statistical, geospatial, temporal, and textual data
Rich set of reference data
  • extensible with domain-specific reference data
  • to be reused in services for data curation and harmonization
Support for geospatial services
  • to capture, manage, analyze, and display all forms of data that can be geographically referenced
Integrated resources registry
  • format agnostic
  • to support discovery and access
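The following sketch combines a pluggable transformation framework with reference-data harmonization. It assumes mediators are plain functions registered per (source, target) format pair. The `local` source schema, `TransformationFramework`, and `local_to_dc` are hypothetical. The target keys follow Dublin Core element names, and the country lookup uses a small excerpt of ISO 3166-1 alpha-2 codes.

```python
from typing import Callable, Dict, Tuple

Record = Dict[str, str]
Mediator = Callable[[Record], Record]

# A small excerpt of ISO 3166-1 alpha-2 codes, used as reference data.
ISO3166_ALPHA2 = {"italy": "IT", "belgium": "BE", "france": "FR", "greece": "GR"}


class TransformationFramework:
    """Hypothetical framework: one mediator per (source, target) format pair."""

    def __init__(self) -> None:
        self.mediators: Dict[Tuple[str, str], Mediator] = {}

    def register(self, source: str, target: str, mediator: Mediator) -> None:
        self.mediators[(source, target)] = mediator

    def transform(self, record: Record, source: str, target: str) -> Record:
        mediator = self.mediators.get((source, target))
        if mediator is None:
            raise ValueError(f"no mediator from {source!r} to {target!r}")
        return mediator(record)


def local_to_dc(record: Record) -> Record:
    """Crosswalk from a hypothetical local schema to Dublin Core elements,
    harmonizing the free-text country against the ISO 3166 reference data."""
    country = record.get("country", "")
    return {
        "dc:title": record["name"],
        "dc:creator": record["owner"],
        # Fall back to the original value when the country is not recognized.
        "dc:coverage": ISO3166_ALPHA2.get(country.strip().lower(), country),
    }
```

In this design, adding a new format pair means registering one more mediator, and the framework itself does not change.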
Embracing Heterogeneity: Process Execution [1/2]
D4Science offers solutions to:
Decouple business-domain and infrastructure-specific logic from the core “execution” functionality
Invoke a wide range of logic components: SOAP and REST Web Services, shell scripts, executable binaries, POJOs, …
Support the most common execution paradigms: batch, map-reduce, synchronous call
Bridge key distributed computation technologies: grid (gLite and Globus), Condor, Hadoop
Control and monitor the execution of a processing flow
Stage data among different storage providers
Stream data among computation elements
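This minimal sketch decouples the execution core from the kind of component being invoked. Every component is wrapped in an adapter that exposes the same `run` method, and the executor only sequences steps and records their status for monitoring. `Step`, `CallableStep`, `CommandStep`, and `FlowExecutor` are hypothetical names. A Python callable stands in for a POJO, and a subprocess stands in for a shell script or binary. Distribution across grid, Condor, or Hadoop resources is not modelled.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Callable, List, Protocol, Tuple


class Step(Protocol):
    name: str
    def run(self, data: str) -> str: ...


@dataclass
class CallableStep:
    """Adapter for an in-process component (standing in for a POJO)."""
    name: str
    func: Callable[[str], str]

    def run(self, data: str) -> str:
        return self.func(data)


@dataclass
class CommandStep:
    """Adapter for an executable: data is piped through stdin and stdout."""
    name: str
    argv: List[str]

    def run(self, data: str) -> str:
        done = subprocess.run(self.argv, input=data, capture_output=True,
                              text=True, check=True)
        return done.stdout


@dataclass
class FlowExecutor:
    """Runs steps in sequence and records each step's status for monitoring."""
    log: List[Tuple[str, str]] = field(default_factory=list)

    def execute(self, steps: List[Step], data: str) -> str:
        for step in steps:
            try:
                data = step.run(data)
            except Exception as exc:
                self.log.append((step.name, f"failed: {exc}"))
                raise
            self.log.append((step.name, "ok"))
        return data


if __name__ == "__main__":
    flow = [CallableStep("strip", str.strip),
            CommandStep("upper", [sys.executable, "-c",
                                  "import sys; sys.stdout.write(sys.stdin.read().upper())"])]
    executor = FlowExecutor()
    print(executor.execute(flow, "  tuna  "))  # prints TUNA
    print(executor.log)
```

Because the executor sees only `run`, adding a new kind of component (for instance, a REST call) means writing one more adapter without touching the execution logic.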
Embracing Heterogeneity: Process Execution [2/2]
By using adaptors that:
• operate on a specific third-party language and translate its constructs into native ones
• allow for the creation of complex workflows that exploit several diverse technologies deployed on different infrastructures
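To illustrate an adaptor for a third-party language, the following sketch translates a job description written in a very small subset of gLite JDL (only `Key = "value";` lines) into a hypothetical native task. `JdlAdaptor` and `NativeTask` are assumptions made for illustration. Real JDL is a ClassAd-based language with expressions and lists, and this parser deliberately rejects those constructs.

```python
import re
import shlex
from dataclasses import dataclass
from typing import Dict, List, Optional

# Matches simple `Key = "value";` lines: a small subset of gLite JDL syntax.
_ASSIGNMENT = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*;\s*$')


@dataclass
class NativeTask:
    """Hypothetical native construct understood by the execution engine."""
    argv: List[str]
    stdout_file: Optional[str] = None


class JdlAdaptor:
    """Translates a minimal JDL-like job description into a NativeTask."""

    def parse(self, text: str) -> Dict[str, str]:
        attributes: Dict[str, str] = {}
        for line in text.splitlines():
            if not line.strip():
                continue  # skip blank lines
            match = _ASSIGNMENT.match(line)
            if match is None:
                # Anything richer (expressions, lists, brackets) is out of scope here.
                raise ValueError(f"unsupported JDL construct: {line.strip()!r}")
            attributes[match.group(1)] = match.group(2)
        return attributes

    def to_native(self, text: str) -> NativeTask:
        attrs = self.parse(text)
        argv = [attrs["Executable"]] + shlex.split(attrs.get("Arguments", ""))
        return NativeTask(argv=argv, stdout_file=attrs.get("StdOutput"))
```

A workflow engine could keep one such adaptor per supported language and compose the resulting native tasks, which is the idea the slide describes.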
Conclusions
Facts
Very rich services and data collections are currently maintained by a multitude of authoritative providers
Several standards are adopted in the same domain
Interoperability approaches are key to exploiting such richness
D4Science offers a variety of patterns, tools, and solutions
• to deliver interoperability solutions and interconnect
  • heterogeneous digital content
  • heterogeneous repository systems
  • heterogeneous computation platforms
• to decrease the cost of adoption
• to reduce the time to market of new ideas
• to deal with a plethora of standards
Supported Standards
WS-*, WSRF, WS-BPEL
JDL, JSDL, Glue Schema (part)
X-*, DC, TEI, ISO, etc.
JSR (several)
GSI-Security, XACML, SAML
OpenSearch
OGC related
https://quality.wiki.d4science.research-infrastructures.eu/quality/index.php/Standards
Comply with: OAI-PMH, OAI-ORE
Supported Standards
WSRF Specifications
• WS-ResourceProperties (WSRF-RP)
• WS-ResourceLifetime (WSRF-RL)
• WS-ServiceGroup (WSRF-SG)
• WS-BaseFaults (WSRF-BF)
JSR
• 168: Simple Portlets
• 286: 168 update
• 160: JMX
WSN Specifications:
• WS-BaseNotification
• WS-Topics
• (WS-BrokeredNotification)
WS-* Standards
• SOAP
• WSDL
• WS-Addressing
ISO:
• ISO 3166 countries
• ISO 4217 currencies
• ISO 19115 geo-location
X-*
• XML
• XSD
• XSL
• XSLT
• XPath
• XQuery
OGC
• Web Coverage Processing Service
• Web Coverage Service
• Web Feature Service
• Web Map Context
• Web Map Service
• Web Map Tile Service
• Web Processing Service
• Web Service Common
OGF Standard:
• Glue Schema (2)
……….
Comply with: OAI-PMH, OAI-ORE
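As an example of what OAI-PMH compliance involves on the harvesting side, the following sketch builds `ListRecords` requests and extracts Dublin Core titles and the resumption token from a response, using only the Python standard library. The base URL and the function names are hypothetical. The verb, arguments, and namespaces follow the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Builds a ListRecords request; resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_titles(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Returns the dc:title values and the next resumption token, if any."""
    root = ET.fromstring(xml_text)
    titles = [t.text or "" for t in root.iterfind(".//dc:title", NS)]
    token = root.find(".//oai:resumptionToken", NS)
    # An empty resumptionToken marks the last part of the list.
    next_token = token.text if token is not None and token.text else None
    return titles, next_token
```

A harvester would call `list_records_url` with the token returned by `parse_titles` until the token is `None`.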
Thanks
www.gcube-system.org
www.d4science.eu
Pasquale Pagano, D4Science-II Technical [email protected]
Donatella Castelli, D4Science-II Project [email protected]
Jessica Michel Assoumou, D4Science-II Administrative and Financial [email protected]
D4Science is powered by the open-source gCube framework