Meandre: Semantic-Driven Data-Intensive Flows in the Clouds

Xavier Llorà, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, [email protected]

The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation

Upload: xavier-llora

Post on 27-Dec-2014


DESCRIPTION

A quick overview of the Meandre infrastructure, programming models and tools.

TRANSCRIPT

Page 1: Meandre: Semantic-Driven Data-Intensive Flows in the Clouds

Meandre: Semantic-Driven Data-Intensive Flows in the Clouds

Xavier Llorà

National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

[email protected]

The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation

Page 2

Yes, It is not a Typo

Page 3

SEASR: Design Goals

•  Transparency

–  From a single laptop to an HPC cluster

–  Not bound to a particular computation fabric

–  Allow heterogeneous development

•  Intuitive programming paradigm

–  Modular Components assembled into Flows

–  Foster Collaboration and Sharing

•  Open Source

•  Service-Oriented Architecture (SOA)


Page 4

Meandre: Infrastructure

•  SEASR/Meandre Infrastructure:

–  Dataflow execution paradigm

–  Semantic-web driven

–  Web oriented

–  Supports publishing services

–  Promotes reuse, sharing, and collaboration


Page 5

Meandre: Data-Driven Execution

•  Execution Paradigms

–  Conventional programs perform computational tasks by executing a sequence of instructions.

–  Data-driven execution applies transformation operations to a flow or stream of data as soon as that data becomes available.
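To make the contrast concrete, here is a minimal sketch of data-driven execution in Python (chosen because Meandre components may be written in it). The `Node` class, the `run` loop, and the `upper`/`exclaim` nodes are hypothetical illustrations, not Meandre's engine: each node holds an input queue, and the scheduler fires whichever node has data waiting instead of following a fixed instruction sequence.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class Node:
    """Hypothetical dataflow node that fires whenever its input queue holds data."""

    def __init__(self, fn: Callable[[str], str], downstream: Optional["Node"] = None) -> None:
        self.fn = fn
        self.inbox: Deque[str] = deque()
        self.downstream = downstream
        self.results: List[str] = []  # collects output when there is no downstream node

    def ready(self) -> bool:
        return bool(self.inbox)

    def fire(self) -> None:
        value = self.fn(self.inbox.popleft())
        if self.downstream is not None:
            self.downstream.inbox.append(value)
        else:
            self.results.append(value)


def run(nodes: List[Node]) -> None:
    """Keep firing any node with data available until no node can fire."""
    progress = True
    while progress:
        progress = False
        for node in nodes:
            if node.ready():
                node.fire()
                progress = True


exclaim = Node(lambda s: s + "!")
upper = Node(str.upper, downstream=exclaim)
upper.inbox.extend(["hello", "world"])
run([exclaim, upper])   # list order is irrelevant: data availability drives execution
print(exclaim.results)  # ['HELLO!', 'WORLD!']
```

The node list deliberately puts the consumer first; execution still proceeds correctly because a node only fires once data has reached it.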


Page 6

Meandre: Dataflow Example

[Diagram: inputs Value1 and Value2 flow into a component that outputs Sum]

Page 7

Meandre: Dataflow Example

•  Dataflow Addition Example

–  Logical Operation ‘+’

–  Requires two inputs

–  Produces one output

•  When two inputs are available

–  The logical operation can be performed

–  Sum is output

•  When output is produced

–  Reset internal values

–  Wait for two new input values to become available
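The addition example can be sketched as a small Python class. This is an illustrative assumption rather than Meandre's component API: the port names `value1` and `value2` come from the diagram, and queuing early values per port (instead of overwriting them) is a design choice made here, since the slide does not say what happens when one input arrives twice before the other.

```python
from collections import deque
from typing import Deque, Dict, List


class SumComponent:
    """Sketch of the dataflow '+' component: two input ports, one output."""

    def __init__(self) -> None:
        # One queue per input port, so early values wait instead of being overwritten.
        self.ports: Dict[str, Deque[float]] = {"value1": deque(), "value2": deque()}
        self.outputs: List[float] = []

    def push(self, port: str, value: float) -> None:
        """Deliver a value to 'value1' or 'value2', then fire while both inputs are ready."""
        if port not in self.ports:
            raise ValueError(f"unknown port: {port}")
        self.ports[port].append(value)
        while self.ports["value1"] and self.ports["value2"]:
            # Consuming one value from each port resets the component's internal state.
            a = self.ports["value1"].popleft()
            b = self.ports["value2"].popleft()
            self.outputs.append(a + b)


adder = SumComponent()
adder.push("value1", 2)
adder.push("value1", 5)   # waits: no value2 yet
adder.push("value2", 3)   # fires 2 + 3
adder.push("value2", 10)  # fires 5 + 10
print(adder.outputs)      # [5, 15]
```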

Page 8

Meandre: The Dataflow Component

•  Data dictates component execution semantics

[Diagram: a component P with inputs and outputs, described by an RDF descriptor of its behavior and backed by the component implementation]

Page 9

Meandre: Data-Driven Execution

•  Dataflow Approach

–  May have zero to many inputs

–  May have zero to many outputs

–  Performs a logical operation when data is available

•  Each component defines its own firing policy
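A firing policy decides when a component with several inputs may run. The sketch below assumes two policies, named `"all"` (every input has data) and `"any"` (at least one input has data) purely for illustration; the `Component` class and its method names are hypothetical and not taken from Meandre's API.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List


class Component:
    """Hypothetical component that declares its own firing policy."""

    def __init__(self, inputs: List[str], policy: str,
                 execute: Callable[[Dict[str, Any]], Any]) -> None:
        if policy not in ("all", "any"):
            raise ValueError("policy must be 'all' or 'any'")
        self.queues: Dict[str, Deque[Any]] = {name: deque() for name in inputs}
        self.policy = policy
        self.execute = execute
        self.outputs: List[Any] = []

    def can_fire(self) -> bool:
        ready = [bool(q) for q in self.queues.values()]
        # 'all': every input has data; 'any': at least one input has data.
        return all(ready) if self.policy == "all" else any(ready)

    def push(self, port: str, value: Any) -> None:
        self.queues[port].append(value)
        while self.can_fire():
            # Consume one value from each port that currently holds data.
            data = {name: q.popleft() for name, q in self.queues.items() if q}
            self.outputs.append(self.execute(data))


joiner = Component(["left", "right"], "all", lambda d: f"{d['left']}+{d['right']}")
logger = Component(["left", "right"], "any", lambda d: sorted(d))
for comp in (joiner, logger):
    comp.push("left", 1)
    comp.push("right", 2)
print(joiner.outputs)  # ['1+2']
print(logger.outputs)  # [['left'], ['right']]
```

The same two pushes produce one firing under `"all"` and two under `"any"`, which is exactly why the policy belongs to the component rather than to the flow.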


Page 10

Meandre: Component Metadata

•  Describes a component

•  Separates:

–  Component semantics (black box)

–  Component implementation (Java, Python, Lisp)

•  Provides a unified framework:

–  Basic building blocks or units (components)

–  Complex tasks (flows)

–  Standardized metadata
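The separation between a component's black-box semantics and its implementation can be sketched as follows. The `Descriptor` dataclass, the `implements` decorator, and the `invoke` helper are hypothetical illustrations; only the component URI and port names are borrowed from the ZigZag example later in this deck. The caller validates against the descriptor alone and never needs to know how, or in which language, the component is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Descriptor:
    """Black-box view of a component: what it consumes and produces, not how."""
    uri: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    language: str  # implementation language, e.g. "java", "python", "lisp"


# Implementations live apart from descriptors and are looked up by URI.
IMPLEMENTATIONS: Dict[str, Callable[..., Dict[str, str]]] = {}


def implements(uri: str) -> Callable:
    """Decorator registering a function as the implementation behind a descriptor URI."""
    def register(fn: Callable[..., Dict[str, str]]) -> Callable[..., Dict[str, str]]:
        IMPLEMENTATIONS[uri] = fn
        return fn
    return register


CONCAT = Descriptor(
    uri="http://test.org/component/concatenate-strings",
    inputs=("string_one", "string_two"),
    outputs=("concatenated_string",),
    language="python",
)


@implements(CONCAT.uri)
def concatenate(string_one: str, string_two: str) -> Dict[str, str]:
    return {"concatenated_string": string_one + string_two}


def invoke(desc: Descriptor, **inputs: str) -> Dict[str, str]:
    """Validate against the descriptor only, then call whichever implementation is registered."""
    if set(inputs) != set(desc.inputs):
        raise ValueError(f"expected inputs {desc.inputs}, got {tuple(inputs)}")
    result = IMPLEMENTATIONS[desc.uri](**inputs)
    if set(result) != set(desc.outputs):
        raise ValueError(f"implementation produced {tuple(result)}, expected {desc.outputs}")
    return result


print(invoke(CONCAT, string_one="Hello ", string_two="world!"))
# {'concatenated_string': 'Hello world!'}
```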


Page 11

Meandre: Semantic Web Concepts

•  Relies on the Resource Description Framework (RDF)

•  Provides a common framework to share and reuse data across application, enterprise, and community boundaries

•  Focuses on common formats for integration and combination of data drawn from diverse sources

•  Pays special attention to the language used for recording how the data relates to real world objects

•  Allows navigation to sets of data resources that are semantically connected.


Page 12

Meandre: Metadata Ontologies

•  Meandre's metadata relies on three ontologies:

–  The RDF ontology serves as a base for defining Meandre descriptors

–  The Dublin Core Elements ontology provides basic publishing and descriptive capabilities for Meandre descriptors

–  The Meandre ontology describes a set of relationships that model valid components, as understood by the Meandre execution engine architecture
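To illustrate how the three ontologies combine in a descriptor, the sketch below serializes a few triples in N-Triples syntax using only the Python standard library. The RDF and Dublin Core namespace IRIs are the standard ones; the `MEANDRE` namespace, the `ExecutableComponent` class, the `hasInput` property, and the creator value are hypothetical placeholders, not the real Meandre ontology terms.

```python
from typing import Iterable, List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical placeholder namespace: NOT the real Meandre ontology URI.
MEANDRE = "http://example.org/meandre-ontology#"


def iri(value: str) -> str:
    return f"<{value}>"


def literal(text: str) -> str:
    # Minimal N-Triples escaping for backslashes, quotes, and line breaks.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def describe_component(uri: str, title: str, creator: str, inputs: Iterable[str]) -> str:
    """Serialize a component descriptor as N-Triples (RDF + Dublin Core + placeholder Meandre terms)."""
    s = iri(uri)
    triples: List[Tuple[str, str, str]] = [
        (s, iri(RDF + "type"), iri(MEANDRE + "ExecutableComponent")),  # hypothetical class
        (s, iri(DC + "title"), literal(title)),
        (s, iri(DC + "creator"), literal(creator)),
    ]
    for name in inputs:
        triples.append((s, iri(MEANDRE + "hasInput"), literal(name)))  # hypothetical property
    return "\n".join(f"{subj} {pred} {obj} ." for subj, pred, obj in triples)


doc = describe_component("http://test.org/component/concatenate-strings",
                         "Concatenate strings", "Example Author",
                         ["string_one", "string_two"])
print(doc)
```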



Page 14

Meandre: Component Types

•  Components are the basic building blocks of any computational task.

•  There are two kinds of Meandre components:

–  Executable components

•  Perform computational tasks that require no human interaction during runtime

•  Processes are initialized during flow startup and are fired in accordance with the policies defined for them.

–  Control components

•  Used to pause dataflow during user interaction cycles

•  The WebUI may be an HTML form, an applet, or another user interface
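The difference between the two component kinds can be sketched with the standard library: an executable component runs as soon as its data is ready, while a control component blocks the flow until a user answer arrives. `ControlComponent`, `submit`, and `executable_component` are hypothetical names, and the user's form submission is simulated with a timer thread rather than a real WebUI.

```python
import queue
import threading


def executable_component(text: str) -> str:
    """Executable component: no human interaction, runs as soon as data is ready."""
    return text.strip().lower()


class ControlComponent:
    """Hypothetical control component: pauses the flow until a user answer arrives."""

    def __init__(self) -> None:
        self.answers = queue.Queue()

    def submit(self, answer: str) -> None:
        # Would be called by the WebUI side, e.g. when an HTML form is posted.
        self.answers.put(answer)

    def execute(self, data: str, timeout: float = 5.0) -> str:
        # The flow blocks here until the user responds (queue.Empty is raised on timeout).
        answer = self.answers.get(timeout=timeout)
        return f"{data} -> {answer}"


control = ControlComponent()
# Simulate a user filling in the form a moment later.
threading.Timer(0.1, control.submit, args=("approved",)).start()
result = control.execute(executable_component("  Draft Report "))
print(result)  # draft report -> approved
```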


Page 15

Wrapping With Components

•  A component provides inputs, outputs, and properties

•  Your code

–  Inside the component

–  Called from the component

–  A WS front end

–  Interactive application

–  Request/response cycles
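Wrapping existing code can be sketched as an adapter that maps a component's named input and fixed properties onto an ordinary function call. `legacy_word_count` and `WrappedComponent` are hypothetical; the wrapped callable could just as well be a client stub that forwards the call to a web service.

```python
from typing import Any, Callable, Dict


def legacy_word_count(text: str, min_length: int) -> int:
    """Existing code to be reused unchanged (illustrative)."""
    return sum(1 for word in text.split() if len(word) >= min_length)


class WrappedComponent:
    """Hypothetical wrapper exposing an existing callable as a component.

    Inputs arrive on every firing; properties are fixed when the flow is configured.
    """

    def __init__(self, fn: Callable[..., Any], input_name: str, output_name: str,
                 properties: Dict[str, Any]) -> None:
        self.fn = fn
        self.input_name = input_name
        self.output_name = output_name
        self.properties = dict(properties)

    def execute(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        # The wrapped code could equally be a stub calling a remote service.
        value = self.fn(inputs[self.input_name], **self.properties)
        return {self.output_name: value}


counter = WrappedComponent(legacy_word_count, "text", "count", {"min_length": 4})
print(counter.execute({"text": "data flows in the clouds"}))  # {'count': 3}
```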

Page 16

Meandre: Flow (Complex Tasks)

•  A flow is a collection of connected components

[Diagram: a flow of connected components Read, Get, Merge, Do, and Show, labeled "Dataflow execution"]
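A flow can be sketched as a dependency graph whose components run once their upstream components have produced data. The wiring below (Read and Get feed Merge, Merge feeds Do, Do feeds Show) is an assumption, since the original diagram is not recoverable, and the component bodies are toy stand-ins. The sketch runs components in a topological order using Python's `graphlib` (3.9+); a real dataflow engine would fire independent components concurrently.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List

# Assumed wiring: each component lists the components it reads from.
EDGES: Dict[str, List[str]] = {
    "Merge": ["Read", "Get"],
    "Do": ["Merge"],
    "Show": ["Do"],
}

# Toy stand-ins for the five components.
COMPONENTS: Dict[str, Callable[..., Any]] = {
    "Read": lambda: [3, 1],
    "Get": lambda: [2],
    "Merge": lambda left, right: left + right,
    "Do": lambda values: sorted(values),
    "Show": lambda values: ", ".join(str(v) for v in values),
}


def run_flow(edges: Dict[str, List[str]],
             components: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Execute each component once its upstream components have produced data."""
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(edges).static_order():
        upstream = [results[dep] for dep in edges.get(name, [])]
        results[name] = components[name](*upstream)
    return results


print(run_flow(EDGES, COMPONENTS)["Show"])  # 1, 2, 3
```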

Page 17

Meandre: Programming Paradigm

•  The programming paradigm creates complex tasks by linking together a set of specialized components. Meandre's publishing mechanism allows components developed by third parties to be assembled into a new flow.

•  There are two ways to develop flows:

–  Meandre’s Workbench visual programming tool

–  Meandre’s ZigZag scripting language


Page 18

Meandre: Workbench Existing Flow

[Screenshot: the Workbench, with areas labeled Flows, Components, and Locations]

Page 19

Meandre: ZigZag Script Language

•  ZigZag is a simple language for describing data-intensive flows

–  Modeled on Python for simplicity.

–  ZigZag is a declarative language for expressing the directed graphs that describe flows.

•  Command-line tools allow ZigZag files to be compiled and executed.

–  A compiler transforms a ZigZag program (.zz) into a Meandre Archive Unit (.mau).

–  MAUs can then be executed by a Meandre engine.


Page 20

Meandre: ZigZag Script Language

•  ZigZag code that represents the example flow:

    #
    # Imports the three required components and creates the component aliases
    #
    import <http://localhost:1714/public/services/demo_repository.rdf>
    alias <http://test.org/component/push_string> as PUSH
    alias <http://test.org/component/concatenate-strings> as CONCAT
    alias <http://test.org/component/print-object> as PRINT
    #
    # Creates four instances for the flow
    #
    push_hello, push_world, concat, print = PUSH(), PUSH(), CONCAT(), PRINT()
    #
    # Sets up the properties of the instances
    #
    push_hello.message, push_world.message = "Hello ", "world!"
    #
    # Describes the data-intensive flow
    #
    @phres, @pwres = push_hello(), push_world()
    @cres = concat( string_one: phres.string; string_two: pwres.string )
    print( object: cres.concatenated_string )
    #
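For readers more familiar with Python, the same wiring can be mirrored with plain functions. This is only a sketch of what the flow computes, not of how a Meandre engine executes it; `push_string`, `concatenate_strings`, and `print_object` are hypothetical stand-ins for the three aliased components, and their port names follow the ZigZag script.

```python
from typing import Dict


# Hypothetical Python stand-ins for the three aliased components.
def push_string(message: str) -> Dict[str, str]:
    return {"string": message}


def concatenate_strings(string_one: str, string_two: str) -> Dict[str, str]:
    return {"concatenated_string": string_one + string_two}


def print_object(obj: object) -> str:
    text = str(obj)
    print(text)
    return text


# push_hello.message, push_world.message = "Hello ", "world!"
phres = push_string("Hello ")
pwres = push_string("world!")
# @cres = concat( string_one: phres.string; string_two: pwres.string )
cres = concatenate_strings(string_one=phres["string"], string_two=pwres["string"])
# print( object: cres.concatenated_string )
printed = print_object(cres["concatenated_string"])  # prints: Hello world!
```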

Page 21

Meandre: ZigZag Script Language

•  Automatic Parallelization

–  Multiple instances of a component can be run in parallel to boost throughput.

–  A specialized ZigZag operator causes multiple instances of a given component to be used.

•  Consider the simple flow shown in the diagram

•  Its dataflow declaration would look like this:

    #
    # Describes the data-intensive flow
    #
    @pu = push()
    @pt = pass( string:pu.string )
    print( object:pt.string )

Page 22

Meandre: ZigZag Script Language

•  Automatic Parallelization

–  Add the operator [+AUTO] to the middle component.

–  [+AUTO] tells the ZigZag compiler to replicate the pass component instance once per core available on the system.

–  Instead of [+AUTO], you may write [+N], where N is the number of instances to use; for example, [+10].

    # Describes the data-intensive flow
    #
    @pu = push()
    @pt = pass( string:pu.string ) [+AUTO]
    print( object:pt.string )
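The effect of `[+AUTO]` and `[+N]` can be approximated in Python with a worker pool sized either to the core count or to a fixed N. This is a hedged sketch, not the ZigZag compiler: `replica_count` is a hypothetical parser that only understands `[+AUTO]` and `[+N]`, and `pass_component` is a stand-in for the pass component. Threads are used for simplicity; for CPU-bound components a `ProcessPoolExecutor` would be the natural swap.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def pass_component(string: str) -> str:
    """Stand-in for the 'pass' component: forwards its input unchanged."""
    return string


def replica_count(spec: str) -> int:
    """Turn a '[+AUTO]' or '[+N]' annotation into a number of instances (illustrative parser)."""
    if not (spec.startswith("[+") and spec.endswith("]")):
        raise ValueError(f"not a parallelization annotation: {spec}")
    inner = spec[2:-1]
    if inner == "AUTO":
        return os.cpu_count() or 1  # cpu_count() may return None
    count = int(inner)
    if count < 1:
        raise ValueError("instance count must be positive")
    return count


def run_parallel(items: Iterable[str], spec: str) -> List[str]:
    # Each worker plays the role of one replicated 'pass' instance.
    with ThreadPoolExecutor(max_workers=replica_count(spec)) as pool:
        # Executor.map yields results in input order, whichever worker finishes first.
        return list(pool.map(pass_component, items))


print(replica_count("[+10]"))                    # 10
print(run_parallel(["a", "b", "c"], "[+AUTO]"))  # ['a', 'b', 'c']
```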

Page 23

Meandre: ZigZag Script Language

•  Automatic Parallelization

–  Adding the operator [+4] results in a directed graph with four instances of the pass component

    # Describes the data-intensive flow
    #
    @pu = push()
    @pt = pass( string:pu.string ) [+4]
    print( object:pt.string )

    # Describes the data-intensive flow
    #
    @pu = push()
    @pt = pass( string:pu.string ) [+4!]
    print( object:pt.string )

Page 24

Meandre: Flows to MAU

•  Flows can be executed using their RDF descriptors

•  Flows can be compiled into MAU

•  MAU is:

–  Self-contained representation

–  Ready for execution

–  Portable

–  The base of flow execution in grid environments
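The idea of a self-contained, portable unit can be illustrated by bundling a flow description and its component resources into a single archive. The layout below (a `flow.json` file plus a `components/` folder in a zip) is entirely hypothetical and is not the actual .mau format, which these slides do not describe; JSON is used instead of RDF only to keep the sketch short.

```python
import io
import json
import zipfile
from typing import Any, Dict


def build_bundle(flow: Dict[str, Any], resources: Dict[str, bytes]) -> bytes:
    """Pack a flow description and its component resources into one in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("flow.json", json.dumps(flow, indent=2))
        for path, data in resources.items():
            archive.writestr(f"components/{path}", data)
    return buffer.getvalue()


def open_bundle(blob: bytes) -> Dict[str, Any]:
    """Read the bundle back: everything needed to run the flow travels in one blob."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        flow = json.loads(archive.read("flow.json"))
        components = sorted(n for n in archive.namelist() if n.startswith("components/"))
    return {"flow": flow, "components": components}


flow = {"name": "hello-world", "instances": ["push_hello", "push_world", "concat", "print"]}
blob = build_bundle(flow, {"push.py": b"# push component", "concat.py": b"# concat component"})
print(open_bundle(blob)["components"])  # ['components/concat.py', 'components/push.py']
```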


Page 25

And Behind The Scenes?

•  Architecture designed to scale

•  Infrastructure

–  Laptop

–  Server

–  Cluster

•  Tools

–  Talk to the infrastructure

–  Workbench, ZigZag

Page 26

Meandre: The Architecture

•  The design of the Meandre architecture follows three directives:

–  provide a robust, transparent, and scalable solution from a laptop to large-scale clusters

–  create a unified solution for batch and interactive tasks

–  encourage reusing and sharing components

•  To achieve these goals, the architecture relies on four stacked layers and builds on top of service-oriented architectures (SOA)


Page 27

Meandre: Basic Single Server

Page 28

Meandre MDX: Cloud Computing

•  Servers can be

–  instantiated on demand

–  disposed when done or on demand

•  A cluster is formed by at least one server

•  The Meandre Distributed Exchange (MDX)

–  Maintains operational integrity by managing cluster configuration and membership through a shared database resource.
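Membership managed through a shared database can be sketched with `sqlite3`: servers register and refresh a heartbeat timestamp, leave explicitly when disposed, and the current cluster is whichever servers were seen recently. The table layout, the 30-second timeout, and the node names are hypothetical, and an in-memory SQLite database stands in for the real shared resource; timestamps are passed explicitly to keep the example deterministic.

```python
import sqlite3
from typing import List


def open_registry() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")  # stand-in for the shared database resource
    db.execute("CREATE TABLE servers (name TEXT PRIMARY KEY, url TEXT NOT NULL, last_seen REAL NOT NULL)")
    return db


def heartbeat(db: sqlite3.Connection, name: str, url: str, now: float) -> None:
    """Join the cluster, or refresh this server's timestamp if it is already a member."""
    db.execute("INSERT OR REPLACE INTO servers (name, url, last_seen) VALUES (?, ?, ?)",
               (name, url, now))
    db.commit()


def leave(db: sqlite3.Connection, name: str) -> None:
    """Remove a server that is being disposed of."""
    db.execute("DELETE FROM servers WHERE name = ?", (name,))
    db.commit()


def members(db: sqlite3.Connection, now: float, timeout: float = 30.0) -> List[str]:
    """Servers whose last heartbeat is recent enough count as cluster members."""
    rows = db.execute("SELECT name FROM servers WHERE last_seen >= ? ORDER BY name",
                      (now - timeout,))
    return [row[0] for row in rows]


db = open_registry()
heartbeat(db, "node-a", "http://node-a:1714", now=0.0)
heartbeat(db, "node-b", "http://node-b:1714", now=0.0)
heartbeat(db, "node-c", "http://node-c:1714", now=10.0)
leave(db, "node-b")
print(members(db, now=20.0))  # ['node-a', 'node-c']
print(members(db, now=35.0))  # ['node-c']  (node-a missed its heartbeat window)
```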


Page 29

Meandre MDX: The Picture

[Diagram: the MDX Backbone]

Page 30

Meandre MDX: The Architecture

•  Virtualization infrastructure

–  Provides uniform access to the underlying execution environment. It relies on machine virtualization and on Java for hardware abstraction.

•  IO standardization

–  A unified layer provides access to shared data stores, distributed file-system, specialized metadata stores, and access to other service-oriented architecture gateways.


Page 31

Meandre MDX: The Architecture

•  Data-intensive flow infrastructure

–  Provides the basic Meandre execution engine for data-intensive flows, component repositories and discovery mechanisms, extensible plugins, and web user interfaces (webUIs).

•  Interaction layer

–  Can provide self-contained applications via webUIs, create plugins for third-party services, interact with the embedding application that relies on the Meandre engine, or provide services to the cloud.

