Floe: Designing A Continuous Data Flow Engine for Dynamic Applications on the Cloud
Viktor Prasanna, Yogesh Simmhan, Alok Kumbhare, Sreedhar Natarajan

Page 1:

Floe: Designing A Continuous Data Flow Engine for Dynamic Applications on the Cloud

Viktor Prasanna, Yogesh Simmhan, Alok Kumbhare, Sreedhar Natarajan

04/20/2012

Page 2: Motivation

• Workflow and stream processing systems have been used for pipeline-based applications
• D3 Science – Dynamic, Distributed, Data-Intensive applications – exhibits dynamism:
  − Data is not static and flows continuously
  − Data rates and sizes change depending on domain requirements (QoS requirements)
• Workflows have compositional characteristics but limit dynamism
• Stream processing systems provide real-time processing but lack compositional and data-diversity support
• The MapReduce framework supports dynamism in the data flow but severely lacks compositional flexibility
• Needed: an architecture that provides compositional capability, allows real-time stream processing, and offers MapReduce-based key-value exchange

Page 3: Design Paradigms of Floe

• Data Flow Model
  − Workflows follow both control flow and data flow
  − For continuous data, it is difficult to define a strict control flow
  − Floe follows a data flow model, which allows for pipelined execution
• Dynamic Data Mapping (see the sketch after this slide)
  − Decide whether an output is sent to one output channel (round robin) or the same output is sent to every output channel
  − The MapReduce framework wires all mappers to reducers and dynamically maps data to a reducer at runtime
• Typed Output Channel
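A minimal sketch of the dynamic data mapping idea above, for illustration only (the OutputChannel class, its strategy names, and the smart-meter records are hypothetical, not Floe's actual API). It shows the three routing choices the slide mentions: round robin to a single channel, broadcast to every channel, and MapReduce-style routing where the key picks a reducer at runtime.

import itertools
from typing import Any, Callable, Hashable, List, Optional

class OutputChannel:
    # Hypothetical output channel that routes each message to downstream
    # queues using a pluggable mapping strategy.
    def __init__(self, consumers: List[list], strategy: str = "round_robin",
                 key_fn: Optional[Callable[[Any], Hashable]] = None):
        self.consumers = consumers            # one in-memory queue per downstream task
        self.strategy = strategy
        self.key_fn = key_fn                  # extracts the key for MapReduce-style routing
        self._next = itertools.cycle(range(len(consumers)))

    def emit(self, message: Any) -> None:
        if self.strategy == "round_robin":    # each message goes to exactly one channel
            self.consumers[next(self._next)].append(message)
        elif self.strategy == "broadcast":    # every channel receives a copy
            for q in self.consumers:
                q.append(message)
        elif self.strategy == "key_hash":     # the key decides the reducer at runtime
            idx = hash(self.key_fn(message)) % len(self.consumers)
            self.consumers[idx].append(message)

# Usage: route (key, value) pairs to two downstream reducers by key, so all
# records with the same key land in the same reducer within a run.
reducers = [[], []]
channel = OutputChannel(reducers, strategy="key_hash", key_fn=lambda kv: kv[0])
for record in [("meterA", 3), ("meterB", 5), ("meterA", 7)]:
    channel.emit(record)
print(reducers)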

Page 4: Design Paradigms of Floe (contd.)

• Continuous Execution
  − The system should support continuous processing of data, along with batch processing that takes an input and runs once
  − The framework should be able to pause and resume execution
  − For low-latency applications, resources are kept provisioned and the workflow is ready to execute the next batch of input
• Decentralized Orchestration (see the sketch after this slide)
  − A centralized workflow engine becomes a bottleneck when data flows between distributed tasks
  − Decentralized orchestration is better suited: each component is aware of the subsequent component (its input connections, output connections, etc.)
• Dynamism in Data Rates & Latency Needs
  − Apart from dynamism in the data flow, dynamism occurs in data rates and data sizes
  − The QoS requirements of the application determine the execution rate, which is met by adding new resources at runtime
  − The framework should be able to handle this
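The decentralized orchestration point above can be illustrated with a small sketch (hypothetical names, not Floe's implementation): each task holds its own list of output connections and pushes results directly to its successors, so no central engine sits in the data path.

from typing import Callable, List

class Task:
    # Hypothetical task that knows only its own successors; output is pushed
    # straight downstream instead of returning to a central orchestrator.
    def __init__(self, name: str, logic: Callable):
        self.name = name
        self.logic = logic
        self.successors: List["Task"] = []    # output connections held locally

    def connect(self, downstream: "Task") -> None:
        self.successors.append(downstream)

    def process(self, item) -> None:
        result = self.logic(item)
        for nxt in self.successors:           # forward peer to peer
            nxt.process(result)

# Usage: a three-stage pipeline (parse -> scale -> sink) with no coordinator.
sink = Task("sink", lambda x: print("out:", x) or x)
scale = Task("scale", lambda x: x * 10)
parse = Task("parse", int)
parse.connect(scale)
scale.connect(sink)
parse.process("42")                           # prints "out: 420"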

Page 5: Design Paradigms of Floe (contd.)

• Elastic Resources
  − The Cloud inherently provides dynamic provisioning of resources
  − Resources need to be provisioned ahead of time, considering the latency involved in initialization
  − The application should be resilient to overcome failures
• Dynamic Task Update (see the sketch after this slide)
  − Given continuous data flow execution, pausing, updating the task logic, and resuming the workflow in place is costly, since in-flight data must be stored
  − A nice feature would be an update tracer event that updates the task logic without pausing the workflow
• Dynamic Data Flow Updates
  − Depending on the requirements, the structure of a data flow may change; tasks could be added or removed
  − A similar update tracer could be used to update the edge properties rather than the task properties
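The update tracer idea above can be sketched as a control message that travels in-band with the data: when a running task sees it, the task swaps in the new logic and keeps consuming, so the flow is never paused. This is illustrative only (UpdateTracer and run_task are hypothetical names, not Floe's API).

import queue
import threading

class UpdateTracer:
    # Hypothetical control event carrying replacement task logic.
    def __init__(self, new_logic):
        self.new_logic = new_logic

def run_task(inbox: queue.Queue, outbox: queue.Queue) -> None:
    logic = lambda x: x                        # initial task logic: identity
    while True:
        item = inbox.get()
        if item is None:                       # shutdown sentinel
            break
        if isinstance(item, UpdateTracer):     # swap logic without pausing the flow
            logic = item.new_logic
            continue
        outbox.put(logic(item))

inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=run_task, args=(inbox, outbox))
worker.start()
inbox.put(1)                                   # handled by the old logic -> 1
inbox.put(UpdateTracer(lambda x: x * 100))     # update flows with the data
inbox.put(2)                                   # handled by the new logic -> 200
inbox.put(None)
worker.join()
print([outbox.get(), outbox.get()])            # [1, 200]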

Page 6: Floe Architecture

Page 7: Use Case

Smart Grid Streaming Pipeline