
A Stratified Approach for Supporting High Throughput Event Processing Applications

July 2009

Geetika T. Lakshmanan (IBM T. J. Watson Research Center), Yuri G. Rabinovich (IBM Haifa Research Lab), Opher Etzion (IBM Haifa Research Lab)

[email protected] [email protected] [email protected]


Outline

Motivation and Goal

Definitions

Related Work

Overview of our solution
– Credit-Card Scenario

– Profiling and initial assignment of nodes to strata

– Stratification

– Load Distribution Algorithm

Algorithm optimizations and support for dynamic changes in event processing graph

Implementation and Results

Conclusion


Our Goal

Devise a generic framework to maximize the overall input (and thus output) throughput of an event processing application represented as an EPN, given a specific set of resources (a cluster of nodes with varying computational power) and a traffic model. The framework should be adaptive to changes in either the configuration or the traffic model.

[Figure: An Event Processing Network (EPN). Event producers feed events into engines hosting event processing agents (EPAs); derived events flow to event consumers, and the engines share a repository.]


Why is this an important problem?

The quantity of events that a single application needs to process is constantly increasing (e.g., RFID events, massively multiplayer online games).

Manual partitioning is difficult (due to semantic dependencies between event processing agents) particularly when it is required to be adaptive and dynamic.


Event Processing Agent

An event processing agent has input and output event channels.

In general it receives a collection of events as input, derives one or more events as output and emits them on one or more output channels.

The input channels are partitioned according to a context, which divides the space of events into semantically relevant partitions.

[Figure: Structure of an event processing agent: an input channel, a context, an agent specification, and a derived-event definition feeding an output channel. Agent types include Filter, Transform (Translate, Route, Aggregate, Split, Enrich), and Detect Pattern.]
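To make this structure concrete, here is a minimal Python sketch (illustrative only, not the AMiT API; the class and field names are ours, and the 1-hour time window is ignored for brevity): an agent partitions its input channel by context and applies its logic per partition to emit derived events.

```python
from collections import defaultdict
from typing import Callable, Dict, List

class EventProcessingAgent:
    """Illustrative EPA: partitions incoming events by context and applies
    the agent's logic (filter, aggregate, pattern detection, ...) per partition."""

    def __init__(self, context_key: Callable[[dict], str],
                 derive: Callable[[List[dict]], List[dict]]):
        self.context_key = context_key                      # event -> semantic partition
        self.derive = derive                                # agent logic for one partition
        self.partitions: Dict[str, List[dict]] = defaultdict(list)

    def process(self, event: dict) -> List[dict]:
        """Accept one input event; return any derived events for the output channel."""
        key = self.context_key(event)
        self.partitions[key].append(event)
        return self.derive(self.partitions[key])

# Example in the spirit of the credit-card scenario: flag a card once more than
# five purchases above 100 have accumulated in its partition.
high_volume = EventProcessingAgent(
    context_key=lambda e: e["card_id"],
    derive=lambda events: ([{"type": "HighVolumePurchase"}]
                           if sum(1 for e in events if e["amount"] > 100) > 5 else []))
```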


Related Work

Scalability in event processing
– Large-scale event processing applications (e.g., Astrolabe, PIER, Siena)
– Kulkarni et al., Wu et al.
– More work needs to be done.

Numerous centralized implementations arising due to interdependencies among event processing agents.

Synergy between stream processing and event processing
– Load distribution techniques proposed for streams:
  • Shah et al.
  • Mehta et al.
  • Gu et al.
  • Xing et al.
  • Zhou et al.


Is this a solved problem?

[Figure: A map of existing work, grouped into event-at-a-time and set-at-a-time implementations: scalable event processing implementations (Astrolabe, PIER, Siena), centralized event processing implementations, centralized stream processing implementations, and load distribution algorithms for scalable stream processing (Shah et al., Mehta et al., Gu et al., Xing et al., Zhou et al., Liu et al., ...).]


Overview of Our Solution

1. Profiling
– Used to assign agents to nodes in order to maximize throughput

2. Stratification of the EPN
– Splitting the EPN into strata layers
– Based on semantic dependencies between agents
– Distributed implementation with an event proxy to relay events between strata

3. Load Distribution
– Distribute load among agents dynamically at runtime and respect statistical load relationships between nodes


Distributed Event Processing Network Architecture

Input: Specification of an event processing application
Output: Stratified EPN (event processing operations mapped to event processing agents)

[Figure: Stratified architecture. Each stratum (1, 2, and 3) contains several EP nodes, an EP proxy, and a DB; events enter at stratum 1 and derived events are relayed by the proxies from one stratum to the next.]

Event Proxy receives input events and routes them to nodes in a stratum according to the event context.

Event proxy periodically collects performance statistics per node in a stratum.
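A minimal sketch of how context-based routing and per-node bookkeeping at the proxy could look; the `EventProxy` class and the hash-based initial assignment are our illustrative choices, not the paper's specification.

```python
import hashlib
from collections import defaultdict

class EventProxy:
    """Illustrative sketch: routes each event to a node of its stratum based on the
    event's context, and keeps simple per-node counters for later statistics."""

    def __init__(self, nodes):
        self.nodes = list(nodes)               # node identifiers in this stratum
        self.context_to_node = {}              # sticky context -> node assignment
        self.events_routed = defaultdict(int)  # per-node event counters

    def route(self, event, context_key):
        """Return the node that should process `event`; a context stays on one node."""
        ctx = context_key(event)
        node = self.context_to_node.get(ctx)
        if node is None:
            # First time this context is seen: assign it to a node. A hash gives a
            # simple initial spread; the load distribution algorithm later migrates
            # heavy contexts between nodes.
            idx = int(hashlib.md5(ctx.encode()).hexdigest(), 16) % len(self.nodes)
            node = self.nodes[idx]
            self.context_to_node[ctx] = node
        self.events_routed[node] += 1
        return node
```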


Stratified Event Processing Graph

1. Define the event processing application as an Event Processing Network dependency graph G = (V, E), with directed edges from event source to event target.

2. Overview of the stratification algorithm:
– Create partitions by finding subgraphs that are independent in the dependency graph.
– For each subgraph, construct a network of EPAs.
– Push filters to the beginning of the network to filter out irrelevant events.
– Iterate through the graph and identify areas of strict interdependence (i.e., subgraphs with no edges connecting them to the rest of the graph).
– For each subgraph, define stratum levels (a small sketch of the level assignment follows below).
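The following Python sketch illustrates one way to realize the stratum-level assignment described above: agents with no unresolved dependencies form stratum 1, and every other agent is placed one level after its deepest predecessor. This is an illustrative reconstruction, not the paper's exact pseudocode.

```python
from collections import defaultdict

def stratify(edges, agents):
    """Assign each agent a stratum level based on its dependencies.
    `edges` is a list of (source_agent, target_agent) pairs; agents with no
    incoming edges form stratum 1, and every other agent lands one stratum
    after its deepest predecessor. Raises on cyclic dependency graphs."""
    preds = defaultdict(set)
    for src, dst in edges:
        preds[dst].add(src)

    level = {}
    remaining = set(agents)
    current = 1
    while remaining:
        # Agents whose predecessors have all been placed form the next stratum.
        ready = {a for a in remaining if preds[a].issubset(level)}
        if not ready:
            raise ValueError("dependency graph contains a cycle")
        for a in ready:
            level[a] = current
        remaining -= ready
        current += 1
    return level
```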


Credit Card Scenario

[Figure: Credit-card scenario. The event processing dependency graph contains filter agents (Amount > 100) over Purchase and Cancel events, pattern agents (more than 5 occurrences within 1 hour, more than 3 occurrences within 1 hour, Cancel follows Discount), and the derived events High Volume Purchase, High Volume Cancel, Give Discount to Company, Discount Canceled, and Cancel Discount to Company. The stratification algorithm turns this dependency graph into a stratified event processing graph with strata 1, 2, and 3.]
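As a usage example for the `stratify` sketch above, a simplified reading of the credit-card dependency graph (the exact edge set is our assumption based on the figure) yields the three strata shown on the slide.

```python
# Simplified reading of the credit-card figure; the exact edge set is our assumption.
edges = [
    ("Amount > 100 (purchase filter)", "More than 5 occurrences within 1 hour"),
    ("Amount > 100 (cancel filter)", "More than 3 occurrences within 1 hour"),
    ("More than 5 occurrences within 1 hour", "Cancel follows discount"),
    ("More than 3 occurrences within 1 hour", "Cancel follows discount"),
]
agents = {a for edge in edges for a in edge}
print(stratify(edges, agents))
# Filters land in stratum 1, the occurrence patterns in stratum 2,
# and "Cancel follows discount" in stratum 3.
```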


Initial Placement of Agents

Goal is to maximize throughput.

Assume agents in a single stratum are replicated on all nodes in that stratum.

Overall strategy:

1. Profiling. Determine maximum event processing capability of available nodes.

– r_i: maximum possible event processing rate (events/sec)

– d_i: maximum possible derived-event production rate (events/sec)

2. Assigning nodes to each stratum. Executing at a user-set percentage of their capacity, these nodes can process all of the incoming events at their stratum level in parallel under peak event traffic conditions.
– Compute the ratio by which events are split between nodes
– Iterative calculation starting with the first stratum


Assigning Nodes to Each Stratum

1. Assigning nodes to each stratum. Executing at a user-set percentage t_i of their capacity, these nodes can process all of the incoming events at their stratum level in parallel under peak event traffic conditions.
– Compute the ratio by which events are split between nodes
– Iterative calculation starting with the first stratum

Formulas (for a stratum with incoming event rate R):
– Percentage of the event stream directed to node n_i: ((t_i * r_i) / R) * 100
– Derived-event production rate of stratum n, fed to stratum n+1: R * (d_i / r_i)

Example: incoming event rate R = 200,000 events/sec, t_i = 0.95, processing capacity of node n: r_i = 36,000 events/sec.
– Percentage of the event stream directed to node n: ((0.95 * 36,000) / 200,000) * 100 = 17.1%
– If d_i / r_i = 0.5, the derived-event production rate is 200,000 * 0.5 = 100,000 events/sec.
– Thus, 6 nodes are needed in this stratum.
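A small sketch reproducing the slide's arithmetic (function names are ours): the share of the stream one node can absorb, the number of nodes the stratum needs, and the derived-event rate fed to the next stratum.

```python
import math

def node_share_and_count(incoming_rate, node_capacity, target_utilization):
    """Share of the stream one node absorbs (as a percentage) and the number of
    identical nodes needed so the stratum handles `incoming_rate` with each node
    running at `target_utilization` of its capacity."""
    effective = target_utilization * node_capacity
    share_pct = 100.0 * effective / incoming_rate
    return share_pct, math.ceil(incoming_rate / effective)

def downstream_rate(incoming_rate, derived_ratio):
    """Derived-event rate fed to the next stratum, given the ratio d_i / r_i."""
    return incoming_rate * derived_ratio

share, count = node_share_and_count(200_000, 36_000, 0.95)
print(round(share, 1), count)          # 17.1 (% of the stream per node), 6 nodes
print(downstream_rate(200_000, 0.5))   # 100000.0 events/sec into the next stratum
```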


Dynamic Load Distribution Strategy

Desirable qualities include:
– Dynamic

– Observes Semantics of Agent Dependencies

– Observes Link Latency

– Can perform task splitting

– Observes load, average load, and load variance

– Observes state


Overview of Dynamic Load Distribution Algorithm

The event proxy collects statistics, maintains a time series, and makes the following decisions:

1. Identify the most heavily loaded node in a stratum (the donor node).

2. Identify a heavy context to migrate from the donor node.

3. Identify recipient node for migrated load.

4. Estimate the post-migration utilization of the donor and recipient nodes. If the post-migration utilization of the recipient node is unsatisfactory, go back to step 3 and identify a new recipient node. If the post-migration utilization of the donor node is unsatisfactory, go back to step 2 and identify a new context to migrate.

5. Execute the migration and wait for a time interval of length x. Go to step 1. (A sketch of this loop follows below.)
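A sketch of the five decision steps as a periodic control loop. The `proxy` object and its methods (`load`, `contexts`, `context_load`, `post_migration_utilization`, `migrate`) are a hypothetical API standing in for the statistics and migration machinery described on these slides, not AMiT's interface.

```python
import time

def rebalance_stratum(proxy, quality_threshold=0.8, pause_sec=30):
    """Illustrative control loop for the five steps above, against a hypothetical
    proxy API exposing per-node load, per-context load, a post-migration estimate,
    and a migrate() operation."""
    while True:
        donor = max(proxy.nodes, key=proxy.load)                          # step 1
        migrated = False
        for ctx in sorted(proxy.contexts(donor),
                          key=proxy.context_load, reverse=True):          # step 2
            for recipient in sorted(proxy.nodes, key=proxy.load):         # step 3
                if recipient is donor:
                    continue
                u_d, u_r = proxy.post_migration_utilization(donor, recipient, ctx)  # step 4
                if u_d < quality_threshold and u_r < quality_threshold:
                    proxy.migrate(ctx, donor, recipient)                  # step 5
                    migrated = True
                    break
            if migrated:
                break
        time.sleep(pause_sec)  # wait, then start again from step 1
```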

[Figure: The EP proxy sits between stratum n and stratum n+1; each EP node runs an AMiT engine fed by an input queue.]


Overview of Dynamic Load Distribution Algorithm

Statistics collected by the event proxy:
– Number of input events processed by the execution of agents in a particular context
– Number of derived events produced by the execution of agents in this context
– Number of different agent executions evaluated in this context
– Total latency to evaluate all agents executed in this context

For each of these statistics, the event proxy maintains a time series and computes summary statistics such as the mean, standard deviation, covariance, and correlation coefficient.

These statistics dictate the choice of donor and recipient nodes.

The definition of load is purposely generic to accommodate different application priorities.
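A minimal sketch of the kind of per-node (or per-context) time series the proxy could keep and the summary statistics mentioned above, using numpy; the class and function names are illustrative, not from the original system.

```python
import numpy as np
from collections import deque

class LoadTimeSeries:
    """Sliding window of per-interval load samples for one node or context."""

    def __init__(self, window=120):
        self.samples = deque(maxlen=window)

    def add(self, value):
        self.samples.append(value)

    def mean(self):
        return float(np.mean(list(self.samples)))

    def std(self):
        return float(np.std(list(self.samples)))

def load_correlation(a, b):
    """Correlation coefficient between two load series; a strongly positive value
    suggests the two nodes peak together and form a poor donor/recipient pair."""
    n = min(len(a.samples), len(b.samples))
    return float(np.corrcoef(list(a.samples)[-n:], list(b.samples)[-n:])[0, 1])
```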


Post Migration Utilization Calculation

We need to determine whether this migration will lead to overload; if it triggers further migrations, the system will become unstable. Therefore we compute the post-migration utilization of the donor and recipient machines.

Thus the post-migration utilization U'_d of the donor machine and U'_r of the recipient machine after migrating a task t_1, where n_d and n_r are the total number of tasks on the donor and recipient respectively, is:

U'_d = U_d * (1 - L(t_1) / Σ_{i=1..n_d} L(t_i))

U'_r = U_r * (1 + L(t_1) / Σ_{i=1..n_r} L(t_i))

where L(t_i) denotes the load contributed by task t_i.

The post-migration utilization of the donor must be below a preset quality threshold.

The post-migration utilization of the recipient must be below a preset quality threshold.
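Based on the reconstructed formulas above, a small sketch of the post-migration check; the 0.8 threshold in the example is an arbitrary illustrative value.

```python
def post_migration_utilization(u_donor, u_recipient,
                               donor_task_loads, recipient_task_loads,
                               migrated_load):
    """Estimate donor and recipient utilization after migrating one task, following
    the reconstructed formulas: the donor sheds the migrated task's share of its
    load, and the recipient gains the corresponding share relative to its own load."""
    u_d = u_donor * (1 - migrated_load / sum(donor_task_loads))
    u_r = u_recipient * (1 + migrated_load / sum(recipient_task_loads))
    return u_d, u_r

# Example: a donor at 90% utilization with task loads 40/30/30 migrates the 40-unit
# task to a recipient at 50% utilization carrying task loads 20/20.
print(post_migration_utilization(0.9, 0.5, [40, 30, 30], [20, 20], 40))
# -> (0.54, 1.0): the recipient would exceed a 0.8 threshold, so try another recipient.
```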


Support for Dynamic Changes in EP Graph

Our algorithm supports:
– Addition of a new connected subgraph to the existing EPN.

– Addition of an agent to the graph in the EPN.

– Deletion of agents from the graph

– Failure of one or more nodes in a stratum level.

The algorithm is also amenable to agent-level optimizations (e.g., coalescing neighboring agents).


Implementation

Used nodes running IBM Active Middleware Technology (AMiT), a CEP engine that serves as a container for event processing agents.

Event processing scenario: credit card scenario

Node hardware characteristics:

– Type 1: Dual-Core AMD Opteron 280, 2.4 GHz, 1 GB memory.

– Type 2: Intel Pentium D, 3.6 GHz, 2 GB memory.

– Type 3: Intel Xeon, 2.6 GHz, 2 GB memory.


Goal of Implementation

Explore the benefits of event processing on a stratified vs. a centralized vs. a partitioned network (a single stratum in which load is distributed according to context).

Explore benefit of stratified approach under heavy load (when the number of incoming events that trigger the generation of derived events increases).

Explore the effectiveness and scalability of the load distribution algorithm


Results

[Figure: Input event processing rates (events/sec) of stratified vs. partitioned networks for three configurations: 1:1:1 (3 machines), 2:2:1 (5 machines), and 5:4:1 (10 machines); reported rates range from 81,000 to 398,000 events/sec, versus a centralized baseline of 30,000 events/sec.]

Input event processing rate of stratified versus partitioned event processing networks.


Results

[Figure: Derived event production rates (events/sec) of stratified vs. partitioned networks for the 1:1:1 (3 machines), 2:2:1 (5 machines), and 5:4:1 (10 machines) configurations; reported rates range from 4,500 to 21,000 events/sec, versus a centralized baseline of 1,500 events/sec.]

Derived event production rate of stratified versus partitioned event processing networks.


Results

Percentage of improvement in performance of the stratified network relative to a partitioned network

[Figure: Improvement (%) in event processing rate and in derived-event rate of the stratified network relative to the partitioned network, plotted against the percentage of events participating in derived-event production (100% and 12.5%, both on the 5:4:1 configuration): roughly +40% and +33% at 100%, and roughly -32% on both measures at 12.5%.]


Results

Average input events processing rate per node in a stratified network with different configurations

[Figure: Average input event processing rate per node (events/sec) in a stratified network, comparing the fixed 5:4:1 machine ratio with the optimal ratio for each workload (100% - 5:4:1, 50% - 6:3:1, 25% - 8:3:1, 12.5% - 11:3:1); per-node rates shown range from roughly 34,000 to 53,000 events/sec.]


Results

[Figure: Throughput results for the load distribution algorithm. Mean throughput (events/sec) vs. number of nodes (5 to 30), comparing our dynamic load distribution with Largest Load First, Random, and No Load Distribution.]

[Figure: Scalability of the load distribution algorithm. Total throughput vs. time (0 to 1200 sec), comparing dynamic load distribution with no load distribution.]


Conclusion and Future Work

Demonstrated stratification combined with dynamic load distribution for scalable event processing.

Future Work: Investigate high availability

Future Work: Investigate other objectives in addition to scalability

Future Work: Execution of multiple strata within a single cluster of nodes.

Future Work: Techniques for effective load migration between nodes.