Project funded by the European Community under the Information and Communication Technologies Programme Contract ICT-FP7-619491
ICT, STREP
FERARI ICT-FP7-619491
Flexible Event pRocessing for big dAta
aRchItectures
Collaborative Project
D4.1
Requirements and state of the art overview of flexible event processing
01.02.2013 – 31.01.2014 (preparation period)
Contractual Date of Delivery: 31.01.2015
Actual Date of Delivery: 31.01.2015
Author(s): Fabiana Fournier and Inna Skarbovsky
Institution: IBM
Workpackage: Flexible Event Processing
Security: PU
Nature: R
Total number of pages: 48
D4.1 Requirements and state of the art overview on flexible event processing
Project coordinator name: Michael Mock
Revision: 1
Project coordinator organisation name: Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
URL: http://www.iais.fraunhofer.de
Abstract
The goal of the FERARI (Flexible Event pRocessing for big dAta aRchItectures) project is to pave the way
for efficient real-time Big Data technologies of the future. The proposed framework aims at enabling
business users to express complex analytics tasks through a high-level declarative language that
supports distributed complex event processing as an integral part of the system architecture. Work
package 4 “Flexible Event Processing” deals with all the developments around event processing
technologies in order to achieve this goal.
To be flexible, event processing engines need to satisfactorily address the following two
requirements:
• Easy adaptability to non-functional requirements, especially the way the tool copes with
scalability issues in a distributed environment.
• Easy definition and maintenance of the event-driven logic.
The task of work package 4 is to provide a model and methodology to cope with these limitations. The
proposed approach addresses both the functional and non-functional properties of event processing
applications by supporting non-technical users with a declarative language expressed in tabular form.
The resulting model can then be automatically translated into event-driven definitions and eventually
into a running application in the proposed FERARI architecture.
Revision history
Administration Status
Project acronym: FERARI ID: ICT-FP7-619491
Document identifier: D4.1 Requirements and state of the art overview of flexible event processing
(01.02.2013 – 31.01.2014)
Leading Partner: IBM
Report version: 1
Report preparation date: 31.01.2014
Classification: PU
Nature: REPORT
Author(s) and contributors: Fabiana Fournier and Inna Skarbovsky
Status: Submitted
Copyright This report is © FERARI Consortium 2014. Its duplication is restricted to the personal use within the
consortium and the European Commission.
www.ferari-project.eu
Document History
Version | Date       | Author                 | Change Description
0.1     | 15/11/2014 | Fabiana Fournier (IBM) | First draft
0.2     | 1/12/2014  | Fabiana Fournier (IBM) | Second draft including sections 3 and 4
0.3     | 15/12/2014 | Fabiana Fournier (IBM) | First complete version
0.4     | 15/12/2014 | Fabiana Fournier (IBM) | Inclusion of abstract
0.5     | 15/12/2014 | Fabiana Fournier (IBM) | Updates per internal review
1.0     | 30/12/2014 | Fabiana Fournier (IBM) | Final fixes and cleanup
Table of Contents
1 Introduction .......................................................................................................................................... 1
1.1 Purpose and scope of the document ............................................................................................ 1
1.2 Relationship with other documents ............................................................................................. 1
2 Complex event processing – The motivation ........................................................................................ 1
3 Complex event processing – The business case ................................................................................... 4
4 State of the art in complex event processing tools .............................................................................. 6
4.1 Commercial tools .......................................................................................................................... 8
4.1.1 InfoSphere Streams (IBM) [18][19] ....................................................................................... 9
4.1.2 Informatica Platform for streaming analytics (Informatica) .................................................. 9
4.1.3 Event Stream Processor (ESP) (SAP) [18][19] ........................................................................ 9
4.1.4 Apama (Software AG) [18] [19] ........................................................................................... 10
4.1.5 StreamBase (Tibco) [18] [19] .............................................................................................. 10
4.2 Open source engines ................................................................................................................... 10
4.2.1 Esper (EsperTech Inc) .......................................................................................................... 11
4.2.2 IBM Proactive Technology Online (PROTON)...................................................................... 11
4.2.3 Open source event processing running on distributed stream computing platforms ....... 12
4.3 Research tools ............................................................................................................................. 13
4.4 Limitations of contemporary event processing tools ................................................................. 14
5 Complex event processing background .............................................................................................. 14
5.1 Event types .................................................................................................................................. 15
5.2 Event attributes .......................................................................................................................... 16
5.3 Context ........................................................................................................................................ 16
5.4 Event Processing Network (EPN) ................................................................................................ 17
5.5 Event Processing Agent (EPA) ..................................................................................................... 17
5.6 Pattern policies ........................................................................................................................... 18
5.7 Context initiator policies ............................................................................................................. 19
5.8 PROTON definitions .................................................................................................................... 20
6 Requirements for flexible event processing ....................................................................................... 21
6.1 Non-functional requirements of event processing applications ................................................ 21
6.1.1 Scalability ............................................................................................................................ 22
6.1.2 Availability ........................................................................................................................... 22
6.1.3 Security ............................................................................................................................... 23
6.1.4 Performance objectives ...................................................................................................... 23
6.1.5 Usability............................................................................................................................... 24
6.2 Requirements for the mobile fraud use case ............................................................................. 26
6.2.1 Description of the mobile fraud use case ........................................................... 27
6.2.2 Event types .......................................................................................................... 28
6.2.3 Event processing agents ...................................................................................... 29
6.2.4 Mobile phone fraud use case functional requirements summary ..................... 35
6.3 Introduction to the event model ................................................................................................ 35
6.4 Summary of the requirements for flexible event processing in FERARI ..................................... 36
7 Summary and future steps .................................................................................................................. 36
8 References .......................................................................................................................................... 38
List of Tables
Table 1: Initial EPN for the mobile phone fraud use case ........................................................................... 28
List of Figures
Figure 1: Illustration of an event processing network ................................................................................ 17
Figure 2: Event recognition process in an EPA ............................................................................................ 18
Figure 3: Mobile fraud use case initial EPN ................................................................................................ 27
Figure 4: Event recognition process for Filtering EPA ................................................................................. 29
Figure 5: Context for Filter EPA ................................................................................................................... 30
Figure 6: Event recognition process for FrequentLongCallsAtNight EPA ................................................... 30
Figure 7: Context for FrequentLongCallsAtNight EPA ................................................................................. 31
Figure 8: Event recognition process for FrequentLongCalls EPA ................................................................ 32
Figure 9: Context for FrequentLongCalls EPA ............................................................................................. 33
Figure 10: Event recognition process for FrequentEachLongCall EPA ........................................................ 33
Figure 11: Context for FrequentEachLongCall EPA ..................................................................................... 34
Figure 12: Event recognition process for ExpensiveCalls EPA .................................................................... 34
Figure 13: Context for ExpensiveCall EPA ................................................................................................... 35
Acronyms
ASF Apache Software Foundation
BAM Business Activity Monitoring
CEP Complex Event Processing
DBMS Database Management System
DEBS Distributed Event-Based Systems
DSCP Distributed Stream Computing Platform
DSMS Data Stream Management System
EAI Enterprise Application Integration
ECA Event-Condition-Action
EPA Event Processing Agent
EPL Event Processing Language
EPN Event Processing Network
EPTS Event Processing Technical Society
ESP Event Stream Processing
FERARI Flexible Event pRocessing for big dAta aRchItectures
JSON JavaScript Object Notation
IP Intellectual Property
SaaS Software as a Service
SCADA Supervisory Control And Data Acquisition
SIEM Security Information and Event Management
TDM The Decision Model
TEM The Event Model
WP Work Package
1 Introduction
1.1 Purpose and scope of the document
The goal of the FERARI (Flexible Event pRocessing for big dAta aRchItectures) project is to pave the way
for efficient real-time Big Data technologies of the future. The proposed framework aims at enabling
business users to express complex analytics tasks through a high-level declarative language that
supports distributed complex event processing as an integral part of the system architecture. Work
package 4 (WP4) “Flexible Event Processing” deals with all the developments around event processing
technologies in order to achieve this goal.
This report surveys the state of the art in event processing systems including products and research
assets, trends, and limitations of current offerings; and hints towards the way to cope with current
limitations in the scope of the project. The report also describes non-functional and functional
requirements of event processing engines in relation to the mobile fraud use case of the project.
Note that we use "complex event processing" and "event processing", as well as "tool", "engine", and
"system", interchangeably throughout this report.
This report is structured as follows: Section 2 gives the background for the appearance of complex event
driven systems from the technical point of view, whilst Section 3 adds the business incentive. Section 4
surveys main commercial, open source, and research event processing tools. Section 5 provides some
necessary background on the semantics used in the FERARI project. Section 6 describes the
requirements for flexible event processing including details on the mobile fraud use case. We conclude
the report with summary and future steps in Section 7.
1.2 Relationship with other documents
FERARI stands for Flexible Event pRocessing for big dAta aRchItectures; there is therefore a tight
connection between the event processing components and the rest of the components that form the
FERARI architecture. Specifically, this deliverable is strongly related to D2.1 - Architecture definition in
WP2. The requirements for the event processing engine are dictated by the use cases in the project;
thus, this report is also strongly related to D1.1 - Application Scenario Description and Requirement
Analysis in WP1.
2 Complex event processing – The motivation
In the past decade, there has been an increase in demand to process continuously flowing data from
external sources at unpredictable rates to obtain timely responses to complex queries. Traditional
Database Management Systems (DBMSs) require data to be (persistently) stored and indexed before it
can be processed, and they process data only when explicitly asked by the users, that is, asynchronously with
respect to its arrival. These requirements led to the development of a number of systems specifically
designed to process information as a flow according to a set of pre-deployed processing rules. Two
models have emerged [6]: the data stream processing model [22] and the complex event processing
model [23].
Data Stream Management Systems (DSMSs) differ from conventional Data Base Management Systems
(DBMSs) in several ways: (a) as opposed to tables, streams are usually unbounded; (b) no assumption
can be made on data arrival order; and (c) size and time constraints make it difficult to store and process
data stream elements after their arrival; therefore, one-time processing is the typical mechanism
used to deal with streams. Users of a DSMS install standing (or continuous) queries, i.e., queries that are
deployed once and continue to produce results until removed. Standing queries can be executed
periodically or continuously, as new stream items arrive. As opposed to DBMSs, users of DSMSs do not
have to explicitly ask for updated information; rather, the system actively notifies them according to the
installed queries. DSMSs focus on producing query results, which are continuously updated in accordance with
the constantly changing contents of their input data. Detection and notification of complex patterns of
elements involving sequences and ordering relations are usually out of the scope of DSMSs. DSMSs
mainly focus on flowing data and data transformation; only a few allow the easy capture of
sequences of data involving complex ordering relationships, let alone the possibility of performing
filtering, correlation, and aggregation of data directly in-network, as streams flow
from sources to sinks.
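The contrast with DBMSs can be made concrete: a standing query over an unbounded stream must keep only a bounded window of recent items and push updated results to the user as data arrives. The following Python fragment is a minimal sketch of this idea; the function name and the count-based window semantics are our own illustration, not tied to any DSMS product.

```python
from collections import deque

def sliding_average(stream, window_size):
    """Standing (continuous) query: installed once over an unbounded
    stream, it emits an updated aggregate for every arriving item,
    keeping only a bounded window since the stream cannot be stored."""
    window = deque(maxlen=window_size)  # oldest items are evicted automatically
    for item in stream:
        window.append(item)
        yield sum(window) / len(window)  # result actively pushed to the consumer
```

Unlike a DBMS query, the consumer never re-issues the request; results keep flowing until the query is removed.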
Complex Event Processing (CEP) systems adopt the opposite approach. They associate a precise
semantics to the information items being processed: they are notifications of events which happened in
the external world and were observed by sources, also called event producers. The CEP engine is
responsible for filtering and combining such notifications to understand what is happening in terms of
higher-level events (a.k.a. complex events, composite events, or situations) to be notified to sinks, called
event consumers. CEP systems put emphasis on the issue that represents the main limitation of DSMSs,
that is, the ability to detect complex patterns of incoming items, involving sequencing and ordering
relationships. An example of a situation is a suspicious account, which is detected whenever there are
at least three large cash deposits within 10 days for the same account. Event processing is in essence a
paradigm of reactive computing: a system observes the world and reacts to events as they occur. It is an
evolutionary step from the paradigm of responsive computing, in which a system responds only to
explicit service requests. Event processing has evolved over the past years, departing from traditional
computing architectures, which employ synchronous request-response interactions between clients and
servers, towards reactive applications in which decisions are driven by events.
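The suspicious-account situation above can be sketched as an event-driven detector. The Python fragment below is a minimal illustration only, not PROTON or any product's API; the threshold defining a "large" deposit and the shape of the event payload are assumptions of ours.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=10)
THRESHOLD_AMOUNT = 10_000   # assumed definition of a "large" deposit
MIN_COUNT = 3

# sliding window of recent large-deposit timestamps, kept per account
recent = defaultdict(deque)

def on_deposit(account, amount, ts):
    """Process one raw deposit event; return a complex event or None."""
    if amount < THRESHOLD_AMOUNT:
        return None                       # filtering: event is irrelevant
    window = recent[account]
    window.append(ts)
    # evict timestamps that fell out of the 10-day temporal window
    while window and ts - window[0] > WINDOW:
        window.popleft()
    if len(window) >= MIN_COUNT:          # pattern matched: emit a situation
        return {"situation": "SuspiciousAccount",
                "account": account, "count": len(window)}
    return None
```

The detector reacts to each event as it occurs; no consumer ever polls for the result, matching the reactive-computing paradigm described above.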
CEP [20] is a technique in which incoming data about what is happening (event data) is processed more
or less as it arrives to generate higher-level, more-useful, summary information (complex events). Event
processing platforms have built-in capabilities for filtering incoming data, storing windows of event data,
computing aggregates and detecting patterns. In a more formal terminology, CEP software is any
computer program that can generate, read, discard and perform calculations on events. A complex
event is an abstraction of one or more raw or input events. Complex events may signify threats or
opportunities that require a response from the business. One complex event may be the result of
calculations performed on a few or on millions of events from one or more event sources. A situation
may be triggered by the observation of a single raw event, but is more typically obtained by detecting a
pattern over the flow of events. Many of these patterns are temporal in nature [11], but they can also
be spatial, spatio-temporal, or modal [7]. Event processing deals with these functions: get events from
sources (event producers), route these events, filter them, normalize or otherwise transform them,
aggregate them, detect patterns over multiple events, and transfer them as alerts to a human or as a
trigger to an autonomous adaptation system (event consumers). An application or a complete definition
set made up of these functions is also known as an Event Processing Network (EPN).
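As an illustration of how such functions compose into an EPN, the sketch below chains three hypothetical agents (filter, transform, pattern detection) from producer to consumer. The stage names, event shapes, and threshold are our own invention for illustration, not a FERARI or PROTON interface.

```python
def filter_agent(events):
    """Filter: discard events irrelevant to the situation."""
    return (e for e in events if e["amount"] > 0)

def transform_agent(events):
    """Transform/normalize: map raw payloads to a canonical shape."""
    return ({"account": e["acct"], "amount": float(e["amount"])} for e in events)

def pattern_agent(events, threshold):
    """Pattern detection: derive a complex event from multiple raw events."""
    total = 0.0
    for e in events:
        total += e["amount"]
        if total >= threshold:            # situation detected
            yield {"situation": "HighVolume", "total": total}
            total = 0.0

def run_epn(producer_events, threshold=100.0):
    """Wire producer -> agents -> consumer into a small network."""
    return list(pattern_agent(transform_agent(filter_agent(producer_events)),
                              threshold))
```

Each agent consumes the output stream of the previous one, which is exactly the producer/agent/consumer topology an EPN describes.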
As aforementioned, the goal of a CEP engine is to notify its users immediately upon the detection of a
pattern of interest. Data flows are seen as streams of events, some of which may be irrelevant for the
user's purposes. Therefore, the main focus is on the efficient filtering out of irrelevant data and
processing of the relevant. Obviously, for such systems to be acceptable, they have to satisfy certain
efficiency, fault tolerance, and accuracy constraints, such as low latency and robustness.
CEP platforms required a new type of architecture. Conventional architectures are not fast or efficient
enough for some applications because they use a "save-and-process" paradigm in which incoming data
is stored in databases in memory or on disk, and then queries are applied. When fast responses are
critical, or the volume of incoming information is very high, application architects instead use a
"process-first" CEP paradigm, in which logic is applied continuously and immediately to the "data in
motion" as it arrives. CEP is more efficient because it computes incrementally, in contrast to
conventional architectures that reprocess large datasets, often repeating the same retrievals and
calculations as each new query is submitted.
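The efficiency argument can be made concrete: an incremental aggregate touches each arriving event exactly once, whereas a save-and-process system re-scans its stored data on every query. A small Python illustration of the "process-first" style (our own sketch, not tied to any product):

```python
class IncrementalAverage:
    """'Process-first': the aggregate is updated as each event arrives,
    so answering a query is O(1) instead of re-scanning stored data."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def on_event(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count  # always-current result
```

A save-and-process system would instead store every value and recompute the average from scratch each time a query is submitted, repeating the same retrievals and calculations.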
CEP has already been successfully applied to several domains: sensor networks for environmental
monitoring [13]; payment analysis for fraud detection [39]; financial applications for trend discovery [40];
RFID-based inventory management for anomaly detection [41]. According to Gartner [46], over a third
of spending on event processing technologies comes from the financial services institution vertical.
More generally, as observed in [23], the information system of every company could and should be
organized around an event-based core that acts as a nervous system to guide and control the other
sub-systems.
CEP has already built up significant momentum manifested in a steady research community and a
variety of commercial as well as open source products [6]. Today, a large variety of commercial and
open source event processing tools is available to architects and developers who are building event
processing applications (see Section 4). These are sometimes called event processing platforms,
streaming analytics platforms, complex event processing systems, event stream processing (ESP) systems,
or distributed stream computing platforms (DSCPs). For example, Forrester [19] defines a streaming
analytics platform as: “Software that can filter, aggregate, enrich, and analyze a high throughput of data
from multiple disparate live data sources and in any data format to identify simple and complex patterns
to visualize business in real-time, detect urgent situations, and automate immediate actions”. In their
definition, streaming analytics platforms include both development tools to create streaming
applications and a run-time platform.
However, we distinguish between platforms that can detect complex patterns over events and platforms
that can only perform filtering on events while offering the possibility to add the pattern logic. (Complex)
event processing systems are general purpose development and runtime tools that are used by
developers to build custom, event-processing applications without having to re-implement the core
algorithms for handling event streams; as they provide the necessary building blocks to build the event
driven applications. DSCPs, on the other hand, are general-purpose platforms without full native CEP
analytic functions and associated accessories, but they are highly scalable and extensible and usually
offer an open programming model, so developers can add the logic to address many kinds of stream
processing applications, including some CEP solutions. Therefore, they are not considered “real”
complex event processing platforms. As we will see in Section 4.2.3, today there are already some
implementations that combine the pattern recognition capability of CEP systems with the scalability
that DSCPs offer, resulting in a holistic architecture. The FERARI architecture is one
example of this new approach.
3 Complex event processing – The business case
CEP usage is growing rapidly because CEP, in a technical sense, is the only way to get information from
event streams in real-time or near-real time [32]. The system has to process the event data more or less
as it arrives so that the appropriate action can be taken quickly. Note, that we use the term “real time”
loosely, to include “near-real-time” or “business real time.”
Event processing has a marked impact on technical and business aspects of an enterprise [1]. From
technical perspective, CEP enables loose coupling among the components of an enterprise system or
end-to-end process, which makes this system or process highly adaptable while enabling service reuse.
For executives, CEP enables a performance-driven enterprise, allowing immediate, tactical, proactive
decision-making to be driven by a deep knowledge of the context of that decision, while comprehensive,
near-term operational data can inform tactical and strategic decisions.
More specifically, Gartner ([2],[3],[4], and [5]) identifies three business impacts of CEP:
• Improves the quality of decision making by presenting information that would otherwise be
overlooked.
• Enables faster response to threats and opportunities.
• Helps shield business people from data overload by eliminating irrelevant information and
presenting only alerts and distilled versions of the most important information. CEP also adds
real-time intelligence to operational technology and business IT applications.
Moreover, the same Gartner reports ([2],[3],[4], and [5]) state that companies should use CEP to enhance
their situation awareness and to build "sense-and-respond" behavior into their systems. Situation
awareness means “understanding what is going on, so that you can decide what to do”. According to
these reports, CEP should be used in operational activities that run continuously and need ongoing
monitoring. This can apply to fraud detection, real-time precision marketing (cross-sell and upsell),
factory floor systems, website monitoring, customer contact center management, trading systems for
capital markets, transportation operation management (for airlines, trains, shipping and trucking) and
other applications. In a utility context, CEP can be used to process a combination of supervisory control
and data acquisition (SCADA) events and "last gasp" notifications from smart meters to determine the
location and severity of a network fault, and then to trigger appropriate remedial actions.
In Gartner’s Hype Cycle reports from 2014 ([2],[3],[4], and [5]), CEP remains positioned as
transformational, meaning that it enables new ways of doing business across industries that will
result in major shifts in industry dynamics, because “it is the only way to get information from event
streams in real time”. According to these Gartner reports, “CEP will inevitably be adopted in multiple
places within virtually every company. However, companies were initially slow to adopt CEP because it is
so different from conventional architecture, and many developers are still unfamiliar with it. CEP has
moved slightly further past the Peak of Inflated Expectations, but it may take up to 10 more years for it
to reach its potential on the Plateau of Productivity”. According to a recent market guide for event
stream processing [46], “it may take up to 10 years for CEP to reach widespread usage in mainstream
companies. However, it is taken for granted in financial services today. We estimate that it is also in use
in more than 100 smart grid projects and a total of several thousand production deployments worldwide
in a range of industries”. In fact, Forrester [19] estimates a 66% increase in firms’ use of streaming
analytics in the past two years.
As these reports also note, CEP has the potential to influence all sectors: “CEP has already transformed
financial markets… and it is also essential to earthquake detection, radiation hazard screening, smart
electrical grids and real-time location-based marketing”. Furthermore, “CEP is also essential to future
Internet of Things applications where streams of sensor data must be processed in real time”. Note
that our two business cases fall squarely within the list of applications Gartner identifies as natural
and essential candidates for CEP.
Forrester [19] states that “Streaming analytics platforms can help firms detect insights in high velocity
streams of data and act on them in real time”. Moreover, “Business won’t wait. That is truer today than
ever before because of the white-water flow of data from innumerable real-time data sources. Market
data, clickstream, mobile devices, sensors, and even good old fashioned transactions may contain
valuable, but perishable insights. Perishable because the insights are only valuable if you can detect and
act on them right now. That’s where streaming analytics platforms can help”.
In summary, from the business point of view, we can conclude, as stated in [32], that “Companies that
understand CEP have more and better real-time intelligence than those that don’t understand it. The
use of CEP will expand further as the pace of business accelerates, more data becomes available in real
time, and business people demand better situation awareness”. The analysts’ reports presented here
only emphasize this statement. Accordingly, the CEP market is forecast to grow rapidly; in fact, it is
forecast to reach US$4.7bn by 2019¹.
4 State of the art in complex event processing tools
Today there exist a wide variety of commercial, open source, and research event processing tools.
According to Gartner ([2],[3],[4], and [5]), companies should acquire CEP functionality by using an off-
the-shelf application or SaaS (Software as a Service) offering that has embedded CEP under the covers, if
a product that addresses their particular business requirements is available. Companies should consider
building their own event-driven applications in one of the following three cases:
• When an appropriate off-the-shelf application or SaaS offering is not available, companies
should consider building their own CEP-enabled application on an operational intelligence
platform that has embedded CEP capabilities.
• For demanding, high-throughput, low-latency applications — or where the event processing
logic is primary to the business problem — companies should build their own CEP-enabled
applications on commercial or open source CEP platforms (see examples of vendors below) or
DSCPs.
• In rare cases, when none of the other tactics are practical, developers should write custom CEP
logic into their applications using a standard programming language without the use of a
commercial or open source CEP or DSCP product.
1 http://www.companiesandmarkets.com/Market/Information-Technology/Market-Research/Complex-Event-Processing-CEP-Market-by-Algorithmic-Trading-Global-forecast-to-2019/RPT127618
Two forms of stream processing software have emerged in the past 15 years ([2],[3],[4],[5], and [46]).
The first were CEP platforms that have built-in analytic functions such as filtering, storing windows of
event data, computing aggregates, and detecting patterns. Modern commercial CEP platform products
include adapters to integrate with event sources, development and testing tools, dashboard and alerting
tools, and administration tools. More recently the second form — distributed stream computing
platforms (DSCPs) such as Amazon Web Services Kinesis2 and open source offerings including Apache
Samza3, Spark4 and Storm5 — was developed. As previously mentioned, DSCPs are general-purpose
platforms without full native CEP analytic functions and associated accessories, but they are highly
scalable and extensible so developers can add the logic to address many kinds of stream processing
applications, including some CEP solutions. In particular, Apache open source projects (Storm, Spark, and
recently Samza) have gained a fair amount of attention and interest ([46], [21]), and these may well
mature into commercial offerings in the future and/or become embedded in existing commercial product sets.
Gartner is now tracking 20 vendors that offer pure-play CEP platforms and six that offer DSCPs [46]
(Note that this is not an exhaustive list):
CEP platforms or tools:
• Codehaus/EsperTech's Esper, NEsper
• Feedzai Pulse
• IBM InfoSphere Streams
• IBM Operational Decision Manager
• Informatica RulePoint
• Fujitsu Interstage Big Data Complex Event Processing
• Hitachi uCosminexus Stream Data Platform
• LG CNS' EventPro
• Microsoft StreamInsight
• OneMarketData OneTick CEP
• Oracle Event Processing
• Red Hat Drools Fusion/JBoss Enterprise BRMS
• SAP Event Stream Processor
• SAS DataFlux
• SQLstream s-Server
• Software AG Apama Event Processing Platform
• Tibco BusinessEvents
• Tibco StreamBase
• Vitria Operational Intelligence Analytic Server
2 http://aws.amazon.com/kinesis/
3 http://samza.incubator.apache.org/
4 http://spark.apache.org/streaming/
5 https://storm.apache.org/
• WSO2 CEP Server
DSCPs:
• Google Cloud Dataflow
• Apache S4 (open source software, originated at Yahoo)
• Apache Samza (open source software, originated at LinkedIn)
• Apache Storm (originated at Twitter)
• DataTorrent RTS
This document focuses on event processing platforms; we address DSCPs only in the context of
open source offerings that already combine a DSCP with event processing capabilities,
as these are relevant to FERARI. DSCPs are discussed extensively in D2.1 and are out of the scope of this
document.
In the following sections we will address the most popular tools in three categories: commercial, open
source, and research.
4.1 Commercial tools
Most CEP tools are obtained as part of a larger product. Companies acquire a packaged application or
subscribe to a SaaS service that has embedded CEP under the covers. The company is buying a solution
that happens to require event processing, and it may not realize that CEP is being used. For example,
supply chain visibility products; security information and event management (SIEM) products; some
kinds of fraud detection systems; governance, risk and compliance products; system and network
monitoring systems; business activity monitoring (BAM) tools; and many other categories of software
implement some greater or lesser amount of CEP logic. In a few cases, the developers of these products
or SaaS offerings have leveraged the general purpose event processing platforms listed above to reduce
the amount of code they have to write. But in most cases, the developers implement a specialized
subset of event processing algorithms in new code to suit their application purposes.
Forrester’s evaluation of general-purpose big data streaming analytics platforms from Q3 2014 reveals
five leading vendors in the event processing niche [19]: IBM, Informatica, SAP, Software AG, and Tibco
Software. To assess the state of the big data streaming analytics market and see how the vendors and
their platforms stack up against each other, Forrester evaluated the strengths and weaknesses of the
top commercial big data streaming analytics platform vendors against 50 criteria, grouped into three
high-level buckets: current offering, strategy, and market presence. The leaders have high scores in all
the key evaluation areas: architecture, development tools, and stream processing. Next we will briefly
describe the five leading products.
4.1.1 InfoSphere Streams (IBM)6 [18][19]
InfoSphere Streams is a dedicated stream processing system, where the processing of events is
distributed among a dedicated cluster of machines. Depending on the hardware infrastructure and use
case, millions of events can be processed per second. IBM’s InfoSphere Streams supports high-volume,
structured and unstructured streaming data sources such as images, audio, voice, VoIP, video, TV,
financial news, radio, police scanners, web traffic, email, chat, GPS data, financial transaction data,
satellite data, sensors, badge swipes, etc. InfoSphere Streams emerged from IBM Research in 2009 and
continues to benefit from IBM’s significant investments in research. Its customer base spans healthcare,
financial services, telecommunications, government, energy and utilities, manufacturing, and
transportation. Note that IBM’s InfoSphere Streams can be classified as either a DSCP or a CEP platform,
depending on the context and the author’s point of view [20].
4.1.2 Informatica Platform for streaming analytics (Informatica)7,8
At the core of Informatica’s event detection and response products is RulePoint. RulePoint is a Java-
based software product that acts as an enterprise event service, detecting complex business events as
they occur, and automatically initiating responses as required. RulePoint detects complex events across
disparate information sources including sensors, enterprise application integration (EAI), enterprise
applications, databases, text documents, and more. In 2011, RulePoint was refactored to include
streaming capabilities. These enable developers to author streaming applications using both business
rules and streaming operator constructs built into the platform. Examples of applications include a
geospatial tracking solution to monitor high-risk vessels before they enter ports or as they pass through
shipping areas that are predetermined to be high-risk locations; and a phishing attack management
solution for banks, credit unions, online brokerages, and e-commerce companies.
4.1.3 Event Stream Processor (ESP) (SAP)9 [18][19]
SAP’s ESP (formerly Sybase Aleri) [18] is a complex event processing system designed for analyzing large
amounts of varied data in real time. It offers the ability to filter, combine, and normalize incoming data and
can be used to detect important patterns, changed conditions, security problems, and much more. It can
be used to raise alerts when events occur or to react to events. The product provides a wide range of integrated
tools to improve productivity. With Studio 3, developers can create and manage their applications and
the event processing flow. A wide range of built-in adapters provides interfaces for JDBC, ODBC, JMS, etc.
Data models can be defined with the XML-based AleriML language, while the SPLASH scripting
language helps developers build applications that are too complex to express in standard
relational programming languages. SAP’s ESP has a broad base of customers in financial services,
telecommunications, manufacturing, energy, retail, transportation and logistics, and the public sector.
6 http://www-03.ibm.com/software/products/en/infosphere-streams
7 http://www.informatica.com/us/products/complex-event-processing/#fbid=ghA_Zem5ovE
8 http://www.complexevents.com/wp-content/uploads/2010/10/7107_EventDetectionAndResponse_web.pdf
9 http://www.sybase.com/products/financialservicessolutions/complex-event-processing
4.1.4 Apama10 (Software AG) [18] [19]
Apama Event Processing Platform is a complete CEP-based tool acquired from Progress Software in 2013.
The CEP engine can handle inbound events with sub-second latency, find defined patterns, and alert or
respond with actions. With Apama Event Modeler, developers can create applications via a graphical user
interface, and these can be presented with Apama Research Studio. Apama Dashboard Studio provides a set of tools to
develop visually rich user interfaces. Via Apama dashboards, users can start/stop, parameterize, and
monitor event operations from both client and browser desktops. The Apama package includes many
major adapters to handle communication with other components and applications. Apama has a long
and strong history as a complex event processing platform used for algorithmic trading applications and
market monitoring, dating back to its origins in 2001. But it is also used by telecommunication firms and
credit card companies to provide real-time, location-based, customer-preference-driven offers to
consumers. Other industries include retail banking, telecommunications, retail, gaming, logistics and
supply chain, government, energy and utilities, and manufacturing.
4.1.5 StreamBase (Tibco)11 [18] [19]
Tibco Software has been a force in the high-frequency trading market for more than fifteen years, and
its acquisition of StreamBase in 2013 has given it the tools it needs to meet the needs of the wider
streaming analytics market. StreamBase is a high-performance event stream processing platform that
provides an efficient way to build powerful applications for almost any usage area. It supports fast
development via a graphical event-flow language and supports StreamSQL, providing ease of use,
flexibility, and extensibility for developers. This widely used software provides solutions for
telecommunications, capital markets, intelligence and military, e-commerce, and multiplayer online
gaming. In telecommunications, it provides services such as network monitoring and protection,
bandwidth and quality-of-service monitoring, fraud detection, location-based services, and more.
4.2 Open source engines
Open source is also an option when selecting a CEP engine: developers acquire a basic open-source
event stream processing engine and then use common, general-purpose programming tools to
build the rest of the application.
In this section we briefly present two open source engines: Esper, today the most popular open source
engine (as stated by Gartner, “open source CEP products, particularly Esper, have been embedded in
several thousand applications and commercial software products” [46]), and PROTON from partner IBM,
which is the complex event processing engine in the FERARI project. Other common open source
10 http://www.softwareag.com/corporate/products/apama_webmethods/analytics/products/default.asp
11 http://www.tibco.com/products/event-processing/complex-event-processing/streambase-complex-event-processing
engines include: Triceps12 and WSO2 Complex Event Processing Server13 (which uses the Siddhi14 engine
that started as a research project at the University of Moratuwa, Sri Lanka).
4.2.1 Esper (EsperTech Inc)15
The Esper system [8], [18], which relies on a SQL-based language and Java, has already been the target of
previous benchmark studies [9]. Esper is integrated into the Java and .NET languages (NEsper) and can
be used in CEP applications as a library. For ease of understanding, one could conceptualize the Esper
engine as a database turned upside-down. Traditional database systems work by storing incoming data
on disk, according to a predefined relational schema. They hold an exact history of previous
insertions, and updates are usually rare events. User queries are not known beforehand and there are no
strict constraints as far as their latency is concerned. The Esper engine, on the other hand, lets users
define from the very start the queries they are interested in, which act as filters for the streams of
incoming data. Events satisfying the filtering criteria are detected in “real-time" and may be pushed
further down the chain of filters for additional processing or published to their respective
listeners/subscribers [10].
Esper provides a rich set of constructs by which events and event patterns can be expressed. One way to
achieve event representation and handling is through the use of expression-based pattern matching.
Patterns incorporate several operators, some of which may be time-based, and are applied to sequences
of events. A new event matches the pattern expression whenever it satisfies its filtering criteria.
Another method to process events is through event processing language (EPL) queries, whose syntax
resembles that of the well-known SQL. The most common SQL constructs may also be used in EPL
statements. However, the defined queries are applied not to tables but to views, which can be
understood as basic structures for holding events according to certain user demands, e.g., the need
for grouping based on certain keys or for applying queries to events up to a certain time point in the
past [10].
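The “database turned upside-down” idea described above can be illustrated with a minimal sketch (plain Python, not Esper’s actual API): continuous queries are registered first as standing filters, and incoming events are then pushed through them, with matches delivered to listeners.

```python
# Minimal sketch of the "database turned upside-down" model:
# continuous queries are registered up front; events stream through them.

class ContinuousQuery:
    """A standing query: a filter predicate plus its listeners."""
    def __init__(self, predicate):
        self.predicate = predicate
        self.listeners = []

    def add_listener(self, callback):
        self.listeners.append(callback)

class MiniEngine:
    def __init__(self):
        self.queries = []

    def register(self, predicate):
        query = ContinuousQuery(predicate)
        self.queries.append(query)
        return query

    def send_event(self, event):
        # Each incoming event is evaluated against every standing query;
        # matching events are pushed to that query's listeners.
        for query in self.queries:
            if query.predicate(event):
                for listener in query.listeners:
                    listener(event)

engine = MiniEngine()
matches = []
# Analogous in spirit to an EPL statement such as
# "select * from Withdrawal where amount > 1000" (illustrative only).
big_withdrawals = engine.register(
    lambda e: e["type"] == "Withdrawal" and e["amount"] > 1000)
big_withdrawals.add_listener(matches.append)

engine.send_event({"type": "Withdrawal", "amount": 500})
engine.send_event({"type": "Withdrawal", "amount": 2500})
```

Note how the roles of data and queries are inverted with respect to a database: the queries are long-lived, and the data flows past them.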
4.2.2 IBM Proactive Technology Online (PROTON)
In the FERARI project the complex event processing component is built on and extends the IBM
Proactive Technology Online (PROTON) research asset. This asset has become open source16 in the
scope of the FI-WARE FI-PPP project17 (PROTON being the CEP Generic Enabler in the FI-WARE
platform18). Technical documentation regarding PROTON can be found in [24][25], and [26].
PROTON comprises an authoring tool, a run-time engine, and producer and consumer adapters.
Specifically, it includes an integrated run-time platform to develop, deploy, and maintain event-driven
12 http://triceps.sourceforge.net/
13 http://wso2.com/products/complex-event-processor
14 http://siddhi-cep.blogspot.co.il/
15 http://www.espertech.com/
16 Link to the open source: https://github.com/ishkin/Proton
17 http://www.fi-ware.org/
18 https://forge.fi-ware.org/plugins/mediawiki/wiki/fiware/index.php/FI-WARE_Architecture
applications using a single programming model. The specific architecture of PROTON and its
implementation in the scope of the FERARI project are described in D2.1 – Architecture definition.
4.2.3 Open source event processing running on distributed stream computing platforms
As previously mentioned, many vendor products that claim streaming analytics functionality are actually
frameworks to ingest and route data; they lack streaming operators, which developers must code
themselves. Accordingly, we survey below three recent attempts at integrating open source event
processing tools with open source DSCP platforms.
4.2.3.1 Streaming-cep-engine
Streaming-cep-engine19 is a complex event processing platform built on Spark Streaming. It combines
the power of Spark Streaming as a continuous computing framework with the Siddhi CEP engine as the
complex event processing engine (Siddhi is the core engine of the WSO2 open source tool). It was
first introduced at Spark Summit 201420.
4.2.3.2 Esper on top of Storm
The storm-esper21 library provides a bolt that allows using Esper queries on Storm data streams (for
Storm building blocks refer to D2.1). Storm’s tuples are quite similar to Esper’s map event types. The
tuple field names map naturally to map keys and the field values to values for these keys. The tuple
fields are not typed when they are defined, and are considered by Esper to be of type Object. In addition, the
fact that tuples have to be defined before a topology is running makes it relatively easy to define the
map event type in the setup phase.
The Esper bolt itself is generic. It receives Esper statements and the names of the output fields that
will be generated by these statements.
The bolt code itself consists of three pieces. The setup part constructs map event types for each input
stream and registers them with Esper. The second part is the transfer of data from Storm to Esper. The
execute(Tuple tuple) method is called by Storm whenever a tuple from any of the connected streams is
sent to the bolt. The Esper bolt code first has to find the event type name corresponding to the tuple.
Then it iterates over the fields in the tuple and puts the values into a map using the field names as the
keys. Finally, it passes that map to Esper. At this point, Esper routes this map (the event) through the
statements, which in turn might produce new data that needs to be handed back to Storm. For this purpose,
the bolt registers itself as a listener for data emitted from any of the statements that were configured
during setup. Esper then calls back the update method on the bolt if one of the statements
generates data. The update method then basically performs the reverse operation of
the execute method, converting the event data back to a tuple.
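The round trip described above can be sketched as follows (an illustrative Python rendering of the data flow, not the actual storm-esper Java code; the `statement` function and field names are made up for the example):

```python
# Sketch of the storm-esper data flow: a Storm tuple's named fields become
# a map event; the engine's output maps become tuples again.

def tuple_to_event(field_names, values):
    """execute() direction: tuple fields -> map event (keys = field names)."""
    return dict(zip(field_names, values))

def event_to_tuple(event, output_fields):
    """update() direction: map event -> tuple values, in declared field order."""
    return [event[name] for name in output_fields]

# A toy "statement" standing in for an Esper EPL query.
def statement(event):
    if event["amount"] > 100:
        return {"account": event["account"], "alert": "large"}
    return None

emitted = []  # stands in for the bolt's output collector

def execute(field_names, values):
    event = tuple_to_event(field_names, values)
    derived = statement(event)        # the engine routes the event through statements
    if derived is not None:           # listener callback: hand data back to Storm
        emitted.append(event_to_tuple(derived, ["account", "alert"]))

execute(["account", "amount"], ["acc-1", 50])
execute(["account", "amount"], ["acc-2", 500])
```

The essential point is the symmetry: tuple-to-map on the way in, map-to-tuple on the way out, with the declared output field names fixing the tuple order.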
19 http://stratio.github.io/streaming-cep-engine/
20 http://spark-summit.org/2014/talk/stratio-streaming-a-new-approach-to-spark-streaming
21 https://github.com/tomdz/storm-esper
4.2.3.3 PROTON on top of Storm
Storm is an incubation project at the Apache Software Foundation (ASF). It is utilized by
well-known companies with significant volumes of streaming data,
such as The Weather Channel, Spotify, Twitter, and Rocket Fuel (refer to D2.1).
In the scope of the FERARI project, PROTON has been implemented on top of Storm, thus making it a
distributed and scalable CEP engine. For details refer to D2.1 – Architecture definition.
4.3 Research tools
There are also many research tools developed in the last decade. They include:
• Amit (IBM Haifa Research Lab) [12]
• Aurora (Brandeis University, Brown University and MIT)22
• Borealis (Brandeis University, Brown University and MIT)23
• Cayuga (Cornell University)24
• ETALIS (Forschungszentrum Informatik Karlsruhe and Stony Brook University) [47]
• NiagaraST (Portland State University)25
• STREAM (Stanford University)26
• Telegraph (UC Berkeley)27
• epZilla (University of Moratuwa)28
We will briefly describe Amit and ETALIS, as these are the only two that appear in the latest CEP Tooling
Market Survey 2014 [21].
4.3.1.1 Amit [12]
IBM Research in Haifa has developed a fully functional event processing research asset [12], which is
capable of processing raw event streams from different sources, identifying specific patterns that are
of interest, and forwarding derived events to subscribers. Amit is no longer supported and has been
replaced by the open source PROTON (see 4.2.2).
4.3.1.2 ETALIS [47]
The ETALIS system provides an expressive logic-based language for specifying and combining complex
events. For this language both a syntax as well as a formal declarative semantics are provided. The
language enables efficient run-time event recognition and supports deductive reasoning. The execution
model of the language is based on a compilation strategy into Prolog.
22 http://cs.brown.edu/research/aurora/
23 http://cs.brown.edu/research/borealis/public/
24 http://www.cs.cornell.edu/bigreddata/cayuga/
25 http://datalab.cs.pdx.edu/niagaraST/
26 http://infolab.stanford.edu/stream/
27 http://telegraph.cs.berkeley.edu/
28 http://www.epzilla.org/
4.4 Limitations of contemporary event processing tools
As has been presented above, there is a large variety of research prototypes as well as commercial
products and platforms. Still, despite the positive outlook and the maturity of the tools, CEP tools are not widely
used. In fact, most applications that implement CEP logic don’t use dedicated event processing tools [1].
Some user companies have written custom applications with CEP logic rather than leveraging an off-the-
shelf event processing platform. This was especially common in the 1990s and early 2000s before the
products were widely available, and some developers still choose to write their own CEP logic for
performance or cost reasons. For example, large banks and related financial services companies have
built front-office systems for capital markets trading with their own embedded CEP logic [32]. Gartner
analyst Roy Schulte estimated in July 2012 that around 95% of the event processing applications are
built using ad-hoc programming and do not use existing frameworks [33]. Two of the main
reasons [2], [31] are the difficulty of thinking in terms of event-driven architectures, which are asynchronous
in nature, and the relative complexity of existing tools, which makes them impractical and inaccessible for
business users. In practice, event-driven applications are designed either with dedicated
event processing tools, by skilled IT developers who are familiar with the event
processing engine and the particular ways to work around its limitations, or in hand-coded fashion.
As pointed out by Forrester [19] “The streaming application programming model is unfamiliar to most
application developers. It’s a different paradigm from normal programming where code execution
controls data. In streaming applications, the incoming data controls the code”.
In addition, current tools often lack the ability to process large volumes of distributed (complex)
events, which are becoming increasingly important in modern automated business decision processes.
In other words, current event processing tools are not flexible enough: they require IT expert skills, do
not scale easily, and cannot always run in distributed environments, limiting their usability and
widespread adoption in the Big Data era.
Before discussing the requirements for flexible event processing systems in detail, we next describe
some basic terms necessary for gaining a common understanding.
5 Complex event processing background
Since no widely accepted standard exists for the concepts of event processing, several synonyms appear
in the literature, and several attempts have been made in recent years towards homogeneity.
The Event Processing Technical Society (EPTS) is an inclusive group of organizations and individuals
aiming to increase awareness of event processing, foster topics for future standardization, and establish
event processing as a separate academic discipline. The goal of the EPTS is the development of a shared
understanding of event processing terminology. The society believes that by communicating the
shared understanding developed within the group, it would become a catalyst for the emergence of effective
interoperation standards, foster academic research, and drive the creation of training curricula. In turn, this
would lead to the establishment of event processing as a discipline in its own right. The EPTS members hope
that through a combination of academic research, vendor experience, and customer data they will be able
to develop a unified glossary, language, and architecture that would homogenize event processing.
The society started as an informal group in 2005/2006. It was formally launched as a
consortium in June 2008. Membership of the consortium is based on a formal agreement defining
intellectual property (IP) ownership terms and rules of engagement. The society is governed by a
Steering Committee consisting of founding members of the organization, representatives of major
vendors, and scientists. It is a partner of the major scientific event processing conference, Distributed
Event Based Systems (DEBS), and of the major scientific rules conference, the International Web Rule Symposium
(RuleML), and it has launched two Dagstuhl seminars on event processing (May 2007 and May 2010). It has also
published an event processing glossary [17]. However, the EPTS is almost inactive nowadays.
Also, recent efforts, such as the Real-time Business Insight Event Processing in Practice and the Event
Processing Online Magazine, have ceased their activities.
As a result, each complex event processing engine uses its own terminology and semantics. We follow
the semantics presented in Etzion and Niblett’s book [7] and applied in PROTON. For the sake of clarity,
we briefly present below the main terms, concepts, and building blocks used in our work; for further
details, refer to [7].
5.1 Event types
Generally speaking, an event is an occurrence within a particular system or domain; it is something that
has happened, or is contemplated as having happened in that domain ([7][23]). The word “event” is also
used to mean a programming entity that represents such an occurrence in a computing system. In the
latter definition, an event is an object of an event type. Events are actual instances of the event types
and have specific values. For example, the event "today at 10 PM a customer named John Doe made a
new deposit to his bank account” is an instance of the Transaction event type. An event type specifies
the information that is contained in its event instances by defining a set of attributes. The event
attributes are grouped into the header or metadata (e.g., the occurrence time of the event instance) and
the body or payload (specific information about the event, e.g., customer name).
We distinguish the following event types:
A raw event is an event that is introduced into an event processing system by an event producer (an
entity at the edge of an event processing system that introduces events to the system). An example of a
raw event is a Cash deposit into a bank account.
A derived event is an event that is generated as a result of event processing that takes place inside the
event processing system. An example is that a Large cash deposit has been made into a bank account.
A situation is a derived event that is emitted outside the event processing system and consumed by at
least one consumer (an entity at the edge of an event processing system that receives events from the
system). An example is a Suspicious bank account.
5.2 Event attributes
Every event instance has a set of built-in attributes (metadata) and a set of payload attributes. PROTON
employs the following attributes in the event type's metadata:
• Name – the name of the event type.
• OccurrenceTime – a timestamp attribute, which we expect the event source to fill in with the
occurrence time of the event. If left empty, it defaults to the detectionTime attribute value.
• DetectionTime – a timestamp attribute that records the time the CEP engine detected the event.
The time is measured in milliseconds, specifying the difference between the machine
time at the moment of event detection and midnight, January 1, 1970 UTC.
• EventId – a unique string identifier of the event, which can be set by the event source to
identify the event instance.
• EventSource – holds the source of the event (usually the name of the event producer).
The above built-in attributes can be used in expressions in the same manner as user-defined attributes.
User-defined attributes can be added to the event type by specifying their names and object types. If an
attribute is an array, its dimension should be specified.
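As an illustration, the metadata attributes and their defaults can be sketched in a minimal Python class (the class and its defaulting rules are illustrative, not PROTON’s actual implementation):

```python
import time
import uuid

class Event:
    """Illustrative event instance: built-in metadata plus a payload."""
    def __init__(self, name, payload, occurrence_time=None,
                 event_source="", event_id=None):
        self.name = name                      # event type name
        # Detection time: milliseconds since midnight, January 1, 1970 UTC.
        self.detection_time = int(time.time() * 1000)
        # If the source did not supply an occurrence time, it defaults
        # to the detection time.
        self.occurrence_time = occurrence_time or self.detection_time
        self.event_id = event_id or str(uuid.uuid4())
        self.event_source = event_source
        self.payload = payload                # user-defined attributes

deposit = Event("Transaction",
                {"customer": "John Doe", "amount": 5000.0},
                event_source="ATM-17")
assert deposit.occurrence_time == deposit.detection_time
```

The header/payload split is visible here: the first five fields are metadata common to all event types, while `payload` carries the type-specific attributes.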
5.3 Context
Context is a named specification of conditions that groups event instances so they can be processed in a
related way. While several context dimensions exist, in this report we employ the two most
commonly used (in the future we might enlarge the set of context types, depending on
scenario requirements): temporal and segmentation-oriented. A temporal context consists of one or
more time intervals, possibly overlapping. Each time interval corresponds to a context partition, which
contains events that occur during that interval. A segmentation-oriented context is used to group event
instances into context partitions based on the value of an attribute or collection of attributes in the
instances themselves. As a simple example, consider a single stream of input events, in which each
event contains a customer identifier attribute. The value of this attribute can be used to group events so
there is a separate context partition for each customer. Each context partition contains only events
related to that customer, so the behaviour of each customer can be tracked independently of the other
customers. A composite context is a context that is composed from two or more contexts, known as its
members. The set of context partitions for the composite context is the Cartesian product of the
partition sets of the member contexts
5.4 Event Processing Network (EPN)
An Event Processing Network (EPN) is a conceptual model describing the event processing flow
execution. An EPN comprises a collection of Event Processing Agents (EPAs), event producers, events,
and event consumers (Figure 1). The network describes the flow of events originating at event producers and
flowing through various event processing agents to eventually reach event consumers. For example, in
Figure 1, events from Producer 1 are processed by Agent 1. Events derived by Agent 1 are of interest to
Consumer 1 but are also processed by Agent 3 together with events derived from Agent 2. Note that the
intermediary processing between producers and consumers in every installation is made up of several
functions and often the same function is applied to different events for different purposes at different
stages of the processing.
Figure 1: Illustration of an event processing network
The application definitions, i.e. the EPN, are written by the application developer at build time. In
PROTON, the definitions are output in JSON (JavaScript Object Notation) format and provided as
configuration to the CEP run-time engine.
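To make the idea concrete, the wiring of Figure 1 could be captured in a JSON-style definition along the following lines (a hypothetical schema for illustration only, not PROTON’s actual JSON format; Agent 2’s input is assumed to be Producer 2):

```python
# A hypothetical JSON-style EPN definition wiring the producers,
# agents, and consumers of Figure 1.
epn = {
    "producers": ["Producer1", "Producer2"],
    "consumers": ["Consumer1", "Consumer2"],
    "agents": {
        "Agent1": {"inputs": ["Producer1"], "outputs": ["Consumer1", "Agent3"]},
        "Agent2": {"inputs": ["Producer2"], "outputs": ["Agent3"]},
        "Agent3": {"inputs": ["Agent1", "Agent2"], "outputs": ["Consumer2"]},
    },
}

def validate(epn):
    """Every agent input must be a producer or another agent."""
    known = set(epn["producers"]) | set(epn["agents"])
    for agent, spec in epn["agents"].items():
        for source in spec["inputs"]:
            if source not in known:
                raise ValueError(f"{agent}: unknown input {source}")
    return True

assert validate(epn)
```

Such a declarative definition is what separates the build-time description of the network from its run-time execution by the engine.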
5.5 Event Processing Agent (EPA)
An Event Processing Agent (EPA) is a component that, given a set of input/incoming events within a
context, applies some logic to generate a set of output/derived events. An EPA can apply different
event patterns to detect specific relations among the input events.
An EPA performs three logical steps, a.k.a. the pattern matching process or event recognition (see Figure 2).
Please note that all three steps are optional, but at least one must be performed inside an EPA:
• The filtering step, in which relevant events are selected from the input events for processing
according to the filter conditions. The output of this step is a set of participant events.
• The matching step, which takes all events that passed the filtering and looks for matches between
these events, using an event processing pattern or some other kind of matching criterion. The
output of this step is the matching set.
• The derivation step, which takes the output from the matching step and uses it to derive the
output events by applying derivation formulae.
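The three steps compose into a simple pipeline, sketched below in Python (illustrative only, not PROTON’s implementation; the deposit events and the “at least two large deposits” pattern are invented for the example):

```python
# Illustrative sketch of the three EPA steps: filter -> match -> derive.

def run_epa(events, filter_fn, match_fn, derive_fn):
    participants = [e for e in events if filter_fn(e)]   # filtering step
    matching_set = match_fn(participants)                # matching step
    return [derive_fn(m) for m in matching_set]          # derivation step

deposits = [
    {"account": "acc-1", "amount": 50},
    {"account": "acc-1", "amount": 12000},
    {"account": "acc-2", "amount": 9000},
]

derived = run_epa(
    deposits,
    filter_fn=lambda e: e["amount"] > 1000,              # keep large deposits
    match_fn=lambda ps: [ps] if len(ps) >= 2 else [],    # "count >= 2" pattern
    derive_fn=lambda ms: {"situation": "SuspiciousActivity",
                          "count": len(ms)},
)
assert derived == [{"situation": "SuspiciousActivity", "count": 2}]
```

Each step is optional in an EPA; here, omitting a step would correspond to passing an identity function in its place.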
Figure 2: Event recognition process in an EPA
An event pattern is a template specifying one or more combinations of events. Given any collection of events, if it is possible to find one or more subsets of those events that match a particular pattern, such a subset is said to satisfy the pattern. Some common examples of patterns are:
• Filter: each event is evaluated against an expression; the event is filtered in only if it meets the expression's conditions, and is otherwise filtered out.
• Sequence: at least one instance of all participating event types must arrive in a specified order for the pattern to be matched.
• Count: the number of instances in the participant event set satisfies the pattern's number assertion.
• All: at least one instance of all participating event types must arrive for the pattern to be matched; the arrival order in this case is immaterial.
• Trend: events need to satisfy a specific change (increasing or decreasing) over time of some observed value; this refers to the value of a specific attribute or attributes.
• Absence: a specified event (or events) must not occur within a predefined time window. The matching set in this case is empty.
• SUM: the value of a specific attribute, summed up over all participant events, satisfies the sum threshold assertion.
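As a concrete illustration of one of these, a Sequence check over a stream of typed events can be sketched as follows; the event representation is invented for illustration:

```python
# Illustrative check of the Sequence pattern: at least one instance of
# each participating event type must arrive in the specified order.

def matches_sequence(events, type_order):
    it = iter(events)  # scan the stream once, in arrival order
    return all(any(e["type"] == t for e in it) for t in type_order)

stream = [{"type": "order_placed"}, {"type": "payment"}, {"type": "shipment"}]
in_order = matches_sequence(stream, ["order_placed", "shipment"])      # True
out_of_order = matches_sequence(stream, ["shipment", "order_placed"])  # False
```

The single shared iterator is what enforces ordering: once an event type has been matched, earlier events can no longer satisfy later entries of the pattern.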
5.6 Pattern policies
A pattern policy is a named parameter that disambiguates the semantics of the pattern and the pattern matching process. Pattern policies fine-tune the way the pattern detection process works. PROTON supports five types of policies:
Evaluation policy – when are the matching sets produced? The EPA can either generate output incrementally (in this case the evaluation policy is called Immediate) or at the end of the temporal context (called Deferred).
Cardinality policy – how many matching sets are produced within a single context partition? The cardinality policy helps limit the number of matching sets generated, and thus the number of derived events produced. The policy type can be single, meaning only one matching set is generated, or unrestricted, meaning there is no restriction on the number of matching sets generated.
Repeated/Instance Selection type policy – what happens if the matching step encounters multiple
events of the same type? The override repeated policy means that whenever a new event instance is
encountered and the participant set already contains the required number of instances of that type, the
new instance replaces the oldest previous instance of that type. The every repeated policy means that
every instance is kept, meaning all possible matching sets can be produced. First means that every
instance is kept, but only the earliest instance of each type is used for matching. Last is the same as first,
but the latest instance of each type is used for matching.
Consumption policy – what happens to a particular event after it has been included in the matching set?
Possible consumption policies are consume, meaning each event instance can be used in only one
matching set; and reuse, meaning an event instance can participate in an unrestricted number of
matching sets.
Policy relevance can be dictated by the event pattern. For example, the evaluation policy for an absence pattern is always deferred (as we are testing for the existence of an event instance over a specified temporal context). Also, not all possible policy combinations are meaningful. For example, the choice of consumption policy is irrelevant if the cardinality policy is single, because in that case the matching step runs only once.
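The effect of the consumption policy can be illustrated with a toy "A followed by B" pair pattern; the function below is a sketch, not PROTON behaviour:

```python
# Sketch of the consumption policy: with "consume" each event instance is
# used in at most one matching set; with "reuse" it can appear in many.
def count_matching_sets(events, policy):
    sets, pool = 0, []             # pool holds A instances awaiting a B
    for e in events:
        if e == "A":
            pool.append(e)
        elif e == "B":
            if policy == "reuse":
                sets += len(pool)  # B pairs with every available A
            elif pool:             # consume: one A and one B per set
                pool.pop(0)        # that A can never be matched again
                sets += 1
    return sets

two_as_two_bs = ["A", "A", "B", "B"]
consume_sets = count_matching_sets(two_as_two_bs, "consume")  # 2 sets
reuse_sets = count_matching_sets(two_as_two_bs, "reuse")      # 4 sets
```

With two As and two Bs, consume yields two matching sets (each A is paired once), while reuse yields four (every A pairs with every subsequent B).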
5.7 Context initiator policies
A temporal context starts with an initiator and ends with a terminator. An initiator can be an event, system startup, or an absolute time. A terminator ends the temporal context; it can be an event, a relative expiration time, an absolute expiration time, or "never ends", i.e. the temporal context remains open until engine shutdown.
A context initiator policy fine-tunes the semantics of temporal contexts whose initiator is an event. It defines the behaviour required when a window is already open and a subsequent initiator event is detected. The options are: add, meaning a new window is opened alongside the existing one; or ignore, meaning the new initiator is ignored and the original window is preserved.
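The two policies can be sketched with a minimal window-management function (names illustrative):

```python
# Sketch of context initiator policies: when an initiator event arrives
# while a window is already open, "add" opens a second window alongside
# the existing one, while "ignore" preserves only the original window.
def on_initiator(open_windows, timestamp, policy):
    if policy == "add" or not open_windows:
        open_windows.append({"opened_at": timestamp})
    return open_windows  # under "ignore", the original window is kept

ignore_windows = on_initiator([], 10, "ignore")               # opens window
ignore_windows = on_initiator(ignore_windows, 20, "ignore")   # ignored

add_windows = on_initiator([], 10, "add")
add_windows = on_initiator(add_windows, 20, "add")            # second window
```

Under add, an event can therefore belong to several overlapping context windows at once; under ignore, each context partition has at most one open window.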
5.8 PROTON definitions
In PROTON, the JSON CEP application definitions file can be created in three ways:
1. Build-time user interface – the application developer creates the building blocks of the application definitions by filling in forms, without the need to write any code. The generated file is exported in JSON format to the CEP run-time engine.
2. Programming – the JSON definitions file can alternatively be generated programmatically by an external application and fed into the CEP run-time engine.
3. Manually – the JSON file is created by hand and fed into the CEP run-time engine.
The created JSON file comprises the following definitions:
• Event types – the events that are expected to be received as input or to be generated as derived events. An event type definition includes the event name and a list of its attributes.
• Producers – the event sources and the way PROTON gets events from those sources.
• Consumers – the event consumers and the way they get derived events from PROTON.
• Temporal contexts – time window contexts in which event processing agents are active.
• Segmentation contexts – semantic contexts that are used to group several events to be used by the EPAs.
• Composite contexts – groupings of several different contexts.
• Event processing agents – patterns of incoming events in a specific context that detect situations and generate derived events. An EPA includes most of the following general characteristics:
o Unique name
o EPA type (operator); for each operator, different sets of properties and operands are applicable
o Context
o Other properties, such as condition
o Participating events
o Segmentation contexts
o Derived events
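To make the shape of such a file concrete, the sketch below builds a small definitions object and serializes it to JSON. The field names are invented for illustration only and do not reflect PROTON's actual JSON schema:

```python
import json

# Hypothetical definitions object: two event types, a temporal context,
# and one Filter EPA. The structure is illustrative, NOT PROTON's schema.
definitions = {
    "eventTypes": [
        {"name": "Call",
         "attributes": ["calling_number", "conversation_duration"]},
        {"name": "LongCallAtNight",
         "attributes": ["calling_number", "conversation_duration"]},
    ],
    "temporalContexts": [
        {"name": "Always", "initiator": "startup", "terminator": "never"}
    ],
    "epas": [
        {"name": "LongCallAtNight", "type": "Filter", "context": "Always",
         "inputEvents": ["Call"], "derivedEvents": ["LongCallAtNight"]}
    ],
}

# The serialized file is what gets fed to the CEP run-time engine.
definitions_json = json.dumps(definitions, indent=2)
```

Whichever of the three creation routes is used, the run-time engine consumes the same serialized form.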
The JSON file that is created at build-time contains all EPN definitions, including definitions for event types, EPAs, contexts, producers, and consumers. At execution time, the standalone run-time engine accesses the metadata file, loads and parses all the definitions, creates a thread for each input and output
adapter and starts listening for events incoming from the input adapters (representing producers) and
forwards events to output adapters (representing consumers).
For the distributed implementation on top of STORM, an input Bolt serves the same function as an input adapter, and the derived events are passed as STORM tuples further up the chain of processing in STORM (for the full integration details refer to D2.1).
6 Requirements for flexible event processing
In essence, in order for an event processing system to be flexible it has to fulfill two main requirements: it must adapt easily to distributed scalable architectures, and it must be simple enough that non-IT experts can define the event logic of an application. With regard to CEP, the FERARI project addresses exactly these gaps.
The envisaged FERARI architecture provides a distributed scalable platform in which PROTON is already implemented; refer to D2.1 – Architecture definition for details on the FERARI architecture. WP4 mainly addresses the second requirement.
In this section we will briefly describe the main non-functional requirements of event processing
systems followed by a first cut of the mobile fraud use case event processing application design. We also
introduce The Event Model (TEM). In the summary of this section we address how we will tackle the
flexibility issue in the project.
6.1 Non-functional requirements of event processing applications
The design of event processing applications covers functional as well as non-functional properties. Non-functional requirements are concerned not with what a system does, but with how well it does it. It is often the non-functional properties that make or break a specific application [7]. In the following subsections we briefly describe the main aspects of non-functional requirements of event-driven systems. A survey of the state of the art in the area of non-functional requirements can be found in [13].
The design of both functional and non-functional requirements is implementation specific. It is either done with current dedicated event processing tools by skilled IT developers who are well acquainted with the event processing engine and with the particular ways to bypass the engine's limitations, or in a hand-coded fashion. As mentioned above, in both cases it is rather complex and the actual design is not accessible to business users. With regard to non-functional requirements, tuning is done according to the capabilities of the tool, and often it is not possible to optimize for multiple goals, such as the trade-off between throughput and latency (see [14] for such optimization methods).
6.1.1 Scalability
Scalability is the capability of a system to adapt readily to a greater or lesser intensity of use, volume, or
demand while still meeting its business objectives. Scalability has several dimensions. The dimensions
relevant to event processing are the number of producers and consumers, number of input events,
number of event processing agent types, processing complexity, number of derived events, number of
concurrent runtime instances, number of concurrent runtime contexts [13] and [7].
In the event processing world there are two common approaches to scalability: scaling out and scaling up. In the case of scaling out, or "horizontal scalability", the approach is to add logical units or nodes to increase processing power, while on the surface making them work as a single unit. Examples of such approaches are clustering of processing nodes and load balancing of the incoming data stream between nodes. Scaling up, or "vertical scalability", means adding resources within the same logical unit (node) to increase processing capacity, for example adding memory to a physical node.
Not all applications can be scaled using the above techniques; to be candidates for scale-up and scale-out they need to satisfy certain constraints, such as supporting partitioning of state and load balancing.
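The partitioning constraint can be illustrated with a simple routing function: events are assigned to nodes by hashing a segmentation key, so the per-key pattern state can live entirely on one node. The key name below is a hypothetical example:

```python
import zlib

# Sketch of why partitionable state enables scale-out: events are routed
# to a node by hashing their segmentation key, so each node keeps the
# per-key pattern state locally, with no cross-node coordination.
def route(event, num_nodes):
    key = event["calling_number"]  # hypothetical segmentation key
    return zlib.crc32(key.encode()) % num_nodes

events = [{"calling_number": n} for n in ("111", "222", "111")]
nodes = [route(e, 4) for e in events]  # same key -> same node, always
```

Stateful patterns whose state cannot be partitioned this way (e.g. those spanning all keys) are the ones that resist horizontal scaling.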
Both approaches have trade-offs. The scale-up approach has a simple management model and no network communication overhead, but its growth potential is finite and there is no redundancy. The scale-out approach, on the other hand, gains performance, redundancy, and fault tolerance, but at the cost of increased management complexity, a more complex programming model, and communication overheads between nodes that need to be taken into account.
Event processing applications use load-shedding and load-balancing approaches to ensure the desired performance with the limited resources provided to the application. For each application the options should be examined carefully to determine the appropriate solution.
6.1.2 Availability
The availability of a system is the percentage of the time its users perceive it to be functioning. Event processing systems can use existing standard high-availability practices such as logging, failover, and disaster recovery. The designer of an event processing system must, however, make decisions related to high availability. These considerations relate to whether it is cost effective to employ high-availability practices, as they have an associated cost and may not be fully required in some applications. An example of such a consideration is the issue of recoverability, that is, the ability to restore the state of a system to its exact value before a failure occurred.
Some event processing agents (such as those that perform aggregation, composition, and pattern detection) are stateful, that is, the internal state of such an agent has to be kept as long as the particular EPA instance is active, meaning as long as its context partition is valid. For example, a sequence pattern detection EPA running with the reuse policy over a 24-hour window might need to retain all the participant
events that occurred during that period. In some applications recoverability is a must. If the event
processing is part of a mission-critical application, and decisions are made using the results of this
processing, losing some of the system’s state may have critical implications.
However, there are also event processing applications where high availability is not required: applications where events are symptoms of some underlying problem that will occur again even if an event is lost, or systems looking for statistical trends based on sampling. In such applications the cost of applying a high-availability solution may well be too high relative to the benefits that can be reaped from it.
6.1.3 Security
Security requirements relate both to ensuring that operations are only performed by authorized parties, and to meeting privacy considerations. Specifically, this means the following functions:
• Ensuring only authorized parties are allowed to be event producers or event consumers.
• Ensuring that incoming events are filtered so that authorized producers can't introduce invalid events or events that they are not entitled to publish.
• Ensuring that consumers only receive information to which they are entitled. In some cases a consumer might be entitled to see some of the attributes of an event but not others.
• Ensuring that unauthorized parties can't add new event processing agents to the system, or make modifications to the EPN itself (in systems where dynamic EPN modification is supported).
• Keeping auditable logs of events received and processed, or other activities performed by the system.
• Ensuring that all databases and data communication links used by the system are secure.
6.1.4 Performance objectives
Some non-functional requirements can be translated to performance objectives which can then be the
subject of various optimization approaches. Some of the major performance objectives for event
processing are related to throughput, latency, and time-constraint objectives.
All these objectives are intended to address scaling issues, but each addresses them under different assumptions and may be served by different optimizations. In addition, each objective may apply to an entire system, or to any part of a system. In some systems there is a single performance objective for all the processing in the system, for example latency leveling for each event type in that system. In other systems there may be a mix of performance objectives: some events may have real-time constraints associated with them, whereas others may have another metric. Performance objectives may also be composed of several separate metrics.
One of the major ways to achieve various performance metrics is parallel processing. There are three
levels of parallelism: first, parallelism inside a single core using multithreading; second, parallelism by
partitioning the work within a multicore machine where the threads running in different cores have
access to shared memory; and third, partitioning the work to multiple machines within a cluster.
An additional optimization method involves moving the processing close to the producers and
consumers where applicable. Consider an example where there are multiple sensors within the same
location, and the event processing involves aggregation of events that are emitted by these sensors.
Placing the aggregation EPA close to the sensors can eliminate a substantial amount of network traffic.
Likewise, if the EPN contains an EPA that creates many events that are all consumed by a certain
consumer, or a set of consumers that are located in a certain location, it might be useful to locate this
EPA close to the consumer or consumers. This optimization approach can also complement the parallel
processing approach. If the parallel event processing is executed over a grid of machines within various
geographic locations (instead of being on a physical cluster or co-located set of multicore machines) it
might be sensible to co-locate a group of agents if there’s a substantial amount of communication
between them.
In the research community, several attempts have been made to optimize the distribution and scheduling of event processing networks. In [15], a stratification algorithm is used to reveal dependencies among functions in an event processing network and co-locate independent functions in layers, or strata. This allows for horizontal partitioning. The work in [15] then elaborates on a profiling-based technique for placing event processing agents on execution nodes, allowing for vertical partitioning. For example, if a sequential pattern is segmented by an identifier in the payload of the events, the execution could be vertically partitioned by that identifier.
6.1.5 Usability
As already mentioned, there are no standards for event processing programming languages, although
there are various programming styles and approaches. In this section we look at two styles: the stream-
oriented style and the rule-oriented style [34], [7].
6.1.5.1 Stream-oriented programming style
The stream-oriented programming style is rooted in data flow programming. In essence a data flow
graph is a directed graph that consists of nodes and edges. The nodes represent processing elements,
and the edges represent data flowing between these nodes. The paradigm is one of continuous queries, sometimes called operators, that run constantly in the nodes while their results flow through the edges of the data flow graph. The languages used to describe the queries are inspired by SQL and relational algebra, though not all of them are based on SQL. When a data flow graph is used for event processing, the data flowing in the streams are event instances with the appropriate event semantics. These event instances are represented as records, and are often referred to as tuples, following the relational model's terminology. A stream is a continuous flow of events, in most cases all of the same event type, which are considered to be tuples of the same relation. The stream may be unbounded and active forever. This means that, unlike the conventional relational model where a query is executed against an entire table of data, in the continuous query model a query can execute only against a bounded subset of the stream. The stream is therefore broken up into a sequence of windows and the query is performed successively against each window. This style is very common in
existing tools, e.g., InfoSphere Streams, TIBCO StreamBase, SAP ESP, Esper, Oracle Event Processing, and Microsoft StreamInsight.
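As a minimal illustration of this windowed continuous-query model, in plain Python rather than any vendor's query language, here is a tumbling count-based window with an average aggregate:

```python
# Sketch of the continuous-query model: the unbounded stream is broken
# into a sequence of (here, tumbling count-based) windows, and the query
# (an average) is executed against each window in turn.
def windowed_avg(stream, window_size):
    window = []
    for tup in stream:                  # tuples of one relation
        window.append(tup)
        if len(window) == window_size:  # window full: run the query, emit
            yield sum(window) / window_size
            window = []                 # tumbling: start the next window

averages = list(windowed_avg([1, 2, 3, 4, 5, 6], window_size=3))
```

Real engines offer richer window kinds (time-based, sliding, partitioned), but the principle is the same: the query never sees the whole stream, only the current window.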
6.1.5.2 Rule-oriented languages
The other dominant style of event processing languages is the rule-oriented style. There are several distinct types of rules: production rules, active (event-condition-action) rules, and rules based on logic programming. We briefly present each of these styles below.
6.1.5.2.1 Production rules
Production rules are rules of the form "if condition then action". They operate in a forward-chaining way: when the condition is satisfied, the action is performed. Production rules are rooted in expert systems; the operational processing of production rules may be either declarative or procedural:
• Declarative production rule execution is typically based on a variation of the Rete [35] algorithm, which matches facts against the patterns contained in the rules to determine which rule conditions are satisfied. Information about the antecedents (conditions) of each rule is stored in an internal state, and in every execution cycle changes to these states are evaluated.
• Procedural production rule execution is based on sequential execution of compiled rules.
Production rules are based on state changes and not on events; however, some event processing
languages extend Rete-based production rules to support event processing. This is done by making
events an explicit part of the model, so that event occurrences can be used as part of the conditions for
invoking an inference rule. Thus the event processing is done through an inference process.
6.1.5.2.2 Active rules
Active rules, also known as event-condition-action (ECA) rules, are descended from work on active
databases. Active rules operate according to the following execution pattern: when an event occurs,
evaluate conditions and, if they are satisfied, trigger an action. The event may be primitive or composite.
The action can be one that derives an additional event, in which case an active rule maps directly onto
an EPA in our model. In cases where the action performs some external activity, such as invoking an
external service, the rule maps to the combination of an EPA and an event consumer. Examples of tools that apply the ECA style are Apama and RulePoint.
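A minimal sketch of this execution pattern, with invented event and rule names:

```python
# Sketch of an ECA rule: when an event occurs, evaluate the condition
# and, if it is satisfied, trigger the action (here: derive a new event).
# All names are illustrative.
def apply_eca_rule(event, condition, action):
    return action(event) if condition(event) else None

alert = apply_eca_rule(
    {"type": "temperature", "value": 95},
    condition=lambda e: e["value"] > 90,
    action=lambda e: {"type": "overheating_alert", "source": e["type"]},
)
```

When the action derives an event, as here, the rule corresponds directly to an EPA in the model of Section 5.5.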
6.1.5.2.3 Logic programming rules
Logic programming is a programming style based on logical assertions. The most well-known example of
a logic programming language is Prolog. The application of the logic programming style to event
processing is rooted in work done in the deductive database area. Commercial tools seldom apply this style, though it can be found in TIBCO BusinessEvents. It is more common in research projects, for example in the following languages: ETALIS [42]; RTEC [37], [43]; SAGE [38]; and T-REX [44].
6.1.5.3 Build-time interfaces
Event processing tools are composed of design-time (or build-time) and run-time components. The design component serves for the definition of the event-driven application, while the run-time component is the engine that, according to the event definitions, processes the events in real time in order to detect and derive the desired situations. We can identify four types of build-time interfaces [13]:
• Text-based programming languages (e.g., Apama)
• Visual languages (e.g., StreamBase)
• Form-based languages (e.g., PROTON)
• Natural languages (e.g., ODM Advanced29)
These types are not mutually exclusive, as development environments can consist of a mixture of
graphical and text oriented tools. The various environments reflect different assumptions about
developers’ preferences. In some cases developers prefer a more familiar text-based interface, whereas
others prefer a more visual style of development.
The task of creating the event definitions can be tedious and hard even for experts. To alleviate this task, in some engines the event definitions can be learnt automatically using machine learning techniques. However, this aspect has received little attention so far. Some research work on machine learning techniques for defining event patterns can be found in [36] and [37].
Most existing CEP engines have limitations on the addition or modification of rules. Rules are configured once, initially, and are not expected to change later. In other words, once the rules are defined and configured, the system freezes and rules cannot be added dynamically at run-time. However, rules might change over time due to the dynamic nature of the application. In Esper [8], the on-demand query facility provides ad-hoc execution of an EPL expression, but it has some limitations. Drools30 uses a polling mechanism to support dynamic rules/queries at runtime. However, this approach is not very efficient, as the system is not notified whenever the rule base needs to be updated; instead it polls the resources repeatedly. The research tool proposed in [45] applies a push-based, or event-driven, approach for incorporating this dynamism in CEP engines; it has been implemented as an extension of the Drools CEP engine. Note that in the scope of FERARI we intend to extend PROTON to cover some functionality with regard to dynamic updates.
6.2 Requirements for the mobile fraud use case
The use of the system will be shown in two application scenarios from telecommunications, in which end users will test the architecture for mobile phone fraud detection and for cloud health monitoring. For now we focus only on mobile fraud.
29 https://www-01.ibm.com/support/knowledgecenter/SSQP76_8.7.0/com.ibm.odm.itoa.overview/topics/odm_itoa_overview.html?lang=en-us
30 http://docs.jboss.org/drools/release/6.2.0.CR3/drools-docs/pdf/drools-docs.pdf
6.2.1 Description of the mobile fraud use case
The overarching aim of the CEP component in this use case is to detect potential mobile fraud incidents. To this end, a first EPN has been created in collaboration with the use case owner, with the goal of having something meaningful and representative, yet achievable within the first year of the project. The outcome is an EPN consisting of five EPAs, shown in Figure 3 and detailed in the following sections. For the sake of simplicity we only show the EPAs and the event flow in the network. The PROTON JSON definitions file that comprises this EPN is currently being implemented.
In the current EPN we want to fire situations in the following cases (for detailed descriptions of each EPA see Sections 6.2.2.1-6.2.2.5):
• A long call to a premium destination is made during night hours (EPA1, LongCallAtNight).
• As before, but this time we are looking for at least three of these long distance calls per calling number (EPA2, FrequentLongCallsAtNight).
• Multiple long distance calls per calling number that cost more than a certain threshold value (EPA3, FrequentLongCalls).
• Same as before, but the cost of each occurrence exceeds the threshold (EPA4, FrequentEachLongCall).
• High usage of a line for long distance calls (EPA5, ExpensiveCalls).
In the current process, potential fraud situations are (automatically) marked and inspected afterwards by a human operator, who decides whether it is fraud or not. Therefore, the situations described above and depicted in Figure 3 will be marked as potential indications of fraud incidents, and will be checked by humans afterwards.
Figure 3: Mobile fraud use case initial EPN
Note the following:
• Due to privacy issues, the values chosen for the specific variables and thresholds are not the real ones. In reality, the EPN will be implemented with the correct values. This does not alter the logic of the rules, just the assignment of the different variable and threshold values.
• "Premium location services" is a closed list of potential far locations/destinations for which the rules are relevant. We have opted for "Maldives" as a code name for these locations. In practice, the same pattern will be duplicated for each of the locations in this list.
• In this use case, night hours are considered to be between 19:00 and 7:00, and 24 hours are considered from 24:00 to 23:59 the day after.
• We are only interested in outgoing calls (incoming calls are not relevant to fraud detection), indicated whenever the call_direction field equals 1 (refer to Table 1).
6.2.1 Event types
Five event types have been defined so far, comprising the input events, output/derived events, and situations, as shown in Table 1. For the sake of simplicity we only show the user-defined attributes, i.e. the event payload, and not the metadata (Section 5.2).
Although the names of concepts in the application can be chosen freely by the application designer in PROTON, we use some naming conventions for the sake of clarity. We denote event types with capital letters. Built-in/metadata attributes start with a capital letter, as do payload attributes that hold operator values, while other payload attributes start with a lowercase letter.
Note that the Call raw event includes more fields or attributes; we define only the ones required for pattern detection in the current EPN implementation. When running in the FERARI architecture, PROTON will ignore event attributes not specified in its JSON.
Table 1: Initial EPN for the mobile phone fraud use case

Event name: Call
Payload: object_id; billed_msisdn; call_start_date; calling_number; called_number; other_party_tel_number; call_direction; tap_related; conversation_duration; total_call_charge_amount

Event name: LongCallAtNight
Payload: calling_number; conversation_duration; other_party_tel_number

Event name: FrequentLongCallsAtNight
Payload: calling_number; other_party_tel_number; CallsCount

Event name: FrequentLongCalls
Payload: calling_number; other_party_tel_number; CallsCount; CallsLengthSum

Event name: FrequentEachLongCall
Payload: calling_number; other_party_tel_number; CallsCount

Event name: ExpensiveCalls
Payload: calling_number; other_party_tel_number; CallsCostSum
6.2.2 Event processing agents
In what follows we describe the EPAs in the following order: event name; motivation; event recognition process (following Figure 2); contexts, along with the temporal context policy; and pattern policies.
In the event recognition process we only show the steps that are relevant to the specific EPA, while the others are greyed out. For the filtering step we show the filtering expression; for the matching step we denote the pattern variables; and for the derivation step we denote the value assignments and calculations. Please note that for the sake of simplicity we only show the assignments that are not plain copies of values (all other derived event attribute values are copied from the input events). For attributes, we just denote their names without the 'attribute_name.' prefix.
6.2.2.1 EPA1: LongCallAtNight
Motivation: Check for “long” calls (defined as more than 40 min) to premium locations during night
hours (limited from 19:00 to 7:00).
Event recognition process
Figure 4: Event recognition process for Filtering EPA
Note that Filter agents are used to eliminate uninteresting events. A Filter agent takes an incoming event object and applies a test to decide whether to discard it or pass it on for processing by subsequent agents. The Filter agent test is therefore stateless; in other words, it is a test based solely on the content of the event instance. Therefore, both pattern and context policies are not applicable to this type of EPA.
Pattern policies
Evaluation: IMMEDIATE; Cardinality: UNRESTRICTED; Repeated: FIRST; Consumption: REUSE
Filtering expression (from Figure 4): other_party_tel_number = "Maldives" AND call_direction = 1 AND (call_start_date > 19:00 OR call_start_date < 7:00) AND conversation_duration > 40 minutes
Context
Segmentation: Not applicable.
Temporal window: ALWAYS
Initiator policy: IGNORE
Meaning: The temporal window will open with the first Call and will not close.
Figure 5: Context for Filter EPA
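The stateless filter test described above can be sketched as a plain predicate over a Call event. For illustration only, the call_start_date comparison is simplified to an hour-of-day number; the attribute names follow Table 1:

```python
# Sketch of EPA1's filter conditions (see Figure 4); "hour" stands in
# for the call_start_date time comparison, purely for illustration.
def long_call_at_night(call):
    return (
        call["other_party_tel_number"] == "Maldives"  # premium destination
        and call["call_direction"] == 1               # outgoing call only
        and (call["hour"] > 19 or call["hour"] < 7)   # night hours
        and call["conversation_duration"] > 40        # longer than 40 min
    )

night_call = {"other_party_tel_number": "Maldives", "call_direction": 1,
              "hour": 23, "conversation_duration": 55}
day_call = dict(night_call, hour=12)  # same call, made at noon
```

Because the test depends only on the single event instance, EPA1 needs no state and no pattern matching machinery.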
6.2.2.2 EPA2: FrequentLongCallsAtNight
Motivation: Same as before, but we are looking for at least 3 calls per calling number made to premium locations during night hours and lasting longer than 40 minutes.
Event recognition process
Figure 6: Event recognition process for FrequentLongCallsAtNight EPA
Note that the COUNT pattern counts the number of input event occurrences, while count is the assertion variable for the COUNT pattern. Note also that the input event for this EPA is the LongCallAtNight event, which is derived by EPA1 (see Figure 3).
Matching and derivation (from Figure 6): the COUNT pattern is applied to the LongCallAtNight events within the context, with the assertion count > 2; the derivation sets CallsCount := count.
Pattern policies
Evaluation: IMMEDIATE; Cardinality: UNRESTRICTED; Repeated: FIRST; Consumption: REUSE
Context
Segmentation: by calling_number
Temporal window: DAILY (fixed non-overlapping interval)
Initiator: 24:00
Terminator: 23:59
Initiator policy: IGNORE
Meaning: The temporal window will open at 24:00 and will close at 23:59 per calling number, so we
group calls made during one day. The filter step ensures that only calls made at night are
considered in the counting. In Figure 7, the fourth call does not pass the filter assertion, and therefore
no derived event is emitted at this point (per the policies used, a derived event is emitted each
time the pattern is satisfied).
Figure 7: Context for FrequentLongCallsAtNight EPA
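The behaviour of this EPA, a COUNT pattern segmented by calling number within a daily window and with UNRESTRICTED cardinality, can be sketched as follows. This is a simplified illustration; the function, the (day, calling_number) key, and the event representation are our own assumptions, not PROTON's engine:

```python
from collections import defaultdict

# Illustrative COUNT pattern with daily segmentation by calling_number.
# UNRESTRICTED cardinality: a derived event fires each time the count
# exceeds 2 within the open window.
def count_frequent_long_calls(events):
    """events: (day, calling_number) pairs of LongCallAtNight events, in arrival order."""
    counts = defaultdict(int)          # one counter per (day, calling_number) segment
    derived = []
    for day, number in events:
        counts[(day, number)] += 1
        if counts[(day, number)] > 2:  # the 'count > 2' assertion
            derived.append({"name": "FrequentLongCallsAtNight",
                            "calling_number": number,
                            "CallsCount": counts[(day, number)]})
    return derived

events = [("d1", "A"), ("d1", "A"), ("d1", "B"), ("d1", "A"), ("d1", "A")]
out = count_frequent_long_calls(events)  # fires on A's 3rd and 4th calls of day d1
```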
6.2.2.3 EPA3: FrequentLongCalls
Motivation: We are interested in detecting a situation resulting from at least 10 calls made to a
premium location with a total duration of at least 60 minutes in a day.
Event recognition process
[Figure 7 shows Call events on a daily timeline running from 24:00 to 23:59, with FrequentLongCallsAtNight derived events emitted within each window.]
Figure 8: Event recognition process for FrequentLongCalls EPA
Note that the SUM pattern has two assertions, namely count (the number of occurrences to be reached)
and countSum (the total value to be exceeded).
Pattern policies
Evaluation: IMMEDIATE
Cardinality: SINGLE
Repeated: FIRST
Consumption: REUSE
Context
Segmentation: by calling_number
Temporal window: DAILY (fixed non-overlapping windows)
Initiator: 24:00
Terminator: 23:59
Initiator policy: IGNORE
Meaning: The temporal window will open at 24:00 and will close at 23:59 per calling number, so we
group calls made during one day. In Figure 9, only one derived event will be fired, as the pattern is
satisfied on the 10th Call. Note that the pattern might also be satisfied on following calls,
but according to the policy we notify only once, when the pattern is first detected.
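The SUM pattern with SINGLE cardinality can be sketched as follows for the filtered Calls of one calling number within one daily window. This is an illustrative sketch; the function name, parameters, and return value are our own assumptions:

```python
# Illustrative SUM pattern with SINGLE cardinality: within each daily window
# (per calling_number) the derived event fires at most once, on the first
# call for which both assertions hold.
def detect_frequent_long_calls(durations, min_count=10, min_total=60):
    """durations: conversation_duration values (minutes) of the filtered Calls
    of one calling_number within one daily window, in arrival order."""
    count, total = 0, 0
    for d in durations:
        count += 1
        total += d
        if count > min_count - 1 and total > min_total:  # count > 9 AND sum > 60
            return {"name": "FrequentLongCalls",
                    "CallsCount": count, "CallsLengthSum": total}
    return None  # pattern not satisfied in this window

out = detect_frequent_long_calls([7] * 12)  # satisfied on the 10th call
```

Returning on the first satisfying call mirrors the SINGLE cardinality policy: later calls may also satisfy the assertions, but no further derived events are emitted for this window.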
[Figure 8 shows the EPA: Call events are filtered within context by other_party_tel_number = “Maldives” AND call_direction = 1, then matched by the SUM pattern with the assertion count > 9 AND countSum(conversation_duration) > 60 min, deriving a FrequentLongCalls event with CallsCount = count and CallsLengthSum = countSum.]
Figure 9: Context for FrequentLongCalls EPA
6.2.2.4 EPA4: FrequentEachLongCall
Motivation: A variation of the previous pattern. In this case, we are interested in detecting a situation
resulting from at least 10 long calls (each lasting at least 60 minutes) made to a premium location in a day.
Event recognition process
Figure 10: Event recognition process for FrequentEachLongCall EPA
Pattern policies
Evaluation: IMMEDIATE
Cardinality: SINGLE
Repeated: FIRST
Consumption: REUSE
Context
Segmentation: by calling_number
Temporal window: DAILY (non-overlapping windows)
Initiator: 24:00
Terminator: 23:59
[Figure 9 shows Call events on a daily timeline from 24:00 to 23:59, with the FrequentLongCalls derived event. Figure 10 shows the EPA: Call events are filtered within context by other_party_tel_number = “Maldives” AND call_direction = 1 AND conversation_duration > 60 minutes, then matched by the COUNT pattern with the assertion count > 9, deriving a FrequentEachLongCall event with CallsCount = count.]
Initiator policy: IGNORE
Meaning: The temporal window will open at 24:00 and will close at 23:59 per calling number, so we
group calls made during one day. In Figure 11, one derived event will be fired, as the pattern is satisfied
on the 12th call.
Figure 11: Context for FrequentEachLongCall EPA
6.2.2.5 EPA5: ExpensiveCalls
Motivation: Every six hours, we notify in case the calls dialed to premium locations per calling number
sum up to more than a pre-defined cost (e.g., 100 kn).
Event recognition process
Figure 12: Event recognition process for ExpensiveCalls EPA
Pattern policies
Evaluation: IMMEDIATE
Cardinality: SINGLE
Repeated: FIRST
Consumption: REUSE
Context
Segmentation: by calling_number
[Figure 11 shows Call events on a daily timeline from 24:00 to 23:59, with the FrequentEachLongCall derived event. Figure 12 shows the EPA: Call events are filtered within context by other_party_tel_number = “Maldives” AND call_direction = 1, then matched by the SUM pattern with the assertion countSum(total_call_charge_amount) > 100 kn, deriving an ExpensiveCalls event with CallsCostSum = countSum.]
Temporal window: sliding/overlapping window
Initiator: first Call
Terminator: + 6 hours
Initiator policy: ADD
Meaning: The first window opens with the first Call event and closes after 6 hours. The second
event opens, again, a six-hour window, and so forth. Figure 13 shows the different windows
in different colors. As can be seen, each event might belong to more than one window. The derived
event is emitted only once (the cardinality policy is SINGLE) when the pattern is detected (the evaluation
policy is IMMEDIATE).
Figure 13: Context for ExpensiveCall EPA
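The overlapping six-hour windows with the ADD initiator policy can be sketched as follows. This is a simplified illustration; the window bookkeeping, function name, and event representation are our own assumptions, not PROTON's implementation:

```python
# Illustrative sliding/overlapping 6-hour windows with the ADD initiator
# policy: every Call opens its own window, so one call can belong to several
# open windows. Each window fires at most once (SINGLE cardinality) when its
# cost sum exceeds the threshold.
def expensive_calls(calls, horizon=6.0, threshold=100.0):
    """calls: (arrival_hour, total_call_charge_amount) pairs in arrival order,
    for the filtered Calls of one calling_number."""
    windows = []   # each window: [open_time, running_sum, already_fired]
    derived = []
    for t, cost in calls:
        windows.append([t, 0.0, False])          # ADD: each call opens a new window
        for w in windows:
            if w[2] or t >= w[0] + horizon:      # skip fired or expired windows
                continue
            w[1] += cost                         # call belongs to this open window
            if w[1] > threshold:
                w[2] = True                      # SINGLE: fire once per window
                derived.append({"name": "ExpensiveCalls", "CallsCostSum": w[1]})
    return derived

out = expensive_calls([(0, 60), (1, 50), (5, 30)])
# the window opened at hour 0 accumulates 60 + 50 = 110 > 100 and fires once
```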
6.2.3 Mobile phone fraud use case functional requirements summary
The first EPN for the fraud detection use case (see Figure 3) includes five EPAs (of types FILTER, COUNT,
and SUM), one raw event, and five situations. Our design and implementation rely on PROTON’s
building blocks and capabilities; the same application might look different
when implemented in another CEP engine that uses different building blocks. The implementation of
this EPN within FERARI’s architecture is currently work in progress and uses real data that has been anonymized
for privacy reasons. Further refinements of this initial EPN will include more event rules.
6.3 Introduction to the event model
The Event Model (TEM) provides a new way to model, develop, validate, maintain, and implement
event-driven applications. In TEM, the event derivation logic is expressed through a high-level
declarative language as a collection of normalized tables (in a spreadsheet-like fashion). These tables
can be automatically validated and transformed into an EPN and eventually into a running application.
This idea has already been proven successful in the domain of business rules by The Decision Model
(TDM) [16]. TDM groups the rules into natural logical groups to create a structure that makes the model
relatively simple to understand, communicate, and manage. TEM is based on a set of well-defined
principles and building blocks and does not require substantial programming skills; it is therefore
targeted at non-technical people.
[Figure 13 shows Call events on a timeline, each call opening an overlapping six-hour window, with ExpensiveCalls derived events emitted within the windows.]
The current version of the model ([27], [28], [29], [30]) covers part of the functional requirements of event-
driven applications. In the scope of the FERARI project, we plan to extend today’s basic model to cover all
aspects of the functional requirements as well as the non-functional requirements, which are still a missing
piece in the model. The resulting tables will be converted into an EPN, which can thereafter be converted into
a JSON definition file and run in PROTON.
6.4 Summary of the requirements for flexible event processing in FERARI
In this section we surveyed the main aspects of the non-functional requirements of a (complex) event
processing system and the main functional requirements addressed in the scope of the project’s mobile
fraud use case.
In order to be flexible, event processing engines need to tackle the two following requirements in a
satisfactory way:
• The easy adaptability to non-functional requirements, especially the way the tool copes with
scalability issues in a distributed environment.
• The easy definition and maintenance of the event-driven logic.
Regarding the first requirement, in FERARI, the proposed architecture is a scalable distributed
environment that combines event processing capabilities (PROTON) on top of a streaming platform
(Storm). Regarding the second requirement, we propose to develop TEM, which enables the definition
and maintenance of event-driven applications by non-technical people.
7 Summary and future steps
Our goal in FERARI is to bring event processing much closer to the business world by extending simple
stream processing of numeric or textual data to the much more powerful realm of complex event
processing, in a way that is both consumable by business users and a seamless part of Big Data
applications.
CEP has already built up significant momentum, manifested in a steady research community and a variety
of commercial as well as open-source products. Capitalizing on this work, our approach is to provide a
model for constructing event processing applications, using a goal-driven declarative approach to define
the requirements for event processing applications and to generate complete, implementable designs out
of these requirements. The requirements will include both functional requirements, such as event
filtering, event aggregations, and event patterns, and non-functional requirements, such as
scalability and fault-tolerance.
By applying TEM, flexibility is achieved using an implementation-independent meta-model based on a
table representation that can be presented in a spreadsheet-like fashion, together with a set of diagrams; both
are easily consumable by business users, who are used to working with spreadsheets, and expressive
enough to directly generate code. Note that control of both the functional and the non-functional
specification over the whole life cycle will stay in the hands of the business users; the generated code will
not be edited by hand.
During the second year of the project we plan to extend the current TEM tables and diagrams to cope with
FERARI’s requirements in the mobile fraud detection use case, along with the implementation, in PROTON
on Storm, of the event processing network for the use cases presented in this report.
8 References
[1]. Altman R., Schulte W. R., Natis Y. V., Pezzini M., Driver M., Blanton C. E., Wilson N., and Van
Huizen G. 2014. Agenda Overview for Application Architecture. Gartner report G00261571.
Published: 10 January 2014.
[2]. Linden A. 2014. Hype Cycle for Advanced Analytics and Data Science. Gartner report
G00262076. Published: 30 July 2014.
[3]. LeHong H., Fenn J., and Toit R. L-du. 2014. Hype Cycle for Emerging Technologies. Gartner
report G00264126. Published: 28 July 2014.
[4]. Steenstrup K. 2014. Hype Cycle for Operational Technology. Gartner report G00263170.
Published: 23 July 2014.
[5]. LeHong H. and Velosa A. 2014. Hype Cycle for the Internet of Things. Gartner report G00264127.
Published: 21 July 2014.
[6]. Cugola G. and Margara A. 2012. Processing Flows of information: From Data Stream to Complex
Event Processing. ACM Comput. Surv., 44(3), 2012.
[7]. Etzion O. and Niblett P. 2010. Event Processing in Action. Manning Publications Company.
[8]. Esper reference document. [Online]. At: http://esper.codehaus.org/esper-
4.10.0/doc/reference/en-US/html/index.html.
[9]. Mendes M. R., Bizarro P., and Marques P. 2009. A performance study of event processing
systems. In Performance Evaluation and Benchmarking, 221-236. Springer.
[10]. Alevizos E. and Artikis A. 2014. Being Logical or Going with the Flow? A Comparison of Complex
Event Processing Systems. 8th Hellenic Conference on Artificial Intelligence.
[11]. Etzion O. 2010. Temporal Aspects of event processing. Handbook of distributed event based
system.
[12]. Adi A. and Etzion O. 2004. AMIT – The situation manager. VLDB J. 13 (2), 177-203.
[13]. Etzion O., Rabinovich E., and Skarbovsky I. 2011. Non-functional properties of event processing.
In Proceedings of the Fifth ACM International Conference on Distributed Event-Based Systems
(DEBS 2011), 365-366.
[14]. Rabinovich E., Etzion O., and Gal A. 2011. Pattern rewriting framework for event processing
optimization. In Proceedings of the Fifth ACM International Conference on Distributed Event-
Based Systems (DEBS 2011), 101-112.
[15]. Lakshmanan G., Rabinovich Y., and Etzion O. 2009. A stratified approach for supporting high
throughput event processing application. In Proceedings of the Fifth ACM International
Conference on Distributed Event-Based Systems (DEBS 2009).
[16]. von Halle B. and Goldberg L. 2010. The Decision Model. CRC Press.
[17]. Luckham D. and Schulte R. 2011. EPTS Event Processing Glossary v2.0. Technical report.
[Online]. At: http://www.complexevents.com/2011/08/31/epts-event-processing-glossary-
updated-to-version-2-0/.
[18]. Fülöp L. J., Tóth G., Rácz R., Pánczél J., Gergely T., Beszédes A., and Farkas L. 2010. Survey on
complex event processing and predictive analytics. In Proceedings of the Fifth Balkan
Conference in Informatics, 26-31.
[19]. Gualtieri M. and Curran R. 2014. The Forrester Wave™: Big Data Streaming Analytics Platforms.
Q3 2014, July 17. [Online]. At: http://forms2.tibco.com/rs/tibcoinfra/images/
Forrester%20Wave%20Big%20Data%20Streaming%20Analytics%207.17.14.pdf.
[20]. Schulte R. 2014. An Overview of Event Processing Software (August 25, 2014). [Online]. At:
http://www.complexevents.com/
[21]. Vincent P., 2014. CEP tooling market survey. December 3, 2014, [Online]. At:
http://www.complexevents.com/2014/12/03/cep-tooling-market-survey-2014/
[22]. Babcock B., Babu S., Datar M., Motwani R., and Widom J. 2002. Models and issues in data
stream systems. In Proceedings of the 21st ACM SIGMOD/PODS Symposium on Principles of
Database Systems (PODS’02). ACM, New York, NY, 1–16.
[23]. Luckham D. The power of events: an introduction to complex event processing in distributed
enterprise systems. 2001. Addison-Wesley Longman Publishing Co., Inc.
[24]. Proton user guide and programmer guide [Online]. At:
https://forge.fiware.org/plugins/mediawiki/wiki/fiware/index.php/CEP_GE_IBM_Proactive_Te
chnologyOnline_User_and_Programmer_Guide
[25]. Open specification (REST api) [Online]. At: http://forge.fi-
ware.org/plugins/mediawiki/wiki/fiware/index.php/Complex_Event_Processing_Open_RESTful
_API_Specification
[26]. Installation and administration guide. [Online]. At: https://forge.fi-
ware.org/plugins/mediawiki/wiki/fiware/index.php/CEP_GE_-
_IBM_Proactive_Technology_Online_Installation_and_Administration_Guide
[27]. Etzion O. and von Halle B.: 2013. The Event Model. [Online]. At:
http://www.slideshare.net/opher.etzion/er-2013-tutorial-modeling-the-event-driven-world.
[28]. Fournier F. and Limonad L., 2014. The BE2 model: When Business Events meet Business Entities.
DAB14 Workshop.
[29]. von Halle B. and Fournier F. 2014. Introducing the Next Horizon: The TEM Model (Part 1 - A
Paradigm for Processing Complex Events in Real Time)
http://www.modernanalyst.com/Resources/Articles/tabid/115/ID/3036/Introducing-the-Next-
Horizon-The-Event-Model-TEM.aspx
[30]. Fournier F. and von Halle B. 2014. Introducing the Next Horizon: The Event Model (Part 2 –
The Event Processing in Action)
http://www.modernanalyst.com/Resources/Articles/tabid/115/ID/3059/The-Event-Model-
TEM-in-Action.aspx
[31]. Etzion O. and Adkins J.M. 2013. Tutorial: Why is event-driven thinking different from
traditional thinking about computing? In Proceedings of the Seventh ACM International
Conference on Distributed Event-Based Systems (DEBS 2013), 269–270.
[32]. Schulte W.R. and Luckham D. 2013. Introduction to Real-Time Intelligence. [Online]. At:
http://www.complexevents.com/2013/09/17/understanding-real-time-intelligence/,
September 2013.
[33]. Schulte W.R. 2012. Does anyone care about event processing? [Online]. At:
http://www.complexevents.com/2012/07/25/does-anyone-care-about-event-processing/, July
2012.
[34]. Bry F., Eckert M., Etzion O., Pashchke A., and Riecke J. 2009. Event processing Language Tutorial,
[Online]. At: http://www.slideshare.net/opher.etzion/debs2009-event-processing-languages-
tutorial
[35]. Forgy C. 1982. Rete: A Fast Algorithm for the Many Patterns/Many Objects Match Problem.
Artificial Intelligence 19(1), 17-37.
[36]. Margara A., Cugola G., and Tamburrelli G. 2014. Learning from the past: automated rule
generation for complex event processing. In Proceedings of the 8th ACM International
Conference on Distributed Event-Based Systems (DEBS2014), 47-58.
[37]. Artikis A., Sergot M., and Paliouras G. 2014. An Event Calculus for Event Recognition. IEEE
Transactions on Knowledge and Data Engineering (TKDE).
[38]. Broda K., Clark R. M., and Russo A. 2009. Sage: A logical agent-based environment monitoring
and control system. In AmI, 112-117.
[39]. Schultz-Moller N. P., Migliavacca M., and Pietzuch P. 2009. Distributed complex event
processing with query rewriting. Proceedings of the Third ACM International Conference on
Distributed Event-Based Systems (DEBS2009), 1-12.
[40]. Demers A. J., Gehrke J., Hong M., Riedewald M., and White W. M. 2006. Towards expressive
publish/subscribe systems. Intl Conference on Extending Database Technology (EDBT), 627-644.
[41]. Wang F. and Liu P. 2005. Temporal management of RFID data. In Proceedings of the 31st VLDB
Conference, 1128-1139.
[42]. Anicic D., Fodor P., Rudolph S., Stuhmer R., Stojanovic N., and Studer R. 2011. Etalis: Rule-based
reasoning in event processing. Reasoning in Event-Based Distributed Systems, 99-124.
[43]. Artikis A., Paliouras G., Portet F., and Skarlatidis A. 2010. Logic-based representation, reasoning
and machine learning for event recognition. Proceedings of the Forth ACM International
Conference on Distributed Event-Based Systems (DEBS2010), 282-293.
[44]. Cugola G. and Margara A. 2012. Complex event processing with t-rex. Journal of Systems and
Software, 85(8), 1709-1728.
[45]. Bhargavi R., Pathak R., and Vaidehi V. 2013. Dynamic Complex Event Processing – Adaptive
Rule Engine. International Conference on Recent Trends in Information Technology (ICRTIT),
189-194.
[46]. Biscotti F., Schulte W.R., Iijima K., and Heudecker N.. 2014. Market Guide for Event Stream
Processing. Gartner report G00263080. Published: 14 August 2014.
[47]. Anicic D., Rudolph S., Fodor P., and Stojanovic N. 2012. Real-Time Complex Event Recognition
and Reasoning – A Logic Programming Approach. Applied Artificial Intelligence, Volume 26
Special Issue on Event Recognition (January 2012).