Project funded by the European Community under the Information and Communication Technologies Programme Contract ICT-FP7-619491
ICT, STREP
FERARI ICT-FP7-619491
Flexible Event pRocessing for big dAta
aRchItectures
Collaborative Project
D4.1
Requirements and state of the art overview of flexible event processing
01.02.2013 – 31.01.2014 (preparation period)
Contractual Date of Delivery: 31.01.2015
Actual Date of Delivery: 31.01.2015
Author(s): Fabiana Fournier and Inna Skarbovsky
Institution: IBM
Workpackage: Flexible Event Processing
Security: PU
Nature: R
Total number of pages: 48
D4.1 Requirements and state of the art overview on flexible event processing
Project coordinator name: Michael Mock
Revision: 1
Project coordinator organisation name: Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
URL: http://www.iais.fraunhofer.de
Abstract
The goal of the FERARI (Flexible Event pRocessing for big dAta aRchItectures) project is to pave the way
for efficient real-time Big Data technologies of the future. The proposed framework aims at enabling
business users to express complex analytics tasks through a high-level declarative language that
supports distributed complex event processing as an integral part of the system architecture. Work
package 4 “Flexible Event Processing” deals with all the developments around event processing
technologies in order to achieve this goal.
To be flexible, event processing engines need to satisfactorily address the following two
requirements:
• Easy adaptability to non-functional requirements, especially the way the tool copes with
scalability issues in a distributed environment.
• Easy definition and maintenance of the event-driven logic.
The task of work package 4 is to provide a model and methodology to cope with these limitations. The
proposed approach addresses both the functional and non-functional properties of event processing
applications by supporting non-technical users with a declarative language expressed in tabular form.
The resulting model can then be automatically translated into event-driven definitions and eventually
into a running application in the proposed FERARI architecture.
Revision history
Administration Status
Project acronym: FERARI ID: ICT-FP7-619491
Document identifier: D4.1 Requirements and state of the art overview of flexible event processing
(01.02.2013 – 31.01.2014)
Leading Partner: IBM
Report version: 1
Report preparation date: 31.01.2014
Classification: PU
Nature: REPORT
Author(s) and contributors: Fabiana Fournier and Inna Skarbovsky
Status: Submitted
Copyright This report is © FERARI Consortium 2014. Its duplication is restricted to the personal use within the
consortium and the European Commission.
www.ferari-project.eu
Document History
Version | Date       | Author                 | Change Description
0.1     | 15/11/2014 | Fabiana Fournier (IBM) | First draft
0.2     | 1/12/2014  | Fabiana Fournier (IBM) | Second draft including sections 3 and 4
0.3     | 15/12/2014 | Fabiana Fournier (IBM) | First complete version
0.4     | 15/12/2014 | Fabiana Fournier (IBM) | Inclusion of abstract
0.5     | 15/12/2014 | Fabiana Fournier (IBM) | Updates per internal review
1.0     | 30/12/2014 | Fabiana Fournier (IBM) | Final fixes and cleanup
Table of Contents
1 Introduction .......................................................................................................................................... 1
1.1 Purpose and scope of the document ............................................................................................ 1
1.2 Relationship with other documents ............................................................................................. 1
2 Complex event processing – The motivation ........................................................................................ 1
3 Complex event processing – The business case ................................................................................... 4
4 State of the art in complex event processing tools .............................................................................. 6
4.1 Commercial tools .......................................................................................................................... 8
4.1.1 InfoSphere Streams (IBM) [18][19] ....................................................................................... 9
4.1.2 Informatica Platform for streaming analytics (Informatica) .................................................. 9
4.1.3 Event Stream Processor (ESP) (SAP) [18][19] ........................................................................ 9
4.1.4 Apama (Software AG) [18] [19] ........................................................................................... 10
4.1.5 StreamBase (Tibco) [18] [19] .............................................................................................. 10
4.2 Open source engines ................................................................................................................... 10
4.2.1 Esper (EsperTech Inc) .......................................................................................................... 11
4.2.2 IBM Proactive Technology Online (PROTON)...................................................................... 11
4.2.3 Open source event processing running on distributed stream computing platforms ....... 12
4.3 Research tools ............................................................................................................................. 13
4.4 Limitations of contemporary event processing tools ................................................................. 14
5 Complex event processing background .............................................................................................. 14
5.1 Event types .................................................................................................................................. 15
5.2 Event attributes .......................................................................................................................... 16
5.3 Context ........................................................................................................................................ 16
5.4 Event Processing Network (EPN) ................................................................................................ 17
5.5 Event Processing Agent (EPA) ..................................................................................................... 17
5.6 Pattern policies ........................................................................................................................... 18
5.7 Context initiator policies ............................................................................................................. 19
5.8 PROTON definitions .................................................................................................................... 20
6 Requirements for flexible event processing ....................................................................................... 21
6.1 Non-functional requirements of event processing applications ................................................ 21
6.1.1 Scalability ............................................................................................................................ 22
6.1.2 Availability ........................................................................................................................... 22
6.1.3 Security ............................................................................................................................... 23
6.1.4 Performance objectives ...................................................................................................... 23
6.1.5 Usability............................................................................................................................... 24
6.2 Requirements for the mobile fraud use case ............................................................................. 26
6.2.1 Description of the mobile fraud use case ........................................................... 27
6.2.2 Event types .......................................................................................................... 28
6.2.3 Event processing agents ...................................................................................... 29
6.2.4 Mobile phone fraud use case functional requirements summary ..................... 35
6.3 Introduction to the event model ................................................................................................ 35
6.4 Summary of the requirements for flexible event processing in FERARI ..................................... 36
7 Summary and future steps .................................................................................................................. 36
8 References .......................................................................................................................................... 38
List of Tables
Table 1: Initial EPN for the mobile phone fraud use case ........................................................................... 28
List of Figures
Figure 1: Illustration of an event processing network ................................................................................ 17
Figure 2: Event recognition process in an EPA ............................................................................................ 18
Figure 3: Mobile fraud use case initial EPN ................................................................................................ 27
Figure 4: Event recognition process for Filtering EPA ................................................................................. 29
Figure 5: Context for Filter EPA ................................................................................................................... 30
Figure 6: Event recognition process for FrequentLongCallsAtNight EPA ................................................... 30
Figure 7: Context for FrequentLongCallsAtNight EPA ................................................................................. 31
Figure 8: Event recognition process for FrequentLongCalls EPA ................................................................ 32
Figure 9: Context for FrequentLongCalls EPA ............................................................................................. 33
Figure 10: Event recognition process for FrequentEachLongCall EPA ........................................................ 33
Figure 11: Context for FrequentEachLongCall EPA ..................................................................................... 34
Figure 12: Event recognition process for ExpensiveCalls EPA .................................................................... 34
Figure 13: Context for ExpensiveCall EPA ................................................................................................... 35
Acronyms
ASF Apache Software Foundation
BAM Business Activity Monitoring
CEP Complex Event Processing
DBMS Database Management System
DEBS Distributed Event-Based Systems
DSCP Distributed Stream Computing Platform
DSMS Data Stream Management System
EAI Enterprise Application Integration
ECA Event-Condition-Action
EPA Event Processing Agent
EPL Event Processing Language
EPN Event Processing Network
EPTS Event Processing Technical Society
ESP Event Stream Processing
FERARI Flexible Event pRocessing for big dAta aRchItectures
JSON JavaScript Object Notation
IP Intellectual Property
SaaS Software as a Service
SCADA Supervisory Control And Data Acquisition
SIEM Security Information and Event Management
TDM The Decision Model
TEM The Event Model
WP Work Package
1 Introduction
1.1 Purpose and scope of the document
The goal of the FERARI (Flexible Event pRocessing for big dAta aRchItectures) project is to pave the way
for efficient real-time Big Data technologies of the future. The proposed framework aims at enabling
business users to express complex analytics tasks through a high-level declarative language that
supports distributed complex event processing as an integral part of the system architecture. Work
package 4 (WP4) “Flexible Event Processing” deals with all the developments around event processing
technologies in order to achieve this goal.
This report surveys the state of the art in event processing systems including products and research
assets, trends, and limitations of current offerings; and hints towards the way to cope with current
limitations in the scope of the project. The report also describes non-functional and functional
requirements of event processing engines in relation to the mobile fraud use case of the project.
Note that we use "complex event processing" and "event processing", as well as "tool", "engine", and
"system", interchangeably throughout this report.
This report is structured as follows: Section 2 gives the background for the appearance of complex event
driven systems from the technical point of view, whilst Section 3 adds the business incentive. Section 4
surveys main commercial, open source, and research event processing tools. Section 5 provides some
necessary background on the semantics used in the FERARI project. Section 6 describes the
requirements for flexible event processing including details on the mobile fraud use case. We conclude
the report with summary and future steps in Section 7.
1.2 Relationship with other documents
FERARI stands for Flexible Event pRocessing for big dAta aRchItectures; there is therefore a tight
connection between the event processing components and the rest of the components that form the
FERARI architecture. Specifically, this deliverable is strongly related to D2.1 - Architecture definition in
WP2. The requirements for the event processing engine are dictated by the use cases in the project;
thus, this report is also strongly related to D1.1 - Application Scenario Description and Requirement
Analysis in WP1.
2 Complex event processing – The motivation
In the past decade, there has been an increase in demand to process continuously flowing data from
external sources at unpredictable rates to obtain timely responses to complex queries. Traditional
Database Management Systems (DBMSs) require data to be (persistently) stored and indexed before it
can be processed, and they process data only when explicitly asked by the users, that is, asynchronously with
respect to its arrival. These requirements led to the development of a number of systems specifically
designed to process information as a flow according to a set of pre-deployed processing rules. Two
models have emerged [6]: the data stream processing model [22] and the complex event processing
model [23].
Data Stream Management Systems (DSMSs) differ from conventional Data Base Management Systems
(DBMSs) in several ways: (a) as opposed to tables, streams are usually unbounded; (b) no assumption
can be made on data arrival order; and (c) size and time constraints make it difficult to store and process
data stream elements after their arrival; therefore, one-time processing is the typical mechanism
used to deal with streams. Users of a DSMS install standing (or continuous) queries, i.e., queries that are
deployed once and continue to produce results until removed. Standing queries can be executed
periodically or continuously, as new stream items arrive. As opposed to DBMSs, users of DSMSs do not
have to explicitly ask for updated information; rather, the system actively notifies them according to the
installed queries. DSMSs focus on producing query results, which are continuously updated in accordance with
the constantly changing contents of their input data. Detection and notification of complex patterns of
elements involving sequences and ordering relations are usually out of the scope of DSMSs. DSMSs
mainly focus on flowing data and data transformation; only a few allow the easy capture of
sequences of data involving complex ordering relationships, let alone the possibility of performing
filtering, correlation, and aggregation of data directly in-network, as streams flow
from sources to sinks.
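The contrast with DBMSs can be made concrete: a standing query over an unbounded stream must keep only a bounded window of recent items and push updated results to the user as data arrives. The following Python fragment is a minimal sketch of this idea; the function name and the count-based window semantics are our own illustration, not tied to any DSMS product.

```python
from collections import deque

def sliding_average(stream, window_size):
    """Standing (continuous) query: installed once over an unbounded
    stream, it emits an updated aggregate for every arriving item,
    keeping only a bounded window since the stream cannot be stored."""
    window = deque(maxlen=window_size)  # oldest items are evicted automatically
    for item in stream:
        window.append(item)
        yield sum(window) / len(window)  # result actively pushed to the consumer
```

Unlike a DBMS query, the consumer never re-issues the request; results keep flowing until the query is removed.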
Complex Event Processing (CEP) systems adopt the opposite approach. They associate a precise
semantics to the information items being processed: they are notifications of events which happened in
the external world and were observed by sources, also called event producers. The CEP engine is
responsible for filtering and combining such notifications to understand what is happening in terms of
higher-level events (a.k.a. complex events, composite events, or situations) to be notified to sinks, called
event consumers. CEP systems put emphasis on the issue that represents the main limitation of DSMSs,
that is, the ability to detect complex patterns of incoming items, involving sequencing and ordering
relationships. An example of a situation is a suspicious account, which is detected whenever there are
at least three large cash deposits within 10 days for the same account. Event processing is in essence a
paradigm of reactive computing: a system observes the world and reacts to events as they occur. It is an
evolutionary step from the paradigm of responsive computing, in which a system responds only to
explicit service requests. Event processing has evolved over the past years, departing from traditional
computing architectures, which employ synchronous request-response interactions between clients and
servers, towards reactive applications in which decisions are driven by events.
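The suspicious-account situation above can be sketched as an event-driven detector. The Python fragment below is a minimal illustration only, not PROTON or any product's API; the threshold defining a "large" deposit and the shape of the event payload are assumptions of ours.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=10)
THRESHOLD_AMOUNT = 10_000   # assumed definition of a "large" deposit
MIN_COUNT = 3

# sliding window of recent large-deposit timestamps, kept per account
recent = defaultdict(deque)

def on_deposit(account, amount, ts):
    """Process one raw deposit event; return a complex event or None."""
    if amount < THRESHOLD_AMOUNT:
        return None                       # filtering: event is irrelevant
    window = recent[account]
    window.append(ts)
    # evict timestamps that fell out of the 10-day temporal window
    while window and ts - window[0] > WINDOW:
        window.popleft()
    if len(window) >= MIN_COUNT:          # pattern matched: emit a situation
        return {"situation": "SuspiciousAccount",
                "account": account, "count": len(window)}
    return None
```

The detector reacts to each event as it occurs; no consumer ever polls for the result, matching the reactive-computing paradigm described above.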
CEP [20] is a technique in which incoming data about what is happening (event data) is processed more
or less as it arrives to generate higher-level, more-useful, summary information (complex events). Event
processing platforms have built-in capabilities for filtering incoming data, storing windows of event data,
computing aggregates and detecting patterns. In a more formal terminology, CEP software is any
computer program that can generate, read, discard and perform calculations on events. A complex
event is an abstraction of one or more raw or input events. Complex events may signify threats or
opportunities that require a response from the business. One complex event may be the result of
calculations performed on a few or on millions of events from one or more event sources. A situation
may be triggered by the observation of a single raw event, but is more typically obtained by detecting a
pattern over the flow of events. Many of these patterns are temporal in nature [11], but they can also
be spatial, spatio-temporal, or modal [7]. Event processing deals with these functions: get events from
sources (event producers), route these events, filter them, normalize or otherwise transform them,
aggregate them, detect patterns over multiple events, and transfer them as alerts to a human or as a
trigger to an autonomous adaptation system (event consumers). An application or a complete definition
set made up of these functions is also known as an Event Processing Network (EPN).
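As an illustration of how such functions compose into an EPN, the sketch below chains three hypothetical agents (filter, transform, pattern detection) from producer to consumer. The stage names, event shapes, and threshold are our own invention for illustration, not a FERARI or PROTON interface.

```python
def filter_agent(events):
    """Filter: discard events irrelevant to the situation."""
    return (e for e in events if e["amount"] > 0)

def transform_agent(events):
    """Transform/normalize: map raw payloads to a canonical shape."""
    return ({"account": e["acct"], "amount": float(e["amount"])} for e in events)

def pattern_agent(events, threshold):
    """Pattern detection: derive a complex event from multiple raw events."""
    total = 0.0
    for e in events:
        total += e["amount"]
        if total >= threshold:            # situation detected
            yield {"situation": "HighVolume", "total": total}
            total = 0.0

def run_epn(producer_events, threshold=100.0):
    """Wire producer -> agents -> consumer into a small network."""
    return list(pattern_agent(transform_agent(filter_agent(producer_events)),
                              threshold))
```

Each agent consumes the output stream of the previous one, which is exactly the producer/agent/consumer topology an EPN describes.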
As aforementioned, the goal of a CEP engine is to notify its users immediately upon the detection of a
pattern of interest. Data flows are seen as streams of events, some of which may be irrelevant for the
user's purposes. Therefore, the main focus is on the efficient filtering out of irrelevant data and
processing of the relevant. Obviously, for such systems to be acceptable, they have to satisfy certain
efficiency, fault tolerance, and accuracy constraints, such as low latency and robustness.
CEP platforms required a new type of architecture. Conventional architectures are not fast or efficient
enough for some applications because they use a "save-and-process" paradigm in which incoming data
is stored in databases in memory or on disk, and then queries are applied. When fast responses are
critical, or the volume of incoming information is very high, application architects instead use a
"process-first" CEP paradigm, in which logic is applied continuously and immediately to the "data in
motion" as it arrives. CEP is more efficient because it computes incrementally, in contrast to
conventional architectures that reprocess large datasets, often repeating the same retrievals and
calculations as each new query is submitted.
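The efficiency argument can be made concrete: an incremental aggregate touches each arriving event exactly once, whereas a save-and-process system re-scans its stored data on every query. A small Python illustration of the "process-first" style (our own sketch, not tied to any product):

```python
class IncrementalAverage:
    """'Process-first': the aggregate is updated as each event arrives,
    so answering a query is O(1) instead of re-scanning stored data."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def on_event(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count  # always-current result
```

A save-and-process system would instead store every value and recompute the average from scratch each time a query is submitted, repeating the same retrievals and calculations.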
CEP has already been successfully applied to several domains: sensor networks for environmental
monitoring [13]; payment analysis for fraud detection [39]; financial applications for trend discovery [40];
RFID-based inventory management for anomaly detection [41]. According to Gartner [46], over a third
of spending on event processing technologies comes from the financial services institution vertical.
More generally, as observed in [23], the information system of every company could and should be
organized around an event-based core that acts as a nervous system to guide and control the other
sub-systems.
CEP has already built up significant momentum manifested in a steady research community and a
variety of commercial as well as open source products [6]. Today, a large variety of commercial and
open source event processing tools is available to architects and developers who are building event
processing applications (see Section 4). These are sometimes called event processing platforms,
streaming analytics platforms, complex event processing systems, event stream processing (ESP) systems,
or distributed stream computing platforms (DSCPs). For example, Forrester [19] defines a streaming
analytics platform as: “Software that can filter, aggregate, enrich, and analyze a high throughput of data
from multiple disparate live data sources and in any data format to identify simple and complex patterns
to visualize business in real-time, detect urgent situations, and automate immediate actions”. In their
definition, streaming analytics platforms include both development tools to create streaming
applications and a run-time platform.
However, we distinguish between platforms that can detect complex patterns over events and platforms
that can only perform filtering on events while offering the possibility to add the pattern logic. (Complex)
event processing systems are general purpose development and runtime tools that are used by
developers to build custom, event-processing applications without having to re-implement the core
algorithms for handling event streams; as they provide the necessary building blocks to build the event
driven applications. DSCPs, on the other hand, are general-purpose platforms without full native CEP
analytic functions and associated accessories, but they are highly scalable and extensible and usually
offer an open programming model, so developers can add the logic to address many kinds of stream
processing applications, including some CEP solutions. Therefore, they are not considered “real”
complex event processing platforms. As we will see in Section 4.2.3, today there are already some
implementations that combine the pattern recognition capability of CEP systems with the scalability
that DSCPs offer, resulting in a holistic architecture. The FERARI architecture is one
example of this new approach.
3 Complex event processing – The business case
CEP usage is growing rapidly because CEP, in a technical sense, is the only way to get information from
event streams in real-time or near-real time [32]. The system has to process the event data more or less
as it arrives so that the appropriate action can be taken quickly. Note, that we use the term “real time”
loosely, to include “near-real-time” or “business real time.”
Event processing has a marked impact on technical and business aspects of an enterprise [1]. From
technical perspective, CEP enables loose coupling among the components of an enterprise system or
end-to-end process, which makes this system or process highly adaptable while enabling service reuse.
For executives, CEP enables a performance-driven enterprise, allowing immediate, tactical, proactive
decision-making to be driven by a deep knowledge of the context of that decision, while comprehensive,
near-term operational data can inform tactical and strategic decisions.
More specifically, Gartner ([2],[3],[4], and [5]) identifies three business impacts of CEP:
• Improves the quality of decision making by presenting information that would otherwise be
overlooked.
• Enables faster response to threats and opportunities.
• Helps shield business people from data overload by eliminating irrelevant information and
presenting only alerts and distilled versions of the most important information. CEP also adds
real-time intelligence to operational technology and business IT applications.
Moreover, the same Gartner reports ([2],[3],[4], and [5]) state that companies should use CEP to enhance
their situation awareness and to build "sense-and-respond" behavior into their systems. Situation
awareness means “understanding what is going on, so that you can decide what to do”. According to
these reports, CEP should be used in operational activities that run continuously and need ongoing
monitoring. This can apply to fraud detection, real-time precision marketing (cross-sell and upsell),
factory floor systems, website monitoring, customer contact center management, trading systems for
capital markets, transportation operation management (for airlines, trains, shipping and trucking) and
other applications. In a utility context, CEP can be used to process a combination of supervisory control
and data acquisition (SCADA) events and "last gasp" notifications from smart meters to determine the
location and severity of a network fault, and then to trigger appropriate remedial actions.
In Gartner’s Hype Cycle reports from 2014 ([2],[3],[4], and [5]), CEP remains positioned as
transformational, meaning that it enables new ways of doing business across industries that will
result in major shifts in industry dynamics, because “it is the only way to get information from event
streams in real time”. According to these Gartner reports, “CEP will inevitably be adopted in multiple
places within virtually every company. However, companies were initially slow to adopt CEP because it is
so different from conventional architecture, and many developers are still unfamiliar with it. CEP has
moved slightly further past the Peak of Inflated Expectations, but it may take up to 10 more years for it
to reach its potential on the Plateau of Productivity”. According to a recent market guide for event
stream processing [46], “it may take up to 10 years for CEP to reach widespread usage in mainstream
companies. However, it is taken for granted in financial services today. We estimate that it is also in use
in more than 100 smart grid projects and a total of several thousand production deployments worldwide
in a range of industries”. In fact, Forrester [19] estimates a 66% increase in firms’ use of streaming
analytics in the past two years.
As these reports also note, CEP has the potential to influence all sectors: “CEP has already transformed
financial markets… and it is also essential to earthquake detection, radiation hazard screening, smart
electrical grids and real-time location-based marketing”. Furthermore, “CEP is also essential to future
Internet of Things applications where streams of sensor data must be processed in real time”. Note
that our two business cases fall squarely within the list of applications Gartner identifies as natural
and essential candidates for CEP.
Forrester [19] states that “Streaming analytics platforms can help firms detect insights in high velocity
streams of data and act on them in real time”. Moreover, “Business won’t wait. That is truer today than
ever before because of the white-water flow of data from innumerable real-time data sources. Market
data, clickstream, mobile devices, sensors, and even good old fashioned transactions may contain
valuable, but perishable insights. Perishable because the insights are only valuable if you can detect and
act on them right now. That’s where streaming analytics platforms can help”.
In summary, from the business point of view, we can conclude, as stated in [32], that “Companies that
understand CEP have more and better real-time intelligence than those that don’t understand it. The
use of CEP will expand further as the pace of business accelerates, more data becomes available in real
time, and business people demand better situation awareness”. The analysts’ reports presented here
only emphasize this statement. Accordingly, the CEP market is forecast to grow rapidly; in fact, it is
forecast to reach US$4.7bn by 2019¹.
4 State of the art in complex event processing tools
Today there exist a wide variety of commercial, open source, and research event processing tools.
According to Gartner ([2],[3],[4], and [5]), companies should acquire CEP functionality by using an off-
the-shelf application or SaaS (Software as a Service) offering that has embedded CEP under the covers, if
a product that addresses their particular business requirements is available. Companies should consider
building their own event-driven applications in one of the following three cases:
• When an appropriate off-the-shelf application or SaaS offering is not available, companies
should consider building their own CEP-enabled application on an operational intelligence
platform that has embedded CEP capabilities.
• For demanding, high-throughput, low-latency applications — or where the event processing
logic is primary to the business problem — companies should build their own CEP-enabled
applications on commercial or open source CEP platforms (see examples of vendors below) or
DSCPs.
• In rare cases, when none of the other tactics are practical, developers should write custom CEP
logic into their applications using a standard programming language without the use of a
commercial or open source CEP or DSCP product.
1 http://www.companiesandmarkets.com/Market/Information-Technology/Market-Research/Complex-Event-Processing-CEP-Market-by-Algorithmic-Trading-Global-forecast-to-2019/RPT127618
Two forms of stream processing software have emerged in the past 15 years ([2],[3],[4],[5], and [46]).
The first were CEP platforms that have built-in analytic functions such as filtering, storing windows of
event data, computing aggregates, and detecting patterns. Modern commercial CEP platform products
include adapters to integrate with event sources, development and testing tools, dashboard and alerting
tools, and administration tools. More recently the second form — distributed stream computing
platforms (DSCPs) such as Amazon Web Services Kinesis2 and open source offerings including Apache
Samza3, Spark4 and Storm5 — was developed. As previously mentioned, DSCPs are general-purpose
platforms without full native CEP analytic functions and associated accessories, but they are highly
scalable and extensible so developers can add the logic to address many kinds of stream processing
applications, including some CEP solutions. In particular, Apache open source projects (Storm, Spark, and
recently Samza) have gained a fair amount of attention and interest ([46], [21]), and these may well
mature into commercial offerings in the future and/or become embedded in existing commercial product sets.
Gartner is now tracking 20 vendors that offer pure-play CEP platforms and six that offer DSCPs [46]
(Note that this is not an exhaustive list):
CEP platforms or tools:
• Codehaus/EsperTech's Esper, NEsper
• Feedzai Pulse
• IBM InfoSphere Streams
• IBM Operational Decision Manager
• Informatica RulePoint
• Fujitsu Interstage Big Data Complex Event Processing
• Hitachi uCosminexus Stream Data Platform
• LG CNS' EventPro
• Microsoft StreamInsight
• OneMarketData OneTick CEP
• Oracle Event Processing
• Red Hat Drools Fusion/JBoss Enterprise BRMS
• SAP Event Stream Processor
• SAS DataFlux
• SQLstream s-Server
• Software AG Apama Event Processing Platform
• Tibco BusinessEvents
• Tibco StreamBase
• Vitria Operational Intelligence Analytic Server
2 http://aws.amazon.com/kinesis/
3 http://samza.incubator.apache.org/
4 http://spark.apache.org/streaming/
5 https://storm.apache.org/
• WSO2 CEP Server
DSCPs:
• Google Cloud Dataflow
• Apache S4 (open source software, originated at Yahoo)
• Apache Samza (open source software, originated at LinkedIn)
• Apache Storm (originated at Twitter)
• DataTorrent RTS
This document focuses on event processing platforms; we address DSCPs only in the context of
open source offerings that already combine a DSCP with event processing capabilities,
as these are relevant to FERARI. DSCPs are discussed extensively in D2.1 and are out of the scope of this
document.
In the following sections we will address the most popular tools in three categories: commercial, open
source, and research.
4.1 Commercial tools
Most CEP tools are obtained as part of a larger product. Companies acquire a packaged application or
subscribe to a SaaS service that has embedded CEP under the covers. The company is buying a solution
that happens to require event processing, and it may not realize that CEP is being used. For example,
supply chain visibility products; security information and event management (SIEM) products; some
kinds of fraud detection systems; governance, risk and compliance products; system and network
monitoring systems; business activity monitoring (BAM) tools; and many other categories of software
implement some greater or lesser amount of CEP logic. In a few cases, the developers of these products
or SaaS offerings have leveraged the general purpose event processing platforms listed above to reduce
the amount of code they have to write. But in most cases, the developers implement a specialized
subset of event processing algorithms in new code to suit their application purposes.
Forrester’s evaluation of general-purpose big data streaming analytics platforms from Q3 2014 reveals
five leading vendors in the event processing niche [19]: IBM, Informatica, SAP, Software AG, and Tibco
Software. To assess the state of the big data streaming analytics market and see how the vendors and
their platforms stack up against each other, Forrester evaluated the strengths and weaknesses of the
top commercial big data streaming analytics platform vendors against 50 criteria, grouped into three
high-level buckets: current offering, strategy, and market presence. The leaders have high scores in all
the key evaluation areas: architecture, development tools, and stream processing. Next we will briefly
describe the five leading products.
4.1.1 InfoSphere Streams (IBM)6 [18][19]
InfoSphere Streams is a dedicated stream processing system, where the processing of events is
distributed among a dedicated cluster of machines. Depending on the hardware infrastructure and use
case, millions of events can be processed per second. IBM’s InfoSphere Streams supports high-volume,
structured and unstructured streaming data sources such as images, audio, voice, VoIP, video, TV,
financial news, radio, police scanners, web traffic, email, chat, GPS data, financial transaction data,
satellite data, sensors, badge swipes, etc. InfoSphere Streams emerged from IBM Research in 2009 and
continues to benefit from IBM’s significant investments in research. Its customer base spans healthcare,
financial services, telecommunications, government, energy and utilities, manufacturing, and
transportation. Note that IBM’s InfoSphere Streams can be classified as either a DSCP or a CEP platform,
depending on the context and the author’s point of view [20].
4.1.2 Informatica Platform for streaming analytics (Informatica)7,8
At the core of Informatica’s event detection and response products is RulePoint. RulePoint is a Java-
based software product that acts as an enterprise event service, detecting complex business events as
they occur, and automatically initiating responses as required. RulePoint detects complex events across
disparate information sources including sensors, enterprise application integration (EAI), enterprise
applications, databases, text documents, and more. In 2011, RulePoint was refactored to include
streaming capabilities. These enable developers to author streaming applications using both business
rules and streaming operator constructs built into the platform. Examples of applications include a
geospatial tracking solution to monitor high-risk vessels before they enter ports or as they pass through
shipping areas that are predetermined to be high-risk locations; and a phishing attack management
solution for banks, credit unions, online brokerages, and e-commerce companies.
4.1.3 Event Stream Processor (ESP) (SAP)9 [18][19]
SAP’s ESP (formerly Sybase Aleri) [18] is a complex event processing system designed for analyzing large
amounts of varied data in real time. It offers the ability to filter, combine, and normalize incoming data and
can be used to detect important patterns, changed conditions, security problems, and much more. It can
be used to raise alerts when events occur or to react to events. The product provides a wide range of integrated
tools to improve productivity. With Studio 3, developers can create and manage their applications and
the event processing flow. A wide range of built-in adapters provides interfaces for JDBC, ODBC, JMS, etc.
Data models can be defined with the XML-based AleriML language, while the SPLASH scripting
language helps developers build applications that are too complex to express in standard
relational programming languages. SAP’s ESP has a broad base of customers in financial services,
telecommunications, manufacturing, energy, retail, transportation and logistics, and the public sector.
6 http://www-03.ibm.com/software/products/en/infosphere-streams
7 http://www.informatica.com/us/products/complex-event-processing/#fbid=ghA_Zem5ovE
8 http://www.complexevents.com/wp-content/uploads/2010/10/7107_EventDetectionAndResponse_web.pdf
9 http://www.sybase.com/products/financialservicessolutions/complex-event-processing
4.1.4 Apama10 (Software AG) [18] [19]
Apama Event Processing Platform is a complete CEP-based tool acquired from Progress Software in 2013.
The CEP engine can handle inbound events with sub-second latency, find defined patterns, and alert or
respond with actions. With Apama Event Modeler, developers can create applications via a graphical user
interface, and these can be presented with Apama Research Studio. Apama Dashboard Studio provides a set of tools to
develop visually rich user interfaces. Via Apama dashboards, users can start/stop, parameterize, and
monitor event operations from both client and browser desktops. The Apama package includes many
major adapters to handle communication with other components and applications. Apama has a long
and strong history as a complex event processing platform used for algorithmic trading applications and
market monitoring, dating back to its origins in 2001. But it is also used by telecommunication firms and
credit card companies to provide real-time, location-based, customer-preference-driven offers to
consumers. Other industries include retail banking, telecommunications, retail, gaming, logistics and
supply chain, government, energy and utilities, and manufacturing.
4.1.5 StreamBase (Tibco)11 [18] [19]
Tibco Software has been a force in the high-frequency trading market for more than fifteen years, and
its acquisition of StreamBase in 2013 has given it the tools it needs to meet the needs of the wider
streaming analytics market. StreamBase is a high-performance event stream processing platform that
provides an efficient way to build powerful applications for almost any usage area. It supports fast
development via a graphical event-flow language and supports StreamSQL, providing ease of use,
flexibility, and extensibility for developers. This widely used software provides solutions for
telecommunications, capital markets, intelligence and military, e-commerce, and multiplayer online
gaming. In telecommunications, it provides services such as network monitoring and protection,
bandwidth and quality-of-service monitoring, fraud detection, location-based services, and more.
4.2 Open source engines
Open source is also an option when selecting a CEP engine: developers acquire a basic open-source
event stream processing engine and then use common, general-purpose programming tools to
build the rest of the application.
In this section we briefly present two open source engines: Esper, today the most popular open source
engine (as stated by Gartner, “open source CEP products, particularly Esper, have been embedded in
several thousand applications and commercial software products” [46]), and PROTON from partner IBM,
which is the complex event processing engine in the FERARI project. Other common open source
10 http://www.softwareag.com/corporate/products/apama_webmethods/analytics/products/default.asp
11 http://www.tibco.com/products/event-processing/complex-event-processing/streambase-complex-event-processing
engines include: Triceps12 and WSO2 Complex Event Processing Server13 (which uses the Siddhi14 engine
that started as a research project at the University of Moratuwa, Sri Lanka).
4.2.1 Esper (EsperTech Inc)15
The Esper system [8], [18], which relies on a SQL-based language and Java, has already been the target of
previous benchmark studies [9]. Esper is integrated into the Java and .NET languages (NEsper) and can
be used in CEP applications as a library. For ease of understanding, one could conceptualize the Esper
engine as a database turned upside-down. Traditional database systems work by storing incoming data
on disk, according to a predefined relational schema. They hold an exact history of previous
insertions, and updates are usually rare events. User queries are not known beforehand and there are no
strict constraints as far as their latency is concerned. The Esper engine, on the other hand, lets users
define from the very start the queries they are interested in, which act as filters for the streams of
incoming data. Events satisfying the filtering criteria are detected in “real-time" and may be pushed
further down the chain of filters for additional processing or published to their respective
listeners/subscribers [10].
Esper provides a rich set of constructs by which events and event patterns can be expressed. One way to
achieve event representation and handling is through the use of expression-based pattern matching.
Patterns incorporate several operators, some of which may be time-based, and are applied to sequences
of events. A new event matches the pattern expression whenever it satisfies its filtering criteria.
Another method to process events is through event processing language (EPL) queries, whose syntax
resembles that of the well-known SQL. The most common SQL constructs may also be used in EPL
statements. However, the defined queries are applied not to tables but to views, which can be
understood as basic structures for holding events according to certain user demands, e.g., the need
for grouping based on certain keys or for applying queries to events up to a certain time point in the
past [10].
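The “database turned upside-down” idea described above can be illustrated with a minimal sketch (plain Python, not Esper’s actual API): continuous queries are registered first as standing filters, and incoming events are then pushed through them, with matches delivered to listeners.

```python
# Minimal sketch of the "database turned upside-down" model:
# continuous queries are registered up front; events stream through them.

class ContinuousQuery:
    """A standing query: a filter predicate plus its listeners."""
    def __init__(self, predicate):
        self.predicate = predicate
        self.listeners = []

    def add_listener(self, callback):
        self.listeners.append(callback)

class MiniEngine:
    def __init__(self):
        self.queries = []

    def register(self, predicate):
        query = ContinuousQuery(predicate)
        self.queries.append(query)
        return query

    def send_event(self, event):
        # Each incoming event is evaluated against every standing query;
        # matching events are pushed to that query's listeners.
        for query in self.queries:
            if query.predicate(event):
                for listener in query.listeners:
                    listener(event)

engine = MiniEngine()
matches = []
# Analogous in spirit to an EPL statement such as
# "select * from Withdrawal where amount > 1000" (illustrative only).
big_withdrawals = engine.register(
    lambda e: e["type"] == "Withdrawal" and e["amount"] > 1000)
big_withdrawals.add_listener(matches.append)

engine.send_event({"type": "Withdrawal", "amount": 500})
engine.send_event({"type": "Withdrawal", "amount": 2500})
```

Note how the roles of data and queries are inverted with respect to a database: the queries are long-lived, and the data flows past them.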
4.2.2 IBM Proactive Technology Online (PROTON)
In the FERARI project the complex event processing component is built on and extends the IBM
Proactive Technology Online (PROTON) research asset. This asset has become open source16 in the
scope of the FI-WARE FI-PPP project17 (PROTON being the CEP Generic Enabler in the FI-WARE
platform18). Technical documentation regarding PROTON can be found in [24][25], and [26].
PROTON comprises an authoring tool, a run-time engine, and producer and consumer adapters.
Specifically, it includes an integrated run-time platform to develop, deploy, and maintain event-driven
12 http://triceps.sourceforge.net/
13 http://wso2.com/products/complex-event-processor
14 http://siddhi-cep.blogspot.co.il/
15 http://www.espertech.com/
16 Link to the open source: https://github.com/ishkin/Proton
17 http://www.fi-ware.org/
18 https://forge.fi-ware.org/plugins/mediawiki/wiki/fiware/index.php/FI-WARE_Architecture
applications using a single programming model. The specific architecture of PROTON and its
implementation in the scope of the FERARI project are described in D2.1 – Architecture definition.
4.2.3 Open source event processing running on distributed stream computing platforms
As previously mentioned, many vendor products that claim streaming analytics functionality are actually
frameworks to ingest and route data; they lack streaming operators, which developers must code
themselves. Accordingly, we survey below three recent attempts at integrating open source event
processing tools with open source DSCP platforms.
4.2.3.1 Streaming-cep-engine
Streaming-cep-engine19 is a complex event processing platform built on Spark Streaming. It combines
the power of Spark Streaming as a continuous computing framework with the Siddhi CEP engine as the
complex event processing engine (Siddhi is the core engine of the WSO2 open source tool). It was
first introduced at Spark Summit 201420.
4.2.3.2 Esper on top of Storm
The storm-esper21 library provides a bolt that allows using Esper queries on Storm data streams (for
Storm building blocks refer to D2.1). Storm’s tuples are quite similar to Esper’s map event types. The
tuple field names map naturally to map keys and the field values to values for these keys. The tuple
fields are not typed when they are defined, and are considered by Esper to be of type Object. In addition, the
fact that tuples have to be defined before a topology is running makes it relatively easy to define the
map event type in the setup phase.
The Esper bolt itself is generic. It receives Esper statements and the names of the output fields that
will be generated by these statements.
The bolt code itself consists of three pieces. The setup part constructs map event types for each input
stream and registers them with Esper. The second part is the transfer of data from Storm to Esper. The
execute(Tuple tuple) method is called by Storm whenever a tuple from any of the connected streams is
sent to the bolt. The Esper bolt code first has to find the event type name corresponding to the tuple.
Then it iterates over the fields in the tuple and puts the values into a map using the field names as the
keys. Finally, it passes that map to Esper. At this point, Esper routes this map (the event) through the
statements, which in turn might produce new data that needs to be handed back to Storm. For this purpose,
the bolt registers itself as a listener for data emitted from any of the statements that were configured
during setup. Esper then calls back the update method on the bolt if one of the statements
generates data. The update method then basically performs the reverse operation of
the execute method, converting the event data back to a tuple.
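The round trip described above can be sketched as follows (an illustrative Python rendering of the data flow, not the actual storm-esper Java code; the `statement` function and field names are made up for the example):

```python
# Sketch of the storm-esper data flow: a Storm tuple's named fields become
# a map event; the engine's output maps become tuples again.

def tuple_to_event(field_names, values):
    """execute() direction: tuple fields -> map event (keys = field names)."""
    return dict(zip(field_names, values))

def event_to_tuple(event, output_fields):
    """update() direction: map event -> tuple values, in declared field order."""
    return [event[name] for name in output_fields]

# A toy "statement" standing in for an Esper EPL query.
def statement(event):
    if event["amount"] > 100:
        return {"account": event["account"], "alert": "large"}
    return None

emitted = []  # stands in for the bolt's output collector

def execute(field_names, values):
    event = tuple_to_event(field_names, values)
    derived = statement(event)        # the engine routes the event through statements
    if derived is not None:           # listener callback: hand data back to Storm
        emitted.append(event_to_tuple(derived, ["account", "alert"]))

execute(["account", "amount"], ["acc-1", 50])
execute(["account", "amount"], ["acc-2", 500])
```

The essential point is the symmetry: tuple-to-map on the way in, map-to-tuple on the way out, with the declared output field names fixing the tuple order.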
19 http://stratio.github.io/streaming-cep-engine/
20 http://spark-summit.org/2014/talk/stratio-streaming-a-new-approach-to-spark-streaming
21 https://github.com/tomdz/storm-esper
4.2.3.3 PROTON on top of Storm
Storm is an incubation project at the Apache Software Foundation (ASF). It is utilized by
well-known companies with significant volumes of streaming data,
such as The Weather Channel, Spotify, Twitter, and Rocket Fuel (refer to D2.1).
In the scope of the FERARI project, PROTON has been implemented on top of Storm, thus making it a
distributed and scalable CEP engine. For details refer to D2.1 – Architecture definition.
4.3 Research tools
There are also many research tools developed in the last decade. They include:
• Amit (IBM Haifa Research Lab) [12]
• Aurora (Brandeis University, Brown University and MIT)22
• Borealis (Brandeis University, Brown University and MIT)23
• Cayuga (Cornell University)24
• ETALIS (Forschungszentrum Informatik Karlsruhe and Stony Brook University) [47]
• NiagaraST (Portland State University)25
• STREAM (Stanford University)26
• Telegraph (UC Berkeley)27
• epZilla (University of Moratuwa)28
We will briefly describe Amit and ETALIS, as these are the only two that appear in the latest CEP Tooling
Market Survey 2014 [21].
4.3.1.1 Amit [12]
IBM Research in Haifa has developed a fully functional event processing research asset [12], which is
capable of processing raw event streams from different sources, identifying specific patterns that are
of interest, and forwarding derived events to subscribers. Amit is no longer supported and has been
replaced by the open source PROTON (see 4.2.2).
4.3.1.2 ETALIS [47]
The ETALIS system provides an expressive logic-based language for specifying and combining complex
events. For this language both a syntax as well as a formal declarative semantics are provided. The
language enables efficient run-time event recognition and supports deductive reasoning. The execution
model of the language is based on a compilation strategy into Prolog.
22 http://cs.brown.edu/research/aurora/
23 http://cs.brown.edu/research/borealis/public/
24 http://www.cs.cornell.edu/bigreddata/cayuga/
25 http://datalab.cs.pdx.edu/niagaraST/
26 http://infolab.stanford.edu/stream/
27 http://telegraph.cs.berkeley.edu/
28 http://www.epzilla.org/
4.4 Limitations of contemporary event processing tools
As has been presented above, there is a large variety of research prototypes as well as commercial
products and platforms. Still, despite the positive outlook and the maturity of the tools, CEP tools are not widely
used. In fact, most applications that implement CEP logic don’t use dedicated event processing tools [1].
Some user companies have written custom applications with CEP logic rather than leveraging an off-the-
shelf event processing platform. This was especially common in the 1990s and early 2000s before the
products were widely available, and some developers still choose to write their own CEP logic for
performance or cost reasons. For example, large banks and related financial services companies have
built front-office systems for capital markets trading with their own embedded CEP logic [32]. Gartner
analyst Roy Schulte estimated in July 2012 that around 95% of the event processing applications are
built using ad-hoc programming and do not use existing frameworks [33]. Two of the main
reasons [2], [31] are the difficulty of thinking in terms of event-driven architectures, which are asynchronous
in nature, and the relative complexity of existing tools, which makes them impractical and inaccessible for
business users. In practice, event-driven applications are designed either with dedicated
event processing tools, by skilled IT developers who are familiar with the event
processing engine and the particular ways to work around its limitations, or in hand-coded fashion.
As pointed out by Forrester [19] “The streaming application programming model is unfamiliar to most
application developers. It’s a different paradigm from normal programming where code execution
controls data. In streaming applications, the incoming data controls the code”.
In addition, current tools often lack the ability to process large volumes of distributed (complex)
events, which are becoming increasingly important in modern automated business decision processes.
In other words, current event processing tools are not flexible enough: they require IT expert skills, do
not scale easily, and cannot always run in distributed environments, limiting their usability and
widespread adoption in the Big Data era.
Before discussing the requirements for flexible event processing systems in detail, we next describe
some basic terms necessary for gaining a common understanding.
5 Complex event processing background
Since no widely accepted standard exists for the concepts of event processing, several synonyms appear
in the literature, and several attempts have been made in recent years towards homogeneity.
The Event Processing Technical Society (EPTS) is an inclusive group of organizations and individuals
aiming to increase awareness of event processing, foster topics for future standardization, and establish
event processing as a separate academic discipline. The goal of the EPTS is the development of a shared
understanding of event processing terminology. The society believes that by communicating the
shared understanding developed within the group, it would become a catalyst for the emergence of effective
interoperation standards, foster academic research, and drive the creation of training curricula. In turn, this
would lead to the establishment of event processing as a discipline in its own right. The EPTS members hope
that through a combination of academic research, vendor experience, and customer data they will be able
to develop a unified glossary, language, and architecture that would homogenize event processing.
The society started as an informal group in 2005/2006. It was formally launched as a
consortium in June 2008. Membership of the consortium is based on a formal agreement defining
intellectual property (IP) ownership terms and rules of engagement. The society is governed by a
Steering Committee consisting of founding members of the organization, representatives of major
vendors, and scientists. It is a partner of the major scientific event processing conference, Distributed
Event Based Systems (DEBS), and of the major scientific rules conference, the International Web Rule Symposium
(RuleML), and it has launched two Dagstuhl seminars on event processing (May 2007 and May 2010). It has also
published an event processing glossary [17]. However, the EPTS is almost inactive nowadays.
Also, recent efforts, such as the Real-time Business Insight Event Processing in Practice and the Event
Processing Online Magazine, have ceased their activities.
As a result, each complex event processing engine uses its own terminology and semantics. We follow
the semantics presented in Etzion and Niblett’s book [7] and applied in PROTON. For the sake of clarity,
we briefly present below the main terms, concepts, and building blocks used in our work; for further
details, refer to [7].
5.1 Event types
Generally speaking, an event is an occurrence within a particular system or domain; it is something that
has happened, or is contemplated as having happened in that domain ([7][23]). The word “event” is also
used to mean a programming entity that represents such an occurrence in a computing system. In the
latter definition, an event is an object of an event type. Events are actual instances of the event types
and have specific values. For example, the event "today at 10 PM a customer named John Doe made a
new deposit to his bank account” is an instance of the Transaction event type. An event type specifies
the information that is contained in its event instances by defining a set of attributes. The event
attributes are grouped into the header or metadata (e.g., the occurrence time of the event instance) and
the body or payload (specific information about the event, e.g., customer name).
We distinguish the following event types:
A raw event is an event that is introduced into an event processing system by an event producer (an
entity at the edge of an event processing system that introduces events to the system). An example of a
raw event is a Cash deposit into a bank account.
A derived event is an event that is generated as a result of event processing that takes place inside the
event processing system. An example is that a Large cash deposit has been made into a bank account.
A situation is a derived event that is emitted outside the event processing system and consumed by at
least one consumer (an entity at the edge of an event processing system that receives events from the
system). An example is a Suspicious bank account.
5.2 Event attributes
Every event instance has a set of built-in attributes (metadata) and a set of payload attributes. PROTON
employs the following attributes in the event type's metadata:
• Name – the name of the event type.
• OccurrenceTime – a timestamp attribute, which we expect the event source to fill in with the
occurrence time of the event. If left empty, it defaults to the detectionTime attribute value.
• DetectionTime – a timestamp attribute that records the time the CEP engine detected the event.
The time is measured in milliseconds, specifying the difference between the machine
time at the moment of event detection and midnight, January 1, 1970 UTC.
• EventId – a unique string identifier of the event, which can be set by the event source to
identify the event instance.
• EventSource – holds the source of the event (usually the name of the event producer).
The above built-in attributes can be used in expressions in the same manner as user-defined attributes.
User-defined attributes can be added to the event type by specifying their names and object types. If an
attribute is an array, its dimension should be specified.
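As an illustration, the metadata attributes and their defaults can be sketched in a minimal Python class (the class and its defaulting rules are illustrative, not PROTON’s actual implementation):

```python
import time
import uuid

class Event:
    """Illustrative event instance: built-in metadata plus a payload."""
    def __init__(self, name, payload, occurrence_time=None,
                 event_source="", event_id=None):
        self.name = name                      # event type name
        # Detection time: milliseconds since midnight, January 1, 1970 UTC.
        self.detection_time = int(time.time() * 1000)
        # If the source did not supply an occurrence time, it defaults
        # to the detection time.
        self.occurrence_time = occurrence_time or self.detection_time
        self.event_id = event_id or str(uuid.uuid4())
        self.event_source = event_source
        self.payload = payload                # user-defined attributes

deposit = Event("Transaction",
                {"customer": "John Doe", "amount": 5000.0},
                event_source="ATM-17")
assert deposit.occurrence_time == deposit.detection_time
```

The header/payload split is visible here: the first five fields are metadata common to all event types, while `payload` carries the type-specific attributes.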
5.3 Context
Context is a named specification of conditions that groups event instances so they can be processed in a
related way. While several context dimensions exist, in this report we employ the two most
commonly used (in the future we might enlarge the set of context types, depending on
scenario requirements): temporal and segmentation-oriented. A temporal context consists of one or
more time intervals, possibly overlapping. Each time interval corresponds to a context partition, which
contains events that occur during that interval. A segmentation-oriented context is used to group event
instances into context partitions based on the value of an attribute or collection of attributes in the
instances themselves. As a simple example, consider a single stream of input events, in which each
event contains a customer identifier attribute. The value of this attribute can be used to group events so
there is a separate context partition for each customer. Each context partition contains only events
related to that customer, so the behaviour of each customer can be tracked independently of the other
customers. A composite context is a context that is composed from two or more contexts, known as its
members. The set of context partitions for the composite context is the Cartesian product of the
partition sets of the member contexts
5.4 Event Processing Network (EPN)
An Event Processing Network (EPN) is a conceptual model describing the event processing flow
execution. An EPN comprises a collection of Event Processing Agents (EPAs), event producers, events,
and event consumers (Figure 1). The network describes the flow of events originating at event producers and
flowing through various event processing agents to eventually reach event consumers. For example, in
Figure 1, events from Producer 1 are processed by Agent 1. Events derived by Agent 1 are of interest to
Consumer 1 but are also processed by Agent 3 together with events derived from Agent 2. Note that the
intermediary processing between producers and consumers in every installation is made up of several
functions and often the same function is applied to different events for different purposes at different
stages of the processing.
Figure 1: Illustration of an event processing network
The application definitions, i.e. the EPN, are written by the application developer at build time. In
PROTON, the definitions are output in JSON (JavaScript Object Notation) format and provided as
configuration to the CEP run-time engine.
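To make the idea concrete, the wiring of Figure 1 could be captured in a JSON-style definition along the following lines (a hypothetical schema for illustration only, not PROTON’s actual JSON format; Agent 2’s input is assumed to be Producer 2):

```python
# A hypothetical JSON-style EPN definition wiring the producers,
# agents, and consumers of Figure 1.
epn = {
    "producers": ["Producer1", "Producer2"],
    "consumers": ["Consumer1", "Consumer2"],
    "agents": {
        "Agent1": {"inputs": ["Producer1"], "outputs": ["Consumer1", "Agent3"]},
        "Agent2": {"inputs": ["Producer2"], "outputs": ["Agent3"]},
        "Agent3": {"inputs": ["Agent1", "Agent2"], "outputs": ["Consumer2"]},
    },
}

def validate(epn):
    """Every agent input must be a producer or another agent."""
    known = set(epn["producers"]) | set(epn["agents"])
    for agent, spec in epn["agents"].items():
        for source in spec["inputs"]:
            if source not in known:
                raise ValueError(f"{agent}: unknown input {source}")
    return True

assert validate(epn)
```

Such a declarative definition is what separates the build-time description of the network from its run-time execution by the engine.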
5.5 Event Processing Agent (EPA)
An Event Processing Agent (EPA) is a component that, given a set of input/incoming events within a
context, applies some logic to generate a set of output/derived events. An EPA can apply different
event patterns to detect specific relations among the input events.
An EPA performs three logical steps, a.k.a. the pattern matching process or event recognition (see Figure 2).
Please note that all three steps are optional, but at least one must be performed inside an EPA:
• The filtering step, in which relevant events are selected from the input events for processing
according to the filter conditions. The output of this step is a set of participant events.
• The matching step, which takes all events that passed the filtering and looks for matches between
these events, using an event processing pattern or some other kind of matching criterion. The
output of this step is the matching set.
• The derivation step, which takes the output from the matching step and uses it to derive the
output events by applying derivation formulae.
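The three steps compose into a simple pipeline, sketched below in Python (illustrative only, not PROTON’s implementation; the deposit events and the “at least two large deposits” pattern are invented for the example):

```python
# Illustrative sketch of the three EPA steps: filter -> match -> derive.

def run_epa(events, filter_fn, match_fn, derive_fn):
    participants = [e for e in events if filter_fn(e)]   # filtering step
    matching_set = match_fn(participants)                # matching step
    return [derive_fn(m) for m in matching_set]          # derivation step

deposits = [
    {"account": "acc-1", "amount": 50},
    {"account": "acc-1", "amount": 12000},
    {"account": "acc-2", "amount": 9000},
]

derived = run_epa(
    deposits,
    filter_fn=lambda e: e["amount"] > 1000,              # keep large deposits
    match_fn=lambda ps: [ps] if len(ps) >= 2 else [],    # "count >= 2" pattern
    derive_fn=lambda ms: {"situation": "SuspiciousActivity",
                          "count": len(ms)},
)
assert derived == [{"situation": "SuspiciousActivity", "count": 2}]
```

Each step is optional in an EPA; here, omitting a step would correspond to passing an identity function in its place.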
Figure 2: Event recognition process in an EPA
An event pattern is a template specifying one or more combinations of events. Given any collection of events, if it is possible to find one or more subsets of those events that match a particular pattern, such a subset is said to satisfy the pattern. Some common examples of patterns are:
• Filter: each event is evaluated against an expression; the event is filtered in only if it meets the expression's conditions, and is otherwise filtered out.
• Sequence: at least one instance of all participating event types must arrive in a specified order for the pattern to be matched.
• Count: the number of instances in the participant event set satisfies the pattern's number assertion.
• All: at least one instance of all participating event types must arrive for the pattern to be matched; the arrival order in this case is immaterial.
• Trend: events need to satisfy a specific change (increasing or decreasing) over time of some observed value; this refers to the value of a specific attribute or attributes.
• Absence: a specified event (or events) must not occur within a predefined time window. The matching set in this case is empty.
• SUM: the value of a specific attribute, summed up over all participant events, satisfies the sum threshold assertion.
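As a concrete illustration of one of these, a Sequence check over a stream of typed events can be sketched as follows; the event representation is invented for illustration:

```python
# Illustrative check of the Sequence pattern: at least one instance of
# each participating event type must arrive in the specified order.

def matches_sequence(events, type_order):
    it = iter(events)  # scan the stream once, in arrival order
    return all(any(e["type"] == t for e in it) for t in type_order)

stream = [{"type": "order_placed"}, {"type": "payment"}, {"type": "shipment"}]
in_order = matches_sequence(stream, ["order_placed", "shipment"])      # True
out_of_order = matches_sequence(stream, ["shipment", "order_placed"])  # False
```

The single shared iterator is what enforces ordering: once an event type has been matched, earlier events can no longer satisfy later entries of the pattern.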
5.6 Pattern policies
A pattern policy is a named parameter that disambiguates the semantics of the pattern and the pattern matching process. Pattern policies fine-tune the way the pattern detection process works. PROTON supports five types of policies:
Evaluation policy – when are the matching sets produced? The EPA can either generate output incrementally (in this case the evaluation policy is called Immediate) or at the end of the temporal context (called Deferred).
Cardinality policy – how many matching sets are produced within a single context partition? The cardinality policy helps limit the number of matching sets generated, and thus the number of derived events produced. The policy type can be single, meaning only one matching set is generated, or unrestricted, meaning there is no restriction on the number of matching sets generated.
Repeated/Instance Selection type policy – what happens if the matching step encounters multiple
events of the same type? The override repeated policy means that whenever a new event instance is
encountered and the participant set already contains the required number of instances of that type, the
new instance replaces the oldest previous instance of that type. The every repeated policy means that
every instance is kept, meaning all possible matching sets can be produced. First means that every
instance is kept, but only the earliest instance of each type is used for matching. Last is the same as first,
but the latest instance of each type is used for matching.
Consumption policy – what happens to a particular event after it has been included in the matching set?
Possible consumption policies are consume, meaning each event instance can be used in only one
matching set; and reuse, meaning an event instance can participate in an unrestricted number of
matching sets.
Policy relevance can be dictated by the event pattern. For example, the evaluation policy for an absence pattern is always deferred (as we are testing for the existence of an event instance over a specified temporal context). Also, not all possible policy combinations are meaningful. For example, the choice of consumption policy is irrelevant if the cardinality policy is single, because in that case the matching step runs only once.
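The effect of the consumption policy can be illustrated with a toy "A followed by B" pair pattern; the function below is a sketch, not PROTON behaviour:

```python
# Sketch of the consumption policy: with "consume" each event instance is
# used in at most one matching set; with "reuse" it can appear in many.
def count_matching_sets(events, policy):
    sets, pool = 0, []             # pool holds A instances awaiting a B
    for e in events:
        if e == "A":
            pool.append(e)
        elif e == "B":
            if policy == "reuse":
                sets += len(pool)  # B pairs with every available A
            elif pool:             # consume: one A and one B per set
                pool.pop(0)        # that A can never be matched again
                sets += 1
    return sets

two_as_two_bs = ["A", "A", "B", "B"]
consume_sets = count_matching_sets(two_as_two_bs, "consume")  # 2 sets
reuse_sets = count_matching_sets(two_as_two_bs, "reuse")      # 4 sets
```

With two As and two Bs, consume yields two matching sets (each A is paired once), while reuse yields four (every A pairs with every subsequent B).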
5.7 Context initiator policies
A temporal context starts with an initiator and ends with a terminator. An initiator can be an event, system startup, or an absolute time. A terminator ends the temporal context; it can be an event, a relative expiration time, an absolute expiration time, or "never ends", i.e. the temporal context remains open until engine shutdown.
A context initiator policy fine-tunes the semantics of temporal contexts whose initiator is an event. It defines the behaviour required when a window is already open and a subsequent initiator event is detected. The options are: add, meaning a new window is opened alongside the existing one; or ignore, meaning the new initiator is ignored and the original window is preserved.
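The two policies can be sketched with a minimal window-management function (names illustrative):

```python
# Sketch of context initiator policies: when an initiator event arrives
# while a window is already open, "add" opens a second window alongside
# the existing one, while "ignore" preserves only the original window.
def on_initiator(open_windows, timestamp, policy):
    if policy == "add" or not open_windows:
        open_windows.append({"opened_at": timestamp})
    return open_windows  # under "ignore", the original window is kept

ignore_windows = on_initiator([], 10, "ignore")               # opens window
ignore_windows = on_initiator(ignore_windows, 20, "ignore")   # ignored

add_windows = on_initiator([], 10, "add")
add_windows = on_initiator(add_windows, 20, "add")            # second window
```

Under add, an event can therefore belong to several overlapping context windows at once; under ignore, each context partition has at most one open window.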
5.8 PROTON definitions
In PROTON, the JSON CEP application definitions file can be created in three ways:
1. Build-time user interface – the application developer creates the building blocks of the application definitions by filling in forms, without the need to write any code. The generated file is exported in JSON format to the CEP run-time engine.
2. Programming – the JSON definitions file can alternatively be generated programmatically by an external application and fed into the CEP run-time engine.
3. Manually – the JSON file is created by hand and fed into the CEP run-time engine.
The created JSON file comprises the following definitions:
• Event types – the events that are expected to be received as input or to be generated as derived events. An event type definition includes the event name and a list of its attributes.
• Producers – the event sources and the way PROTON gets events from those sources.
• Consumers – the event consumers and the way they get derived events from PROTON.
• Temporal contexts – time window contexts in which event processing agents are active.
• Segmentation contexts – semantic contexts that are used to group several events to be used by the EPAs.
• Composite contexts – groupings of several different contexts.
• Event processing agents – patterns of incoming events in a specific context that detect situations and generate derived events. An EPA includes most of the following general characteristics:
o Unique name
o EPA type (operator); for each operator, different sets of properties and operands are applicable
o Context
o Other properties, such as condition
o Participating events
o Segmentation contexts
o Derived events
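To make the shape of such a file concrete, the sketch below builds a small definitions object and serializes it to JSON. The field names are invented for illustration only and do not reflect PROTON's actual JSON schema:

```python
import json

# Hypothetical definitions object: two event types, a temporal context,
# and one Filter EPA. The structure is illustrative, NOT PROTON's schema.
definitions = {
    "eventTypes": [
        {"name": "Call",
         "attributes": ["calling_number", "conversation_duration"]},
        {"name": "LongCallAtNight",
         "attributes": ["calling_number", "conversation_duration"]},
    ],
    "temporalContexts": [
        {"name": "Always", "initiator": "startup", "terminator": "never"}
    ],
    "epas": [
        {"name": "LongCallAtNight", "type": "Filter", "context": "Always",
         "inputEvents": ["Call"], "derivedEvents": ["LongCallAtNight"]}
    ],
}

# The serialized file is what gets fed to the CEP run-time engine.
definitions_json = json.dumps(definitions, indent=2)
```

Whichever of the three creation routes is used, the run-time engine consumes the same serialized form.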
The JSON file that is created at build-time contains all EPN definitions, including definitions for event types, EPAs, contexts, producers, and consumers. At execution time, the standalone run-time engine accesses the metadata file, loads and parses all the definitions, creates a thread for each input and output
adapter and starts listening for events incoming from the input adapters (representing producers) and
forwards events to output adapters (representing consumers).
For the distributed implementation on top of STORM, an input Bolt serves the same function as an input adapter, and the derived events are passed as STORM tuples further up the chain of processing in STORM (for the full integration details refer to D2.1).
6 Requirements for flexible event processing
In essence, in order for an event processing system to be flexible it has to fulfill two main requirements: it must adapt easily to distributed scalable architectures, and it must be simple enough that non-IT experts can define the event logic of an application. With regard to CEP, the FERARI project addresses exactly these gaps.
The envisaged FERARI architecture provides a distributed scalable platform in which PROTON is already implemented; refer to D2.1 – Architecture definition for details on the FERARI architecture. WP4 mainly addresses the second requirement.
In this section we will briefly describe the main non-functional requirements of event processing
systems followed by a first cut of the mobile fraud use case event processing application design. We also
introduce The Event Model (TEM). In the summary of this section we address how we will tackle the
flexibility issue in the project.
6.1 Non-functional requirements of event processing applications
The design of event processing applications covers functional as well as non-functional properties. Non-functional requirements are concerned not with what a system does, but with how well it does it. It is often the non-functional properties that make or break a specific application [7]. In the following subsections we briefly describe the main aspects of non-functional requirements of event-driven systems. A survey of the state of the art in the area of non-functional requirements can be found in [13].
The design of both functional and non-functional requirements is implementation specific. It is either done with current dedicated event processing tools by skilled IT developers who are well acquainted with the event processing engine and with the particular ways to bypass the engine's limitations, or in a hand-coded fashion. As mentioned above, in both cases it is rather complex and the actual design is not accessible to business users. With regard to non-functional requirements, tuning is done according to the capabilities of the tool, and often it is not possible to optimize for multiple goals, such as the trade-off between throughput and latency (see [14] for such optimization methods).
6.1.1 Scalability
Scalability is the capability of a system to adapt readily to a greater or lesser intensity of use, volume, or
demand while still meeting its business objectives. Scalability has several dimensions. The dimensions
relevant to event processing are the number of producers and consumers, number of input events,
number of event processing agent types, processing complexity, number of derived events, number of
concurrent runtime instances, number of concurrent runtime contexts [13] and [7].
In the event processing world there are two common approaches to scalability: scaling out and scaling up. In the case of scaling out, or "horizontal scalability", the approach is to add logical units or nodes to increase processing power, while on the surface making them work as a single unit. Examples of such approaches are clustering of processing nodes and load balancing of the incoming data stream between nodes. Scaling up, or "vertical scalability", means adding resources within the same logical unit (node) to increase processing capacity, for example adding memory to a physical node.
Not all applications can be scaled using the above techniques; to be candidates for scale-up and scale-out they need to satisfy certain constraints, such as supporting partitioning of state and load balancing.
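The partitioning constraint can be illustrated with a simple routing function: events are assigned to nodes by hashing a segmentation key, so the per-key pattern state can live entirely on one node. The key name below is a hypothetical example:

```python
import zlib

# Sketch of why partitionable state enables scale-out: events are routed
# to a node by hashing their segmentation key, so each node keeps the
# per-key pattern state locally, with no cross-node coordination.
def route(event, num_nodes):
    key = event["calling_number"]  # hypothetical segmentation key
    return zlib.crc32(key.encode()) % num_nodes

events = [{"calling_number": n} for n in ("111", "222", "111")]
nodes = [route(e, 4) for e in events]  # same key -> same node, always
```

Stateful patterns whose state cannot be partitioned this way (e.g. those spanning all keys) are the ones that resist horizontal scaling.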
Both approaches have trade-offs. The scale-up approach has a simple management model and no network communication overhead, but its growth potential is finite and there is no redundancy. The scale-out approach, on the other hand, gains performance, redundancy, and fault tolerance, but at the cost of increased management complexity, a more complex programming model, and communication overheads between nodes that need to be taken into account.
Event processing applications use load-shedding and load-balancing approaches to ensure the desired performance with the limited resources provided to the application. For each application the options should be examined carefully to determine the appropriate solution.
6.1.2 Availability
The availability of a system is the percentage of the time its users perceive it to be functioning. Event processing systems can use existing standard high-availability practices such as logging, failover, and disaster recovery. The designer of an event processing system must, however, make decisions related to high availability. These considerations relate to whether it is cost effective to employ high-availability practices, as they have an associated cost and may not be fully required in some applications. An example of such a consideration is the issue of recoverability, that is, the ability to restore the state of a system to its exact value before a failure occurred.
Some event processing agents (such as those that perform aggregation, composition, and pattern detection) are stateful, that is, the internal state of such an agent has to be kept as long as the particular EPA instance is active, meaning as long as its context partition is valid. For example, a sequence pattern detection EPA running with the reuse policy over a 24-hour window might need to retain all the participant
events that occurred during that period. In some applications recoverability is a must. If the event
processing is part of a mission-critical application, and decisions are made using the results of this
processing, losing some of the system’s state may have critical implications.
However, there are also event processing applications where high availability is not required: applications where events are symptoms of some underlying problem that will occur again even if an event is lost, or systems looking for statistical trends based on sampling. In such applications the cost of applying a high-availability solution may well be too high relative to the benefits that can be reaped from it.
6.1.3 Security
Security requirements relate both to ensuring that operations are only performed by authorized parties, and to meeting privacy considerations. Specifically, this means the following functions:
• Ensuring only authorized parties are allowed to be event producers or event consumers.
• Ensuring that incoming events are filtered so that authorized producers can't introduce invalid events or events that they are not entitled to publish.
• Ensuring that consumers only receive information to which they are entitled. In some cases a consumer might be entitled to see some of the attributes of an event but not others.
• Ensuring that unauthorized parties can't add new event processing agents to the system, or make modifications to the EPN itself (in systems where dynamic EPN modification is supported).
• Keeping auditable logs of events received and processed, or other activities performed by the system.
• Ensuring that all databases and data communication links used by the system are secure.
6.1.4 Performance objectives
Some non-functional requirements can be translated to performance objectives which can then be the
subject of various optimization approaches. Some of the major performance objectives for event
processing are related to throughput, latency, and time-constraint objectives.
All these objectives are intended to address scaling issues, but each addresses them under different assumptions and may be served by different optimizations. In addition, each objective may apply to an entire system, or to any part of a system. In some systems there is a single performance objective for all the processing in the system, for example latency leveling for each event type in that system. In other systems there may be a mix of performance objectives: some events may have real-time constraints associated with them, whereas others may have another metric. Performance objectives may also be composed of several separate metrics.
One of the major ways to achieve various performance metrics is parallel processing. There are three
levels of parallelism: first, parallelism inside a single core using multithreading; second, parallelism by
partitioning the work within a multicore machine where the threads running in different cores have
access to shared memory; and third, partitioning the work to multiple machines within a cluster.
An additional optimization method involves moving the processing close to the producers and
consumers where applicable. Consider an example where there are multiple sensors within the same
location, and the event processing involves aggregation of events that are emitted by these sensors.
Placing the aggregation EPA close to the sensors can eliminate a substantial amount of network traffic.
Likewise, if the EPN contains an EPA that creates many events that are all consumed by a certain
consumer, or a set of consumers that are located in a certain location, it might be useful to locate this
EPA close to the consumer or consumers. This optimization approach can also complement the parallel
processing approach. If the parallel event processing is executed over a grid of machines within various
geographic locations (instead of being on a physical cluster or co-located set of multicore machines) it
might be sensible to co-locate a group of agents if there’s a substantial amount of communication
between them.
In the research community, several attempts have been made to optimize the distribution and scheduling of event processing networks. In [15], a stratification algorithm is used to reveal dependencies among functions in an event processing network and co-locate independent functions in layers, or strata. This allows for horizontal partitioning. The work in [15] then elaborates on a profiling-based technique for placing event processing agents on execution nodes, allowing for vertical partitioning. For example, if a sequential pattern is segmented by an identifier in the payload of the events, the execution could be vertically partitioned by that identifier.
6.1.5 Usability
As already mentioned, there are no standards for event processing programming languages, although
there are various programming styles and approaches. In this section we look at two styles: the stream-
oriented style and the rule-oriented style [34], [7].
6.1.5.1 Stream-oriented programming style
The stream-oriented programming style is rooted in data flow programming. In essence a data flow
graph is a directed graph that consists of nodes and edges. The nodes represent processing elements,
and the edges represent data flowing between these nodes. The paradigm is one of continuous queries, sometimes called operators, that run constantly in the nodes while their results flow through the edges of the data flow graph. The languages used to describe the queries are inspired by SQL and relational algebra, though not all of them are based on SQL. When a data flow graph is used for event processing, the data flowing in the streams are event instances with the appropriate event semantics. These event instances are represented as records, and are often referred to as tuples, following the relational model's terminology. A stream is a continuous flow of events, in most cases all of the same event type, which are considered to be tuples of the same relation. The stream may be unbounded and active forever. This means that, unlike the conventional relational model where a query is executed against an entire table of data, in the continuous query model a query can execute only against a bounded subset of the stream. The stream is therefore broken up into a sequence of windows and the query is performed successively against each window. This style is very common in
existing tools, e.g., InfoSphere Streams, TIBCO StreamBase, SAP ESP, Esper, Oracle Event Processing, and Microsoft StreamInsight.
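As a minimal illustration of this windowed continuous-query model, in plain Python rather than any vendor's query language, here is a tumbling count-based window with an average aggregate:

```python
# Sketch of the continuous-query model: the unbounded stream is broken
# into a sequence of (here, tumbling count-based) windows, and the query
# (an average) is executed against each window in turn.
def windowed_avg(stream, window_size):
    window = []
    for tup in stream:                  # tuples of one relation
        window.append(tup)
        if len(window) == window_size:  # window full: run the query, emit
            yield sum(window) / window_size
            window = []                 # tumbling: start the next window

averages = list(windowed_avg([1, 2, 3, 4, 5, 6], window_size=3))
```

Real engines offer richer window kinds (time-based, sliding, partitioned), but the principle is the same: the query never sees the whole stream, only the current window.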
6.1.5.2 Rule-oriented languages
The other dominant style of event processing languages is the rule-oriented style. There are several distinct types of rules: production rules, active (event-condition-action) rules, and rules based on logic programming. We briefly present each of these styles below.
6.1.5.2.1 Production rules
Production rules are rules of the form "if condition then action". They operate in a forward-chaining way: when the condition is satisfied, the action is performed. Production rules are rooted in expert systems; the operational processing of production rules may be either declarative or procedural:
• Declarative production rule execution is typically based on a variation of the Rete [35] algorithm, which matches facts against the patterns contained in the rules to determine which rule conditions are satisfied. Information about the antecedents (conditions) of each rule is stored in an internal state, and in every execution cycle changes to these states are evaluated.
• Procedural production rule execution is based on sequential execution of compiled rules.
Production rules are based on state changes and not on events; however, some event processing
languages extend Rete-based production rules to support event processing. This is done by making
events an explicit part of the model, so that event occurrences can be used as part of the conditions for
invoking an inference rule. Thus the event processing is done through an inference process.
6.1.5.2.2 Active rules
Active rules, also known as event-condition-action (ECA) rules, are descended from work on active
databases. Active rules operate according to the following execution pattern: when an event occurs,
evaluate conditions and, if they are satisfied, trigger an action. The event may be primitive or composite.
The action can be one that derives an additional event, in which case an active rule maps directly onto
an EPA in our model. In cases where the action performs some external activity, such as invoking an
external service, the rule maps to the combination of an EPA and an event consumer. Examples of tools that apply the ECA style are Apama and RulePoint.
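A minimal sketch of this execution pattern, with invented event and rule names:

```python
# Sketch of an ECA rule: when an event occurs, evaluate the condition
# and, if it is satisfied, trigger the action (here: derive a new event).
# All names are illustrative.
def apply_eca_rule(event, condition, action):
    return action(event) if condition(event) else None

alert = apply_eca_rule(
    {"type": "temperature", "value": 95},
    condition=lambda e: e["value"] > 90,
    action=lambda e: {"type": "overheating_alert", "source": e["type"]},
)
```

When the action derives an event, as here, the rule corresponds directly to an EPA in the model of Section 5.5.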
6.1.5.2.3 Logic programming rules
Logic programming is a programming style based on logical assertions. The most well-known example of
a logic programming language is Prolog. The application of the logic programming style to event
processing is rooted in work done in the deductive database area. Commercial tools seldom apply this style, though it can be found in TIBCO BusinessEvents. It is more common in research projects, for example in the following languages: ETALIS [42]; RTEC [37], [43]; SAGE [38]; and T-REX [44].
6.1.5.3 Build-time interfaces
Event processing tools are composed of design-time (or build-time) and run-time components. The design component serves for the definition of the event-driven application, while the run-time component is the engine that, according to the event definitions, processes the events in real time in order to detect and derive the desired situations. We can identify four types of build-time interfaces [13]:
• Text-based programming languages (e.g., Apama)
• Visual languages (e.g., StreamBase)
• Form-based languages (e.g., PROTON)
• Natural languages (e.g., ODM Advanced29)
These types are not mutually exclusive, as development environments can consist of a mixture of
graphical and text oriented tools. The various environments reflect different assumptions about
developers’ preferences. In some cases developers prefer a more familiar text-based interface, whereas
others prefer a more visual style of development.
The task of creating the event definitions can be tedious and hard even for experts. To alleviate this task, in some engines the event definitions can be learnt automatically using machine learning techniques. However, this aspect has received little attention so far. Some research work on machine learning techniques for defining event patterns can be found in [36] and [37].
Most existing CEP engines have limitations on the addition or modification of rules. Rules are configured once, initially, and are not expected to change later. In other words, once the rules are defined and configured, the system freezes and rules cannot be added dynamically at run-time. However, rules might change over time due to the dynamic nature of the application. In Esper [8], the on-demand query facility provides ad-hoc execution of an EPL expression, but it has some limitations. Drools30 uses a polling mechanism to support dynamic rules/queries at runtime. However, this approach is not very efficient, as the system is not notified whenever the rule base needs to be updated; instead it polls the resources repeatedly. The research tool proposed in [45] applies a push-based, or event-driven, approach for incorporating this dynamism in CEP engines; it has been implemented as an extension of the Drools CEP engine. Note that in the scope of FERARI we intend to extend PROTON to cover some functionality with regard to dynamic updates.
6.2 Requirements for the mobile fraud use case
The use of the system will be shown in two application scenarios from telecommunications, in which end users will test the architecture for mobile phone fraud detection and for cloud health monitoring. For now we focus only on mobile fraud.
29 https://www-01.ibm.com/support/knowledgecenter/SSQP76_8.7.0/com.ibm.odm.itoa.overview/topics/odm_itoa_overview.html?lang=en-us
30 http://docs.jboss.org/drools/release/6.2.0.CR3/drools-docs/pdf/drools-docs.pdf
6.2.1 Description of the mobile fraud use case
The overarching aim of the CEP component in this use case is to detect potential mobile fraud incidents. To this end, a first EPN has been created in collaboration with the use case owner, with the goal of having something meaningful and representative, yet achievable within the first year of the project. The outcome is an EPN consisting of five EPAs, shown in Figure 3 and detailed in the following sections. For the sake of simplicity we only show the EPAs and the event flow in the network. The PROTON JSON definitions file that comprises this EPN is currently being implemented.
In the current EPN we want to fire situations in the following cases (for detailed descriptions of each EPA see Sections 6.2.2.1-6.2.2.5):
• A long call to a premium destination is made during night hours (EPA1, LongCallAtNight).
• As before, but this time we are looking for at least three of these long distance calls per calling number (EPA2, FrequentLongCallsAtNight).
• Multiple long distance calls per calling number that cost more than a certain threshold value (EPA3, FrequentLongCalls).
• Same as before, but the cost of each occurrence exceeds the threshold (EPA4, FrequentEachLongCall).
• High usage of a line for long distance calls (EPA5, ExpensiveCalls).
In the current process, potential fraud situations are (automatically) marked and inspected afterwards by a human operator, who decides whether it is fraud or not. Therefore, the situations described above and depicted in Figure 3 will be marked as potential indications of fraud incidents, and will be checked by humans afterwards.
Figure 3: Mobile fraud use case initial EPN
Note the following:
• Due to privacy issues, the values chosen for the specific variables and thresholds are not the real ones. In reality, the EPN will be implemented with the correct values. This does not alter the logic of the rules, just the assignment of the different variable and threshold values.
• "Premium location services" is a closed list of potential far locations/destinations for which the rules are relevant. We have opted for "Maldives" as a code name for these locations. In practice, the same pattern will be duplicated for each of the locations in this list.
• In this use case, night hours are considered to be between 19:00 and 7:00, and 24 hours are considered from 24:00 to 23:59 the day after.
• We are only interested in outgoing calls (incoming calls are not relevant to fraud detection), indicated whenever the call_direction field equals 1 (refer to Table 1).
6.2.1 Event types
Five event types have been defined so far, comprising the input events, output/derived events, and situations, as shown in Table 1. For the sake of simplicity we only show the user-defined attributes, i.e. the event payload, and not the metadata (Section 5.2).
Although the names of concepts in the application can be chosen freely by the application designer in PROTON, we use some naming conventions for the sake of clarity. We denote event types with capital letters. Built-in/metadata attributes start with a capital letter, as do payload attributes that hold operator values, while other payload attributes start with a lowercase letter.
Note that the Call raw event includes more fields or attributes; we define only the ones required for pattern detection in the current EPN implementation. When running in the FERARI architecture, PROTON will ignore event attributes not specified in its JSON.
Table 1: Initial EPN for the mobile phone fraud use case

Event name: Call
Payload: object_id; billed_msisdn; call_start_date; calling_number; called_number; other_party_tel_number; call_direction; tap_related; conversation_duration; total_call_charge_amount

Event name: LongCallAtNight
Payload: calling_number; conversation_duration; other_party_tel_number

Event name: FrequentLongCallsAtNight
Payload: calling_number; other_party_tel_number; CallsCount

Event name: FrequentLongCalls
Payload: calling_number; other_party_tel_number; CallsCount; CallsLengthSum

Event name: FrequentEachLongCall
Payload: calling_number; other_party_tel_number; CallsCount

Event name: ExpensiveCalls
Payload: calling_number; other_party_tel_number; CallsCostSum
6.2.2 Event processing agents
In what follows we describe the EPAs in the following order: event name; motivation; event recognition process (following Figure 2); contexts, along with the temporal context policy; and pattern policies.
In the event recognition process we only show the steps that are relevant to the specific EPA, while the others are greyed out. For the filtering step we show the filtering expression; for the matching step we denote the pattern variables; and for the derivation step we denote the value assignments and calculations. Please note that for the sake of simplicity we only show the assignments that are not plain copies of values (all other derived event attribute values are copied from the input events). For attributes, we just denote their names without the 'attribute_name.' prefix.
6.2.2.1 EPA1: LongCallAtNight
Motivation: Check for “long” calls (defined as more than 40 min) to premium locations during night
hours (limited from 19:00 to 7:00).
Event recognition process
Figure 4: Event recognition process for Filtering EPA
Note that Filter agents are used to eliminate uninteresting events. A Filter agent takes an incoming event object and applies a test to decide whether to discard it or pass it on for processing by subsequent agents. The Filter agent test is therefore stateless; in other words, it is a test based solely on the content of the event instance. Therefore, both pattern and context policies are not applicable to this type of EPA.
Pattern policies
Evaluation: IMMEDIATE; Cardinality: UNRESTRICTED; Repeated: FIRST; Consumption: REUSE
Filtering expression (from Figure 4): other_party_tel_number = "Maldives" AND call_direction = 1 AND (call_start_date > 19:00 OR call_start_date < 7:00) AND conversation_duration > 40 minutes
Context
Segmentation: Not applicable.
Temporal window: ALWAYS
Initiator policy: IGNORE
Meaning: The temporal window will open with the first Call and will not close.
Figure 5: Context for Filter EPA
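The stateless filter test described above can be sketched as a plain predicate over a Call event. For illustration only, the call_start_date comparison is simplified to an hour-of-day number; the attribute names follow Table 1:

```python
# Sketch of EPA1's filter conditions (see Figure 4); "hour" stands in
# for the call_start_date time comparison, purely for illustration.
def long_call_at_night(call):
    return (
        call["other_party_tel_number"] == "Maldives"  # premium destination
        and call["call_direction"] == 1               # outgoing call only
        and (call["hour"] > 19 or call["hour"] < 7)   # night hours
        and call["conversation_duration"] > 40        # longer than 40 min
    )

night_call = {"other_party_tel_number": "Maldives", "call_direction": 1,
              "hour": 23, "conversation_duration": 55}
day_call = dict(night_call, hour=12)  # same call, made at noon
```

Because the test depends only on the single event instance, EPA1 needs no state and no pattern matching machinery.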
6.2.2.2 EPA2: FrequentLongCallsAtNight
Motivation: Same as before, but we are looking for at least 3 calls per calling number made to premium locations during night hours and lasting longer than 40 minutes.
Event recognition process
Figure 6: Event recognition process for FrequentLongCallsAtNight EPA
Note that the COUNT pattern counts the number of input event occurrences, while count is the assertion variable for the COUNT pattern. Note also that the input event for this EPA is the LongCallAtNight event, which is derived by EPA1 (see Figure 3).
Matching and derivation (from Figure 6): the COUNT pattern is applied to the LongCallAtNight events within the context, with the assertion count > 2; the derivation sets CallsCount := count.
Pattern policies
Evaluation: IMMEDIATE; Cardinality: UNRESTRICTED; Repeated: FIRST; Consumption: REUSE
Context
Segmentation: by calling_number
Temporal window: DAILY (fixed non-overlapping interval)
Initiator: 24:00
Terminator: 23:59
Initiator policy: IGNORE
Meaning: The temporal window will open at 24:00 and will close at 23:59 per calling number, so we
group calls made during one day. The filter step ensures that only calls made at night are
considered in the counting. In Figure 7, the fourth call does not pass the filter assertion, and therefore
no derived event is emitted at this point (per the policies used, a derived event is emitted each
time the pattern is satisfied).
Figure 7: Context for FrequentLongCallsAtNight EPA
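The behaviour of this EPA, a COUNT pattern segmented by calling number within a daily window and with UNRESTRICTED cardinality, can be sketched as follows. This is a simplified illustration; the function, the (day, calling_number) key, and the event representation are our own assumptions, not PROTON's engine:

```python
from collections import defaultdict

# Illustrative COUNT pattern with daily segmentation by calling_number.
# UNRESTRICTED cardinality: a derived event fires each time the count
# exceeds 2 within the open window.
def count_frequent_long_calls(events):
    """events: (day, calling_number) pairs of LongCallAtNight events, in arrival order."""
    counts = defaultdict(int)          # one counter per (day, calling_number) segment
    derived = []
    for day, number in events:
        counts[(day, number)] += 1
        if counts[(day, number)] > 2:  # the 'count > 2' assertion
            derived.append({"name": "FrequentLongCallsAtNight",
                            "calling_number": number,
                            "CallsCount": counts[(day, number)]})
    return derived

events = [("d1", "A"), ("d1", "A"), ("d1", "B"), ("d1", "A"), ("d1", "A")]
out = count_frequent_long_calls(events)  # fires on A's 3rd and 4th calls of day d1
```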
6.2.2.3 EPA3: FrequentLongCalls
Motivation: We are interested in detecting a situation resulting from at least 10 calls made to a
premium location with a total duration of at least 60 minutes in a day.
Event recognition process
[Figure 7 shows Call events on a daily timeline running from 24:00 to 23:59, with FrequentLongCallsAtNight derived events emitted within each window.]
Figure 8: Event recognition process for FrequentLongCalls EPA
Note that the SUM pattern has two assertions, namely count (the number of occurrences to be reached)
and countSum (the total value to be exceeded).
Pattern policies
Evaluation: IMMEDIATE
Cardinality: SINGLE
Repeated: FIRST
Consumption: REUSE
Context
Segmentation: by calling_number
Temporal window: DAILY (fixed non-overlapping windows)
Initiator: 24:00
Terminator: 23:59
Initiator policy: IGNORE
Meaning: The temporal window will open at 24:00 and will close at 23:59 per calling number, so we
group calls made during one day. In Figure 9, only one derived event will be fired, as the pattern is
satisfied on the 10th Call. Note that the pattern might also be satisfied on following calls,
but according to the policy we notify only once, when the pattern is first detected.
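The SUM pattern with SINGLE cardinality can be sketched as follows for the filtered Calls of one calling number within one daily window. This is an illustrative sketch; the function name, parameters, and return value are our own assumptions:

```python
# Illustrative SUM pattern with SINGLE cardinality: within each daily window
# (per calling_number) the derived event fires at most once, on the first
# call for which both assertions hold.
def detect_frequent_long_calls(durations, min_count=10, min_total=60):
    """durations: conversation_duration values (minutes) of the filtered Calls
    of one calling_number within one daily window, in arrival order."""
    count, total = 0, 0
    for d in durations:
        count += 1
        total += d
        if count > min_count - 1 and total > min_total:  # count > 9 AND sum > 60
            return {"name": "FrequentLongCalls",
                    "CallsCount": count, "CallsLengthSum": total}
    return None  # pattern not satisfied in this window

out = detect_frequent_long_calls([7] * 12)  # satisfied on the 10th call
```

Returning on the first satisfying call mirrors the SINGLE cardinality policy: later calls may also satisfy the assertions, but no further derived events are emitted for this window.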
[Figure 8 shows the EPA: Call events are filtered within context by other_party_tel_number = “Maldives” AND call_direction = 1, then matched by the SUM pattern with the assertion count > 9 AND countSum(conversation_duration) > 60 min, deriving a FrequentLongCalls event with CallsCount = count and CallsLengthSum = countSum.]
Figure 9: Context for FrequentLongCalls EPA
6.2.2.4 EPA4: FrequentEachLongCall
Motivation: A variation of the previous pattern. In this case, we are interested in detecting a situation
resulting from at least 10 long calls (each lasting at least 60 minutes) made to a premium location in a day.
Event recognition process
Figure 10: Event recognition process for FrequentEachLongCall EPA
Pattern policies
Evaluation: IMMEDIATE
Cardinality: SINGLE
Repeated: FIRST
Consumption: REUSE
Context
Segmentation: by calling_number
Temporal window: DAILY (non-overlapping windows)
Initiator: 24:00
Terminator: 23:59
[Figure 9 shows Call events on a daily timeline from 24:00 to 23:59, with the FrequentLongCalls derived event. Figure 10 shows the EPA: Call events are filtered within context by other_party_tel_number = “Maldives” AND call_direction = 1 AND conversation_duration > 60 minutes, then matched by the COUNT pattern with the assertion count > 9, deriving a FrequentEachLongCall event with CallsCount = count.]
Initiator policy: IGNORE
Meaning: The temporal window will open at 24:00 and will close at 23:59 per calling number, so we
group calls made during one day. In Figure 11, one derived event will be fired, as the pattern is satisfied
on the 12th call.
Figure 11: Context for FrequentEachLongCall EPA
6.2.2.5 EPA5: ExpensiveCalls
Motivation: Every six hours, we notify in case the calls dialed to premium locations per calling number
sum up to more than a pre-defined cost (e.g., 100 kn).
Event recognition process
Figure 12: Event recognition process for ExpensiveCalls EPA
Pattern policies
Evaluation: IMMEDIATE
Cardinality: SINGLE
Repeated: FIRST
Consumption: REUSE
Context
Segmentation: by calling_number
[Figure 11 shows Call events on a daily timeline from 24:00 to 23:59, with the FrequentEachLongCall derived event. Figure 12 shows the EPA: Call events are filtered within context by other_party_tel_number = “Maldives” AND call_direction = 1, then matched by the SUM pattern with the assertion countSum(total_call_charge_amount) > 100 kn, deriving an ExpensiveCalls event with CallsCostSum = countSum.]
Temporal window: sliding/overlapping window
Initiator: first Call
Terminator: + 6 hours
Initiator policy: ADD
Meaning: The first window opens with the first Call event and closes after 6 hours. The second
event opens, again, a six-hour window, and so forth. Figure 13 shows the different windows
in different colors. As can be seen, each event might belong to more than one window. The derived
event is emitted only once (the cardinality policy is SINGLE) when the pattern is detected (the evaluation
policy is IMMEDIATE).
Figure 13: Context for ExpensiveCall EPA
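The overlapping six-hour windows with the ADD initiator policy can be sketched as follows. This is a simplified illustration; the window bookkeeping, function name, and event representation are our own assumptions, not PROTON's implementation:

```python
# Illustrative sliding/overlapping 6-hour windows with the ADD initiator
# policy: every Call opens its own window, so one call can belong to several
# open windows. Each window fires at most once (SINGLE cardinality) when its
# cost sum exceeds the threshold.
def expensive_calls(calls, horizon=6.0, threshold=100.0):
    """calls: (arrival_hour, total_call_charge_amount) pairs in arrival order,
    for the filtered Calls of one calling_number."""
    windows = []   # each window: [open_time, running_sum, already_fired]
    derived = []
    for t, cost in calls:
        windows.append([t, 0.0, False])          # ADD: each call opens a new window
        for w in windows:
            if w[2] or t >= w[0] + horizon:      # skip fired or expired windows
                continue
            w[1] += cost                         # call belongs to this open window
            if w[1] > threshold:
                w[2] = True                      # SINGLE: fire once per window
                derived.append({"name": "ExpensiveCalls", "CallsCostSum": w[1]})
    return derived

out = expensive_calls([(0, 60), (1, 50), (5, 30)])
# the window opened at hour 0 accumulates 60 + 50 = 110 > 100 and fires once
```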
6.2.3 Mobile phone fraud use case functional requirements summary
The first EPN for the fraud detection use case (see Figure 3) includes five EPAs (of types FILTER, COUNT,
and SUM), one raw event, and five situations. Our design and implementation rely on PROTON’s
building blocks and capabilities; the same application might look different
when implemented in another CEP engine that uses different building blocks. The implementation of
this EPN within FERARI’s architecture is currently work in progress and uses real data that has been anonymized
for privacy reasons. Further refinements of this initial EPN will include more event rules.
6.3 Introduction to the event model
The Event Model (TEM) provides a new way to model, develop, validate, maintain, and implement
event-driven applications. In TEM, the event derivation logic is expressed through a high-level
declarative language as a collection of normalized tables (in a spreadsheet-like fashion). These tables
can be automatically validated and transformed into an EPN and eventually into a running application.
This idea has already been proven successful in the domain of business rules by The Decision Model
(TDM) [16]. TDM groups the rules into natural logical groups to create a structure that makes the model
relatively simple to understand, communicate, and manage. TEM is based on a set of well-defined
principles and building blocks and does not require substantial programming skills; it is therefore
targeted at non-technical people.
[Figure 13 shows Call events on a timeline, each call opening an overlapping six-hour window, with ExpensiveCalls derived events emitted within the windows.]
The current version of the model ([27], [28], [29], [30]) covers part of the functional requirements of event-
driven applications. In the scope of the FERARI project, we plan to extend today’s basic model to cover all
aspects of the functional requirements as well as the non-functional requirements, which are still a missing
piece in the model. The resulting tables will be converted into an EPN, which can thereafter be converted into
a JSON definition file and run in PROTON.
6.4 Summary of the requirements for flexible event processing in FERARI
In this section we surveyed the main aspects of the non-functional requirements of a (complex) event
processing system and the main functional requirements addressed in the scope of the project’s mobile
fraud use case.
In order to be flexible, event processing engines need to tackle the two following requirements in a
satisfactory way:
• The easy adaptability to non-functional requirements, especially the way the tool copes with
scalability issues in a distributed environment.
• The easy definition and maintenance of the event-driven logic.
Regarding the first requirement, in FERARI, the proposed architecture is a scalable distributed
environment that combines event processing capabilities (PROTON) on top of a streaming platform
(Storm). Regarding the second requirement, we propose to develop TEM, which enables the definition
and maintenance of event-driven applications by non-technical people.
7 Summary and future steps
Our goal in FERARI is to bring event processing much closer to the business world by extending simple
stream processing of numeric or textual data to the much more powerful realm of complex event
processing, in a way that is both consumable by business users and a seamless part of Big Data
applications.
CEP has already built up significant momentum, manifested in a steady research community and a variety
of commercial as well as open-source products. Capitalizing on this work, our approach is to provide a
model for constructing event processing applications, using a goal-driven declarative approach to define
the requirements for event processing applications and to generate complete, implementable designs out
of these requirements. The requirements will include both functional requirements, such as event
filtering, event aggregations, and event patterns, and non-functional requirements, such as
scalability and fault-tolerance.
By applying TEM, flexibility is achieved using an implementation-independent meta-model based on a
table representation that can be presented in a spreadsheet-like fashion, together with a set of diagrams; both
are easily consumable by business users, who are used to working with spreadsheets, and expressive
enough to directly generate code. Note that control of both the functional and the non-functional
specification over the whole life cycle will stay in the hands of the business users; the generated code will
not be edited by hand.
During the second year of the project we plan to extend the current TEM tables and diagrams to cope with
FERARI’s requirements in the mobile fraud detection use case, along with the implementation, in PROTON
on Storm, of the event processing network for the use cases presented in this report.
8 References
[1]. Altman R., Schulte W. R., Natis Y. V., Pezzini M., Driver M., Blanton C. E., Wilson N., and Van
Huizen G. 2014. Agenda Overview for Application Architecture. Gartner report G00261571.
Published: 10 January 2014.
[2]. Linden A. 2014. Hype Cycle for Advanced Analytics and Data Science. Gartner report
G00262076. Published: 30 July 2014.
[3]. LeHong H., Fenn J., and Toit R. L-du. 2014. Hype Cycle for Emerging Technologies. Gartner
report G00264126. Published: 28 July 2014.
[4]. Steenstrup K. 2014. Hype Cycle for Operational Technology. Gartner report G00263170.
Published: 23 July 2014.
[5]. LeHong H. and Velosa A. 2014. Hype Cycle for the Internet of Things. Gartner report G00264127.
Published: 21 July 2014.
[6]. Cugola G. and Margara A. 2012. Processing Flows of information: From Data Stream to Complex
Event Processing. ACM Comput. Surv., 44(3), 2012.
[7]. Etzion O. and Niblett P. 2010. Event Processing in Action. Manning Publications Company.
[8]. Esper reference document. [Online]. At: http://esper.codehaus.org/esper-
4.10.0/doc/reference/en-US/html/index.html.
[9]. Mendes M. R., Bizarro P., and Marques P. 2009. A performance study of event processing
systems. In Performance Evaluation and Benchmarking, 221-236. Springer.
[10]. Alevizos E. and Artikis A. 2014. Being Logical or Going with the Flow? A Comparison of Complex
Event Processing Systems. 8th Hellenic Conference on Artificial Intelligence.
[11]. Etzion O. 2010. Temporal Aspects of event processing. Handbook of distributed event based
system.
[12]. Adi A. and Etzion O. 2004. AMIT – The situation manager. VLDB J. 13 (2), 177-203.
[13]. Etzion O., Rabinovich E., and Skarbovsky I. 2011. Non-functional properties of event processing.
In Proceedings of the Fifth ACM International Conference on Distributed Event-Based Systems
(DEBS 2011), 365-366.
[14]. Rabinovich E., Etzion O., and Gal A. 2011. Pattern rewriting framework for event processing
optimization. In Proceedings of the Fifth ACM International Conference on Distributed Event-
Based Systems (DEBS 2011), 101-112.
[15]. Lakshmanan G., Rabinovich Y., and Etzion O. 2009. A stratified approach for supporting high
throughput event processing application. In Proceedings of the Fifth ACM International
Conference on Distributed Event-Based Systems (DEBS 2009).
[16]. von Halle B. and Goldberg L. 2010. The Decision Model. CRC Press.
[17]. Luckham D. and Schulte R. 2011. EPTS Event Processing Glossary v2.0. Technical report.
[Online]. At: http://www.complexevents.com/2011/08/31/epts-event-processing-glossary-
updated-to-version-2-0/.
[18]. Fülöp L. J., Tóth G., Rácz R., Pánczél J., Gergely T., Beszédes A., and Farkas L. 2010. Survey on
complex event processing and predictive analytics. In Proceedings of the Fifth Balkan
Conference in Informatics, 26-31.
[19]. Gualtieri M. and Curran R. 2014. The Forrester Wave™: Big Data Streaming Analytics Platforms.
Q3 2014, July 17. [Online]. At: http://forms2.tibco.com/rs/tibcoinfra/images/
Forrester%20Wave%20Big%20Data%20Streaming%20Analytics%207.17.14.pdf.
[20]. Schulte R. 2014. An Overview of Event Processing Software (August 25, 2014). [Online]. At:
http://www.complexevents.com/
[21]. Vincent P., 2014. CEP tooling market survey. December 3, 2014, [Online]. At:
http://www.complexevents.com/2014/12/03/cep-tooling-market-survey-2014/
[22]. Babcock B., Babu S., Datar M., Motwani R., and Widom J. 2002. Models and issues in data
stream systems. In Proceedings of the 21st ACM SIGMOD/PODS Symposium on Principles of
Database Systems (PODS’02). ACM, New York, NY, 1–16.
[23]. Luckham D. The power of events: an introduction to complex event processing in distributed
enterprise systems. 2001. Addison-Wesley Longman Publishing Co., Inc.
[24]. Proton user guide and programmer guide [Online]. At:
https://forge.fiware.org/plugins/mediawiki/wiki/fiware/index.php/CEP_GE_IBM_Proactive_Te
chnologyOnline_User_and_Programmer_Guide
[25]. Open specification (REST api) [Online]. At: http://forge.fi-
ware.org/plugins/mediawiki/wiki/fiware/index.php/Complex_Event_Processing_Open_RESTful
_API_Specification
[26]. Installation and administration guide. [Online]. At: https://forge.fi-
ware.org/plugins/mediawiki/wiki/fiware/index.php/CEP_GE_-
_IBM_Proactive_Technology_Online_Installation_and_Administration_Guide
[27]. Etzion O. and von Halle B.: 2013. The Event Model. [Online]. At:
http://www.slideshare.net/opher.etzion/er-2013-tutorial-modeling-the-event-driven-world.
[28]. Fournier F. and Limonad L., 2014. The BE2 model: When Business Events meet Business Entities.
DAB14 Workshop.
[29]. von Halle B. and Fournier F. 2014. Introducing the Next Horizon: The TEM Model (Part 1 - A
Paradigm for Processing Complex Events in Real Time)
http://www.modernanalyst.com/Resources/Articles/tabid/115/ID/3036/Introducing-the-Next-
Horizon-The-Event-Model-TEM.aspx
[30]. Fournier F. and von Halle B. 2014. Introducing the Next Horizon: The Event Model (Part 2 –
The Event Processing in Action)
http://www.modernanalyst.com/Resources/Articles/tabid/115/ID/3059/The-Event-Model-
TEM-in-Action.aspx
[31]. Etzion O. and Adkins J.M. 2013. Tutorial: Why is event-driven thinking different from
traditional thinking about computing? In Proceedings of the Seventh ACM International
Conference on Distributed Event-Based Systems (DEBS 2013), 269–270.
[32]. Schulte W.R. and Luckham D. 2013. Introduction to Real-Time Intelligence. [Online]. At:
http://www.complexevents.com/2013/09/17/understanding-real-time-intelligence/,
September 2013.
[33]. Schulte W.R. 2012. Does anyone care about event processing? [Online]. At:
http://www.complexevents.com/2012/07/25/does-anyone-care-about-event-processing/, July
2012.
[34]. Bry F., Eckert M., Etzion O., Pashchke A., and Riecke J. 2009. Event processing Language Tutorial,
[Online]. At: http://www.slideshare.net/opher.etzion/debs2009-event-processing-languages-
tutorial
[35]. Forgy C. 1982. Rete: A Fast Algorithm for the Many Patterns/Many Objects Match Problem.
Artificial Intelligence 19(1), 17-37.
[36]. Margara A., Cugola G., and Tamburrelli G. 2014. Learning from the past: automated rule
generation for complex event processing. In Proceedings of the 8th ACM International
Conference on Distributed Event-Based Systems (DEBS2014), 47-58.
[37]. Artikis A., Sergot M., and Paliouras G. 2014. An Event Calculus for Event Recognition. IEEE
Transactions on Knowledge and Data Engineering (TKDE).
[38]. Broda K., Clark R. M., and Russo A. 2009. Sage: A logical agent-based environment monitoring
and control system. In AmI, 112-117.
[39]. Schultz-Moller N. P., Migliavacca M., and Pietzuch P. 2009. Distributed complex event
processing with query rewriting. Proceedings of the Third ACM International Conference on
Distributed Event-Based Systems (DEBS2009), 1-12.
[40]. Demers A. J., Gehrke J., Hong M., Riedewald M., and White W. M. 2006. Towards expressive
publish/subscribe systems. Intl Conference on Extending Database Technology (EDBT), 627-644.
[41]. Wang F. and Liu P. 2005. Temporal management of RFID data. In Proceedings of the 31st VLDB
Conference, 1128-1139.
[42]. Anicic D., Fodor P., Rudolph S., Stuhmer R., Stojanovic N., and Studer R. 2011. Etalis: Rule-based
reasoning in event processing. Reasoning in Event-Based Distributed Systems, 99-124.
[43]. Artikis A., Paliouras G., Portet F., and Skarlatidis A. 2010. Logic-based representation, reasoning
and machine learning for event recognition. Proceedings of the Forth ACM International
Conference on Distributed Event-Based Systems (DEBS2010), 282-293.
[44]. Cugola G. and Margara A. 2012. Complex event processing with t-rex. Journal of Systems and
Software, 85(8), 1709-1728.
[45]. Bhargavi R., Pathak R., and Vaidehi V. 2013. Dynamic Complex Event Processing – Adaptive
Rule Engine. International Conference on Recent Trends in Information Technology (ICRTIT),
189-194.
[46]. Biscotti F., Schulte W.R., Iijima K., and Heudecker N.. 2014. Market Guide for Event Stream
Processing. Gartner report G00263080. Published: 14 August 2014.
[47]. Anicic D., Rudolph S., Fodor P., and Stojanovic N. 2012. Real-Time Complex Event Recognition
and Reasoning – A Logic Programming Approach. Applied Artificial Intelligence, Volume 26
Special Issue on Event Recognition (January 2012).