spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Dario Bonino, Fulvio Corno, Politecnico di Torino, Dip. Automatica e Informatica, Torino, Italy, http://elite.polito.it

The 3rd International Conference on Ambient Systems, Networks and Technologies, August 27-29, 2012, Niagara Falls, Ontario, Canada


DESCRIPTION

Presentation given at the 3rd International Conference on Ambient Systems, Networks and Technologies August 27-29, 2012, Niagara Falls, Ontario, Canada. The paper is available on the PORTO open access repository: http://porto.polito.it/2496720/

TRANSCRIPT

Page 1: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Dario Bonino, Fulvio Corno

Politecnico di Torino, Dip. Automatica e Informatica, Torino, Italy

http://elite.polito.it

The 3rd International Conference on Ambient Systems, Networks and Technologies

August 27-29, 2012, Niagara Falls, Ontario, Canada

Page 2: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Goals

Enable real-time ambient & sensor data processing

Allow AmI designers to easily specify required computations

Provide an extensible open source processing library

spChains ANT’2012, Niagara Falls, Canada 2

Page 3: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Outline


Motivation and Background

Stream processing

spChains Framework

Use cases

Conclusions

Page 4: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Motivation

Ambient Intelligence Systems

Hundreds or thousands of sensors

Different physical quantities (°C, %H2O, kW, kWh, …)

Sampling frequencies from seconds to minutes

Huge stream of data being generated

Storage and retrieval

On-line processing

Off-line processing

Analytics


Page 5: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

On-line processing: Applications

Data Decimation (from kHz to mHz)

Aggregation (over time, over space, over sensor types)

Averaging

Feeding User Displays and Dashboards

Computing up-to-date and user-meaningful information

Monitoring and Alerting

Checking Thresholds

Generating Alert messages

Virtual Sensors

Computing derived quantities

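The decimation and averaging tasks listed above can be illustrated with a minimal sketch. It is not part of spChains: the function name `batch_average` and the sample data are assumptions made for illustration, and the function works on a finished list of samples rather than on a live stream.

```python
from collections import defaultdict


def batch_average(samples, window_s):
    """Average (timestamp_s, value) samples over consecutive, non-overlapping windows.

    Returns a list of (window_start_s, mean) pairs, one per non-empty window.
    """
    buckets = defaultdict(list)
    for timestamp_s, value in samples:
        # Integer division assigns each sample to the window that contains it.
        window_start = int(timestamp_s // window_s) * window_s
        buckets[window_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Hypothetical data: one reading every 10 s, reduced to one average per minute.
readings = [(t, 100.0 + t) for t in range(0, 120, 10)]
per_minute = batch_average(readings, window_s=60)
```

Twelve input samples become two output values, which is the kind of rate reduction the slide refers to as decimation.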

Page 6: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Requirements

Input: up to 10,000-100,000 events/second

Data: real-valued quantities, explicit units of measure

Output: real-valued or Boolean, often at much lower frequency

Computation: custom-defined depending on the application requirements

Operators: reusable standard temporal operations applicable to data streams

Usability: should not require a database expert to define computations; domain experts must be autonomous

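The requirement for explicit units of measure can be sketched as a small value type. The `Measure` class, its `in_base_unit` method and the prefix table below are hypothetical and are not taken from the spChains library; they only show why a comparison such as "above 1 kW" needs the unit to travel with the value.

```python
from dataclasses import dataclass

# Hypothetical prefix table, limited to what the example needs.
_PREFIX = {"m": 1e-3, "": 1.0, "k": 1e3, "M": 1e6}


@dataclass(frozen=True)
class Measure:
    value: float
    unit: str  # for example "W", "kW" or "kWh"

    def in_base_unit(self, base):
        """Return the value expressed in `base` (e.g. "W"); raise if the units do not match."""
        if not self.unit.endswith(base):
            raise ValueError(f"cannot express {self.unit} in {base}")
        prefix = self.unit[: len(self.unit) - len(base)]
        if prefix not in _PREFIX:
            raise ValueError(f"unknown prefix {prefix!r} in {self.unit}")
        return self.value * _PREFIX[prefix]


# A 900 W reading compared against a 1 kW threshold, both normalised to watts.
reading = Measure(900.0, "W")
threshold = Measure(1.0, "kW")
above = reading.in_base_unit("W") > threshold.in_base_unit("W")
```

Comparing the raw numbers 900 and 1 would give the wrong answer, which is why each event carries its unit.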

Page 7: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Technology scouting

Standard Relational DBMS: good for storage; not efficient for computations; rely on central servers

NoSQL approaches: great for storage; may do computations, but require custom programming and expertise; rely on central (or cloud) servers

Custom programming: perfect fit with application requirements; very expensive to customize

Stream Processing: no storage; excellent for computations; requires custom expertise


Page 8: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Stream Processing

(or Complex Event Processing, CEP)

Event processing: tracking and analyzing streams of data «events», and deriving a conclusion from them

Defines a set of (fixed) queries

Event streams are analyzed in real time (often with in-memory processing) according to the programmed queries

Guarantees fast and scalable processing

Increasingly adopted in different domains: Business Process Management, Recommender Systems, Financial Services, Time Series, …

Several tools available (commercial and open source)

Specific skills needed to write efficient queries, in tool-dependent languages


Page 9: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Stream Processing

(or Complex Event Processing, CEP)


insert into RealEvent(src, streamName, value, unitOfMeasure)
  select "Average", "Average-out", avg(value) as value, unitOfMeasure
  from RealEvent(streamName="M1").win:time_batch("1h")
  group by src, streamName, unitOfMeasure;

insert into BooleanEvent(src, streamName, booleanValue)
  select "Threshold", "Threshold-out" as streamName, true as value
  from pattern [every (
      oldSample=RealEvent(streamName="Average-out",
        MeasureEventComparator.compareToMeasure(oldSample, "1kW", EventComparisonEnum.LESS_THAN_OR_EQUAL))
      -> newSample=RealEvent(streamName=oldSample.streamName,
        MeasureEventComparator.compareToMeasure(newSample, "1kW", EventComparisonEnum.GREATER_THAN))
  )].win:length(2);
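The second query above pairs an old sample at or below 1 kW with a following sample above 1 kW, so it fires only when the averaged stream rises through the threshold. A minimal sketch of that rising-edge logic follows; the function name `rising_crossings` and the sample values are assumptions made for illustration and do not come from the slides.

```python
def rising_crossings(values, threshold):
    """Yield the index of every sample that rises above `threshold`.

    A crossing is reported when the previous sample is <= threshold
    and the current sample is > threshold.
    """
    previous = None
    for index, current in enumerate(values):
        if previous is not None and previous <= threshold < current:
            yield index
        previous = current


# Hypothetical hourly averages in kW, checked against a 1 kW threshold.
hourly_average_kw = [0.8, 0.9, 1.2, 1.4, 0.7, 1.0, 1.1]
alerts = list(rising_crossings(hourly_average_kw, threshold=1.0))
```

Samples that stay above the threshold (1.2 to 1.4) do not raise a second alert, which keeps the output frequency low.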

Page 10: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Proposed approach (1)

Stream Processing for event data processing in real time

(Extensible) Library of predefined operators (spBlocks)

Declarative framework (spChains) to express the required computations

Each Computation = Stream Processing Chain

Chain = Sequence of Stream Processing Blocks

Block = predefined operator, configured with parameters

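The chain-of-blocks idea can be sketched with generator functions. This is an illustration only, not the spChains API: `average_block`, `threshold_block` and `chain` are hypothetical names, and the averaging window counts samples rather than time to keep the example short.

```python
from functools import reduce


def average_block(window):
    """Block factory: average every `window` consecutive samples (batch mode)."""
    def block(stream):
        batch = []
        for value in stream:
            batch.append(value)
            if len(batch) == window:
                yield sum(batch) / window
                batch = []
    return block


def threshold_block(threshold):
    """Block factory: emit True whenever the input rises above `threshold`."""
    def block(stream):
        previous = None
        for value in stream:
            if previous is not None and previous <= threshold < value:
                yield True
            previous = value
    return block


def chain(*blocks):
    """Compose blocks so that each one consumes the previous block's output."""
    def run(stream):
        return reduce(lambda upstream, block: block(upstream), blocks, stream)
    return run


# A chain is a sequence of configured blocks: average first, then threshold.
power_alarm = chain(average_block(window=3), threshold_block(threshold=1.0))
samples_kw = [0.5, 0.6, 0.7, 1.0, 1.2, 1.4, 0.2, 0.3, 0.4]
events = list(power_alarm(samples_kw))
```

Each block is a predefined operator configured only through its parameters, so a new computation is a new sequence of blocks rather than new query code.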

Page 11: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Proposed approach (2)

The set of spChains is described as a simple XML file

All chains are automatically mapped to Stream Processing queries


insert into RealEvent(src, streamName, value, unitOfMeasure)
  select "Average", "Average-out", avg(value) as value, unitOfMeasure
  from RealEvent(streamName="M1").win:time_batch("1h")
  group by src, streamName, unitOfMeasure;

insert into BooleanEvent(src, streamName, booleanValue)
  select "Threshold", "Threshold-out" as streamName, true as value
  from pattern [every (
      oldSample=RealEvent(streamName="Average-out",
        MeasureEventComparator.compareToMeasure(oldSample, "1kW", EventComparisonEnum.LESS_THAN_OR_EQUAL))
      -> newSample=RealEvent(streamName=oldSample.streamName,
        MeasureEventComparator.compareToMeasure(newSample, "1kW", EventComparisonEnum.GREATER_THAN))
  )].win:length(2);

<spXML:blocks>
  <spXML:block id="Avg1" function="AVERAGE">
    <spXML:param name="window" value="1" unitOfMeasure="h"/>
    <spXML:param name="mode" value="batch"/>
  </spXML:block>
  <spXML:block id="Th1" function="THRESHOLD">
    <spXML:param name="threshold" value="1" unitOfMeasure="kW"/>
  </spXML:block>
</spXML:blocks>
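To show how little information the XML description carries compared with the queries, the sketch below reads the block definitions with Python's standard `xml.etree.ElementTree`. The namespace URI is a placeholder, since the slide shows only the `spXML` prefix, and `load_blocks` is a hypothetical helper rather than part of spChains.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the slide shows only the spXML prefix.
SPXML_NS = "urn:example:spxml"

CHAIN_XML = f"""
<spXML:blocks xmlns:spXML="{SPXML_NS}">
  <spXML:block id="Avg1" function="AVERAGE">
    <spXML:param name="window" value="1" unitOfMeasure="h"/>
    <spXML:param name="mode" value="batch"/>
  </spXML:block>
  <spXML:block id="Th1" function="THRESHOLD">
    <spXML:param name="threshold" value="1" unitOfMeasure="kW"/>
  </spXML:block>
</spXML:blocks>
"""


def load_blocks(xml_text):
    """Return {block id: (function, {param name: (value, unit or None)})}."""
    ns = {"sp": SPXML_NS}
    root = ET.fromstring(xml_text.strip())
    blocks = {}
    for block in root.findall("sp:block", ns):
        params = {
            param.get("name"): (param.get("value"), param.get("unitOfMeasure"))
            for param in block.findall("sp:param", ns)
        }
        blocks[block.get("id")] = (block.get("function"), params)
    return blocks


blocks = load_blocks(CHAIN_XML)
```

Each block reduces to a function name plus a few typed parameters, which is what a domain expert has to write; the tool-dependent query text is generated from it.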

Page 12: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

spChains Framework


[Figure: spChains framework architecture. Labels: Event Sources, Stream Processing Chains, Stream Processing Block, spBlocks, Event Drains, Chain Definition, Pervasive/Ubiquitous Communication Infrastructure, Environmental Data, Aggregate / Computed Measures, Pattern Match / Alerts, Pervasive application(s), Final Users]

Page 13: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Basic spBlock Library


Page 14: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Examples of spChains


Page 15: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Examples of spChains


<spXML:block id="Avg1" function="AVERAGE">
  <spXML:param name="window" value="1" unitOfMeasure="h"/>
  <spXML:param name="mode" value="batch"/>
</spXML:block>

Page 16: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Implementation

Java spChains library (Apache v2.0 license)

Core library

Esper bindings

Basic spBlock library

Scales up to 200 k events/sec

Already in use

3 different data centers, running on embedded PCs

Monitoring environment, electrical power consumption, thermal flows (heating and cooling), polled by means of the Dog2.x multiprotocol gateway

Computed quantities are "pushed" to Web Service collectors

Over 3 months of uptime, no issues found


http://elite.polito.it/spchains

Page 17: spChains: A Declarative Framework for Data Stream Processing in Pervasive Applications

Conclusions

Complex computations in the field and in real time

Efficient and easy to integrate

Lowered the barrier to adoption of Stream Processing

Future work

User interface

Large-scale installations


http://elite.polito.it

http://elite.polito.it/spchains
