Discover Data That Matters: Deep Dive into WSO2 Analytics Sriskandarajah Suhothayan Associate Director/Architect, WSO2 Anjana Fernando Associate Director/Architect, WSO2

Upload: suhothayan

Post on 20-Mar-2017


TRANSCRIPT

Page 1: Discover Data That Matters- Deep dive into WSO2 Analytics

Discover Data That Matters: Deep Dive into WSO2 Analytics

Sriskandarajah Suhothayan

Associate Director/Architect, WSO2

Anjana Fernando

Associate Director/Architect, WSO2

Page 2: Discover Data That Matters- Deep dive into WSO2 Analytics

Smart Analytics

Creating realtime, intelligent, actionable business insights, and data products

Page 3: Discover Data That Matters- Deep dive into WSO2 Analytics

WSO2 Data Analytics Server

Realtime, Incremental, Intelligent

Page 4: Discover Data That Matters- Deep dive into WSO2 Analytics

WSO2 DAS Architecture

Page 5: Discover Data That Matters- Deep dive into WSO2 Analytics

WSO2 DAS Architecture

Page 6: Discover Data That Matters- Deep dive into WSO2 Analytics

Data Processing Pipeline

• Collect Data
– Define schema for data
– Receive events
• Analyze
– Realtime analytics with Siddhi
– Incremental and batch analytics with Spark SQL
– Intelligence with Machine Learning
• Communicate
– Alerts
– Dashboards
– Interactive Queries
– API

Page 7: Discover Data That Matters- Deep dive into WSO2 Analytics

Market Recognition

• Named as a Strong Performer in The Forrester Wave™: Big Data Streaming Analytics, Q1 2016.

• Highest score possible in 'Acquisition and Pricing' criteria, and among second-highest scores in 'Ability to execute' criteria

• The Forrester report notes:

“WSO2 is an open source middleware provider that includes a full spectrum of architected-as-one components such as application servers, message brokers, enterprise service bus, and many others. Its streaming analytics solution follows the complex event processor architectural approach, so it provides very low-latency analytics. Enterprises that already use WSO2 middleware can add CEP seamlessly. Enterprises looking for a full middleware stack that includes streaming analytics will find a place for WSO2 on their shortlist as well.”

Page 8: Discover Data That Matters- Deep dive into WSO2 Analytics

Success Stories

Experian delivers a digital marketing platform in which CEP plays a key role: it analyzes customer behavior in real time and offers targeted promotions. CEP was chosen after careful analysis, primarily for its openness, its open source nature, the fact that support is driven by engineers, and the availability of a complete middleware platform, integrated with CEP, for additional use cases.

Eurecat is the Catalunya innovation center (in Spain). It uses CEP to analyze data from iBeacons deployed within department stores, to offer instant rebates to users or to send them help if it detects that they seem “stuck” in the shop area. They chose WSO2 for its real time processing, the variety of IoT connectors available, the extensible framework, and the rich configuration language. They also use WSO2 ESB in conjunction with WSO2 CEP.

Pacific Controls is an innovative company delivering an IoT platform of platforms: Galaxy 2021. The platform makes it possible to manage all kinds of devices within a building and take automated decisions, such as moving an elevator or starting the air conditioning, based on certain conditions. Within Galaxy 2021, CEP is used for monitoring alarms and specific conditions. Pacific Controls also uses other products from the WSO2 platform, such as WSO2 ESB and Identity.

A leading airline uses CEP to enhance customer experience by calculating the average time passengers need to reach their boarding gate (going through security, walking, etc.). They also want to track the time it takes to clean a plane, in order to better streamline the boarding process and notify both the airline and customers about potential delays. They evaluated WSO2 CEP first as they were already using our platform, and decided to use it as it addressed all their requirements.

Page 9: Discover Data That Matters- Deep dive into WSO2 Analytics

Success Stories ...

Won the Data in Motion Hack Week with AWS and Geovation by providing an impressive solution that takes data from many modes of transport, overlays passenger flow/train loading and pollution data, and allows users to plan a route based on how busy their stations/routes are, whilst also taking air quality into account.

DEBS 2014

DEBS (Distributed Event Based Systems) Challenge on smart home electricity data: 2000 sensors, 40 houses, 4 billion events. We posted the fastest single-node solution measured (400K events/sec) and close to one million events/sec in distributed throughput. The WSO2 CEP based solution was one of the four finalists, and the only generic solution to become a finalist.

Built a solution to search, visualize, and analyze healthcare records (HL7) across 20 hospitals in Italy, in combination with WSO2 ESB.

A food supply company in the USA detects anomalies such as delivery delays, provides personalized notifications, and makes order recommendations based on history.

Page 10: Discover Data That Matters- Deep dive into WSO2 Analytics

Receivers

+ Pluggable Custom Receivers

Page 11: Discover Data That Matters- Deep dive into WSO2 Analytics

Event Streams

Schema

{
  'name': 'TemperatureStream',
  'version': '1.0.0',
  'metaData': [
    {'name': 'sensorID', 'type': 'STRING'}
  ],
  'correlationData': [],
  'payloadData': [
    {'name': 'temperature', 'type': 'DOUBLE'},
    {'name': 'pressure', 'type': 'DOUBLE'}
  ]
}

Event

StreamID      TemperatureStream:1.0.0
Timestamp     1487270220419
sensorID      AP234
temperature   23.5
pressure      94.2
SourceIP      168.50.24.2

+ Support for arbitrary key-value pairs
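As a rough illustration of how a schema and an event relate, the sketch below checks an event against the stream definition shown on this slide. The `validate_event` helper and the mapping from schema type names to Python types are assumptions made for this example; they are not part of any WSO2 API.

```python
# Hypothetical sketch: check an event's attributes against a stream schema.
STREAM_DEFINITION = {
    "name": "TemperatureStream",
    "version": "1.0.0",
    "metaData": [{"name": "sensorID", "type": "STRING"}],
    "correlationData": [],
    "payloadData": [
        {"name": "temperature", "type": "DOUBLE"},
        {"name": "pressure", "type": "DOUBLE"},
    ],
}

# Assumed mapping from schema type names to Python types.
PYTHON_TYPES = {"STRING": str, "DOUBLE": float, "INT": int, "LONG": int}


def validate_event(definition, event):
    """Return a list of problems; an empty list means the event matches."""
    problems = []
    for section in ("metaData", "correlationData", "payloadData"):
        for attribute in definition[section]:
            name = attribute["name"]
            expected = PYTHON_TYPES[attribute["type"]]
            if name not in event:
                problems.append(f"missing attribute: {name}")
            elif not isinstance(event[name], expected):
                problems.append(f"wrong type for {name}")
    return problems


event = {
    "sensorID": "AP234",
    "temperature": 23.5,
    "pressure": 94.2,
    "SourceIP": "168.50.24.2",  # an arbitrary extra key-value pair is tolerated
}
print(validate_event(STREAM_DEFINITION, event))
```

Attributes that are not declared in the schema, such as `SourceIP`, are simply ignored by the check, which mirrors the "arbitrary key-value pairs" note on the slide.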

Page 12: Discover Data That Matters- Deep dive into WSO2 Analytics

Realtime Analytics

Page 13: Discover Data That Matters- Deep dive into WSO2 Analytics

Realtime Analytics

It’s about:

• Gather data from multiple sources
• Correlate data streams over time
• Find interesting occurrences
• And Notify
• All in Realtime!

Page 14: Discover Data That Matters- Deep dive into WSO2 Analytics

Realtime Processing Pipeline

Page 15: Discover Data That Matters- Deep dive into WSO2 Analytics

Realtime Execution

• Process in streaming fashion (one event at a time)
• Execution logic written as Execution Plans
• Execution Plan
– An isolated logical execution unit
– Includes a set of queries, and relates to multiple input and output event streams
– Executed using dedicated WSO2 Siddhi engine

Page 16: Discover Data That Matters- Deep dive into WSO2 Analytics

Realtime Processing Patterns

• Transformation

– projection, transformation, enrich, split

• Temporal Aggregation

– basic stats, group by Aggregation, moving averages

• Alert and Threshold

• Event Correlation

• Trends

– detecting rise, fall, turn, triple bottom

• Partitioning

• Join Streams

• Query Data Store

Page 17: Discover Data That Matters- Deep dive into WSO2 Analytics

Siddhi Query Syntax

define stream <event stream> (<attribute> <type>, <attribute> <type>, ...);

from <event stream>
select <attribute>, <attribute>, ...
insert into <event stream> ;

Page 18: Discover Data That Matters- Deep dive into WSO2 Analytics

Siddhi Query ...

define stream SoftDrinkSales (region string, brand string, quantity int, price double);

from SoftDrinkSales
select brand, quantity
insert into OutputStream ;

Output Streams are inferred
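For readers new to Siddhi, the projection query on this slide can be approximated in plain Python. This is only a sketch: a stream is modelled as a list of dictionaries, and the `project` helper and the sample events are made up for illustration.

```python
# Illustrative only: a stream is modelled as an iterable of dicts.
def project(events, attributes):
    """Yield events reduced to the selected attributes (the 'select' clause)."""
    for event in events:
        yield {name: event[name] for name in attributes}


# Sample events shaped like the SoftDrinkSales stream definition.
soft_drink_sales = [
    {"region": "USA", "brand": "Cola", "quantity": 120, "price": 1.5},
    {"region": "EU", "brand": "Lemon", "quantity": 30, "price": 2.0},
]

# Equivalent in spirit to: from SoftDrinkSales select brand, quantity
output_stream = list(project(soft_drink_sales, ["brand", "quantity"]))
print(output_stream)
```

The output events carry only `brand` and `quantity`, which is why the output stream's definition can be inferred from the query.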

Page 19: Discover Data That Matters- Deep dive into WSO2 Analytics

Siddhi Query ...

Enriching Streams

from SoftDrinkSales
select brand, avg(price*quantity) as avgCost, 'USD' as currency
insert into AvgCostStream ;

Using Functions

from AvgCostStream
select brand, toEuro(avgCost) as avgCost, 'EURO' as currency
insert into OutputStream ;

Page 20: Discover Data That Matters- Deep dive into WSO2 Analytics

Siddhi Query (Filter & Window)

Filtering

from SoftDrinkSales[region == 'USA' and quantity > 99]
select brand, price, quantity
insert into WholeSales ;

Aggregation over 1 hour

from SoftDrinkSales#window.time(1 hour)
select region, brand, avg(quantity) as avgQuantity
group by region, brand
insert into LastHourSales ;

Other supported window types: timeBatch(), length(), lengthBatch(), etc.
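The filter and the one-hour sliding window can be sketched in Python as follows. This is a simplified model, not how Siddhi is implemented: timestamps are passed in explicitly as seconds, and old events are expired only when a new event arrives for the same group. The names `wholesale` and `SlidingTimeAverage` are invented for this example.

```python
from collections import defaultdict, deque


def wholesale(events):
    """Filter: keep USA sales with quantity above 99."""
    return [e for e in events if e["region"] == "USA" and e["quantity"] > 99]


class SlidingTimeAverage:
    """Average quantity per (region, brand) over the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = defaultdict(deque)  # group key -> deque of (timestamp, quantity)

    def add(self, timestamp, region, brand, quantity):
        window = self.events[(region, brand)]
        window.append((timestamp, quantity))
        # Expire events that have fallen out of the time window.
        while window[0][0] <= timestamp - self.window_seconds:
            window.popleft()
        return sum(q for _, q in window) / len(window)


last_hour = SlidingTimeAverage(window_seconds=3600)
print(last_hour.add(0, "USA", "Cola", 100))     # 100.0
print(last_hour.add(1800, "USA", "Cola", 200))  # 150.0
print(last_hour.add(3600, "USA", "Cola", 300))  # first event has expired: 250.0
```

A batch window such as timeBatch() would instead emit one result per interval and then start from an empty window.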

Page 21: Discover Data That Matters- Deep dive into WSO2 Analytics

Siddhi Query (Pattern) ...

define stream Purchase (price double, cardNo long, place string);

from every (a1 = Purchase[price < 10]) ->
     a2 = Purchase[price > 10000 and a1.cardNo == a2.cardNo]
     within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud ;
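The fraud pattern on this slide (a small purchase followed within a day by a very large one on the same card) can be sketched in Python. The `ts` field is a hypothetical timestamp in seconds that is not part of the Purchase stream definition, and the function only approximates Siddhi's pattern semantics.

```python
ONE_DAY = 24 * 60 * 60  # seconds


def detect_potential_fraud(purchases):
    """Match a purchase under 10 followed, within one day, by a purchase
    over 10000 on the same card. `purchases` must be ordered by 'ts'."""
    pending = []  # small purchases still waiting for a follow-up (the a1 events)
    alerts = []
    for p in purchases:
        # Drop a1 candidates whose one-day window has elapsed.
        pending = [a1 for a1 in pending if p["ts"] - a1["ts"] <= ONE_DAY]
        if p["price"] > 10000:
            for a1 in pending:
                if a1["cardNo"] == p["cardNo"]:
                    alerts.append({"cardNo": a1["cardNo"], "price": p["price"],
                                   "place": p["place"]})
            # Each a1 candidate is consumed by its first match.
            pending = [a1 for a1 in pending if a1["cardNo"] != p["cardNo"]]
        elif p["price"] < 10:
            pending.append(p)  # 'every' starts a new match per small purchase
    return alerts


purchases = [
    {"ts": 0, "cardNo": 1111, "price": 5.0, "place": "Colombo"},
    {"ts": 3600, "cardNo": 2222, "price": 20000.0, "place": "London"},
    {"ts": 7200, "cardNo": 1111, "price": 15000.0, "place": "Paris"},
]
print(detect_potential_fraud(purchases))
```

Only card 1111 raises an alert here: card 2222 has a large purchase but no preceding small one.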

Page 22: Discover Data That Matters- Deep dive into WSO2 Analytics

Siddhi Query (Trends & Partition)

define stream StockStream (symbol string, price double, volume int);

partition with (symbol of StockStream)
begin
  from t1=StockStream,
       t2=StockStream[(t2[last] is null and t1.price < price) or (t2[last].price < price)]+
  within 5 min
  select t1.price as initialPrice, t2[last].price as finalPrice, t1.symbol
  insert into IncreasingMyStockPriceStream
end;
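A loose Python analogue of the partitioned trend query is shown below: state is kept separately per symbol (the partition), and a run of strictly rising prices is reported once it ends. This is a sketch under simplifying assumptions; in particular, the `within 5 min` time limit is not modelled, and `rising_runs` is a made-up name.

```python
def rising_runs(events):
    """For each symbol, report runs of strictly increasing prices.

    `events` is an iterable of (symbol, price) pairs in arrival order.
    Returns (symbol, initial_price, final_price) tuples, one per completed
    run that contains at least one rise.
    """
    state = {}    # symbol -> (initial_price, last_price): one partition per symbol
    results = []
    for symbol, price in events:
        if symbol not in state:
            state[symbol] = (price, price)
            continue
        initial, last = state[symbol]
        if price > last:
            state[symbol] = (initial, price)    # the run continues
        else:
            if last > initial:
                results.append((symbol, initial, last))
            state[symbol] = (price, price)      # start a new run
    return results


ticks = [("WSO2", 10), ("IBM", 50), ("WSO2", 11),
         ("IBM", 49), ("WSO2", 12), ("WSO2", 9)]
print(rising_runs(ticks))
```

Because each symbol has its own state, the interleaved IBM ticks do not interrupt the rising run detected for WSO2.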

Page 23: Discover Data That Matters- Deep dive into WSO2 Analytics

Siddhi Query (Table) ...

define table CardUserTable (name string, cardNum long) ;

@from(eventtable = 'rdbms', datasource.name = 'CardDataSource', table.name = 'UserTable', caching.algorithm = 'LRU')
define table CardUserTable (name string, cardNum long) ;

Supported for RDBMS, In-Memory, Analytics Table, In-Memory Data Grid (Hazelcast)

Cache types supported

• Basic: A size-based algorithm based on FIFO.
• LRU (Least Recently Used): The least recently used event is dropped when cache is full.
• LFU (Least Frequently Used): The least frequently used event is dropped when cache is full.
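To make the LRU eviction policy concrete, here is a minimal Python sketch of a least-recently-used cache. It illustrates the general technique only; it is not the cache used by the Siddhi event table, and the class name and capacity are assumptions for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest (least recently used) entry first

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used


cache = LRUCache(capacity=2)
cache.put(1111, "Alice")
cache.put(2222, "Bob")
cache.get(1111)            # touching 1111 makes 2222 the eviction candidate
cache.put(3333, "Carol")   # cache is full, so 2222 is dropped
print(list(cache.entries))
```

A FIFO ("Basic") cache would differ only in that `get` would not reorder entries, so 1111 would have been evicted instead.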

Page 24: Discover Data That Matters- Deep dive into WSO2 Analytics

Siddhi Query (Table) ...

Join

from Purchase#window.length(1) join CardUserTable
  on Purchase.cardNo == CardUserTable.cardNum
select Purchase.cardNo as cardNo, CardUserTable.name as name, Purchase.price as price
insert into PurchaseUserStream ;

from CardUserStream
select name, cardNo as cardNum
update CardUserTable
  on CardUserTable.name == name ;

Similarly, insert into and delete are also supported!
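The join and update queries on this slide can be approximated in Python with the table held as a list of rows. This is an illustrative sketch; the helper names and sample rows are invented, and a real event table would be backed by one of the stores listed earlier.

```python
# The table is modelled as a list of rows (dicts); sample data is made up.
card_user_table = [
    {"name": "Alice", "cardNum": 1111},
    {"name": "Bob", "cardNum": 2222},
]


def join_purchases(purchases, table):
    """Inner join on Purchase.cardNo == CardUserTable.cardNum."""
    joined = []
    for purchase in purchases:
        for row in table:
            if purchase["cardNo"] == row["cardNum"]:
                joined.append({"cardNo": purchase["cardNo"],
                               "name": row["name"],
                               "price": purchase["price"]})
    return joined


def update_card_numbers(card_user_events, table):
    """Update rows where CardUserTable.name matches the event's name."""
    for event in card_user_events:
        for row in table:
            if row["name"] == event["name"]:
                row["cardNum"] = event["cardNo"]


purchases = [{"cardNo": 2222, "price": 42.0}, {"cardNo": 9999, "price": 7.0}]
print(join_purchases(purchases, card_user_table))

update_card_numbers([{"name": "Bob", "cardNo": 9999}], card_user_table)
print(join_purchases(purchases, card_user_table))
```

After the update, Bob's purchases are matched through the new card number, and the old number no longer joins.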

Page 25: Discover Data That Matters- Deep dive into WSO2 Analytics

Siddhi Query (Extension) ...

• Function extension
• Aggregator extension
• Window extension
• Stream Processor extension

Referred with namespaces

from SalesStream
select brand, custom:toUSD(price, currency) as priceInUSD
insert into OutputStream ;
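The idea of looking up an extension by a `namespace:name` pair can be sketched in Python with a small registry. Real Siddhi extensions are written in Java against the Siddhi extension API; the registry, the decorator, and the exchange rates below are all assumptions made for this illustration.

```python
# Hypothetical registry mapping "namespace:name" to a Python function.
EXTENSIONS = {}


def extension(namespace, name):
    """Decorator that registers a function under a namespaced name."""
    def register(func):
        EXTENSIONS[f"{namespace}:{name}"] = func
        return func
    return register


# Made-up fixed exchange rates, for illustration only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25}


@extension("custom", "toUSD")
def to_usd(price, currency):
    return price * RATES_TO_USD[currency]


def call(qualified_name, *args):
    """Resolve a namespaced function and invoke it."""
    return EXTENSIONS[qualified_name](*args)


sales = [{"brand": "Cola", "price": 10.0, "currency": "EUR"}]
output = [{"brand": s["brand"],
           "priceInUSD": call("custom:toUSD", s["price"], s["currency"])}
          for s in sales]
print(output)
```

Keeping the namespace in the lookup key lets two extensions share a function name without clashing.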

Page 26: Discover Data That Matters- Deep dive into WSO2 Analytics

Siddhi Extensions

• geo: Geographical processing
• nlp: Natural language processing (with Stanford NLP)
• ml: Running machine learning models of WSO2 Machine Learner
• pmml: Running PMML models learnt by R
• timeseries: Regression and time series
• math: Mathematical operations
• str: String operations
• regex: Regular expressions
• ...

Page 27: Discover Data That Matters- Deep dive into WSO2 Analytics

Publishers

+ Pluggable Custom Publishers

Page 28: Discover Data That Matters- Deep dive into WSO2 Analytics

Analytics Extension Store

• Receivers
• Publishers
• Siddhi Extensions

https://store.wso2.com/

Page 29: Discover Data That Matters- Deep dive into WSO2 Analytics

Dashboards

• Dashboard generation
• Gadget generation
• Gather data via
– Websockets
– Polling
• Custom/Personalised Gadget and Dashboard support

Page 30: Discover Data That Matters- Deep dive into WSO2 Analytics

Statistics & Tracing

Statistics and Tracing can be activated individually for

• Execution Plans
• Event receivers
• Event publishers

Page 31: Discover Data That Matters- Deep dive into WSO2 Analytics

Template support

Developers can create dynamic queries leveraging templates support

Page 32: Discover Data That Matters- Deep dive into WSO2 Analytics

Template support ...

Executive users can manage the system with a form based UI

Page 33: Discover Data That Matters- Deep dive into WSO2 Analytics

Predictive Analytics

Page 34: Discover Data That Matters- Deep dive into WSO2 Analytics

Predictive Analytics

• Guided UI to build machine learning models via
– Apache Spark MLlib
– H2O.ai (for deep learning algorithms)
– R and export them as PMML
• Run models using DAS and ESB
• Run R Scripts, Regression and Anomaly Detection in Realtime

Page 35: Discover Data That Matters- Deep dive into WSO2 Analytics

Machine Learning Pipeline

Page 36: Discover Data That Matters- Deep dive into WSO2 Analytics

Prediction in Real-time

from DataStream#ml:predict("/home/user/ml.model", "double")
select *
insert into PredictionStream ;

Page 37: Discover Data That Matters- Deep dive into WSO2 Analytics

Data Persistence

• Provides a backend data source agnostic way of storing and retrieving data
• Provides a standard REST API
• Pluggable data connectors
– RDBMS
– Cassandra
– HBase
– custom ...

Data Abstraction Layer

Custom

Page 38: Discover Data That Matters- Deep dive into WSO2 Analytics

Data Persistence ...

• Analytics Tables
– The data persistence entity in WSO2 Data Analytics Server
– Provides a backend data source agnostic way of storing and retrieving data
– Allows applications to be written so that they do not depend on a specific data source API, e.g. JDBC (RDBMS), Cassandra APIs, etc.
– WSO2 DAS provides a standard REST API for accessing the Analytics Tables

Page 39: Discover Data That Matters- Deep dive into WSO2 Analytics

Data Persistence ...

• Analytics Record Stores
– An Analytics Record Store stores a specific set of Analytics Tables
– Event persistence can configure which Analytics Record Store is used for storing incoming events
– Single Analytics Table namespace; the target record store is given only at the time of table creation
– Useful for creating Analytics Tables where data will be stored in multiple target databases

Page 40: Discover Data That Matters- Deep dive into WSO2 Analytics

Interactive Querying and Analytics

Page 41: Discover Data That Matters- Deep dive into WSO2 Analytics

Interactive Querying and Analytics ...

• Full text data indexing support powered by Apache Lucene
• Drilldown search support
• Distributed data indexing
– Designed to support scalability
• Near real time data indexing and retrieval
– Data indexed immediately as received

Page 42: Discover Data That Matters- Deep dive into WSO2 Analytics

Interactive Querying Dashboard

Page 43: Discover Data That Matters- Deep dive into WSO2 Analytics

Activity Monitoring

• Correlate the messages collected based on the activity_id in the metadata of the event

• Trace the transaction path, where the events could be in different tables, using Lucene queries
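The correlation step can be pictured with a short Python sketch that groups events by their `activity_id` and orders each group in time. In DAS the lookup is done with Lucene queries across tables; here the events are simply held in memory, and the `timestamp` and `table` fields and the sample values are assumptions for the example.

```python
from collections import defaultdict


def group_by_activity(events):
    """Group events by activity_id and sort each group by timestamp."""
    activities = defaultdict(list)
    for event in events:
        activities[event["activity_id"]].append(event)
    return {activity_id: sorted(group, key=lambda e: e["timestamp"])
            for activity_id, group in activities.items()}


# Made-up events that, in practice, could live in different tables.
events = [
    {"activity_id": "a1", "timestamp": 2, "table": "ESB_OUT"},
    {"activity_id": "b7", "timestamp": 1, "table": "ESB_IN"},
    {"activity_id": "a1", "timestamp": 1, "table": "ESB_IN"},
]

traces = group_by_activity(events)
print([e["table"] for e in traces["a1"]])
```

Each resulting trace lists the hops of one transaction in the order they occurred.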

Page 44: Discover Data That Matters- Deep dive into WSO2 Analytics

Activity Explorer

Page 45: Discover Data That Matters- Deep dive into WSO2 Analytics

Batch Analytics

Page 46: Discover Data That Matters- Deep dive into WSO2 Analytics

Batch Analytics

• Powered by Apache Spark
• Up to 30x higher performance than Hadoop
• Parallel, distributed with optimized in-memory processing
• Scalable script-based analytics written using an easy-to-learn, SQL-like query language powered by Spark SQL
• Interactive built-in web interface for ad-hoc query execution
• HA/FO supported scheduled query script execution
• Run Spark on a single node, Spark embedded Carbon server cluster or connect to external Spark cluster

Page 47: Discover Data That Matters- Deep dive into WSO2 Analytics

Interactive Console

Page 48: Discover Data That Matters- Deep dive into WSO2 Analytics

Scheduling Batch Jobs

Page 49: Discover Data That Matters- Deep dive into WSO2 Analytics

Incremental Processing

• Requirement
– Data aggregations to be done efficiently as time series data is updated continuously
– Aggregation lookup operations to be done with any given time range

Page 50: Discover Data That Matters- Deep dive into WSO2 Analytics

Incremental Processing ...

[Diagram: aggregation levels 1s, 1m, 1h and 1d, with CEP and Spark as the processing engines]

Page 51: Discover Data That Matters- Deep dive into WSO2 Analytics

Incremental Processing ...

• Solution
– Streaming data is first processed using the CEP engine for immediate aggregation operations such as “avg”, “min”, “max”, “sum”, etc. for smaller time intervals, such as 1s and 1m. The resultant data records are persisted.
– The persisted aggregation data for the smaller time ranges (i.e. 1s, 1m) is looked up and further, larger-level aggregations are done. This step is done using batch analytics (Spark SQL). A custom extension allows incremental processing in Spark, where earlier processed data is not recomputed; rather, the last checkpoint is remembered by the system.
– O(log n) computation complexity
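The checkpoint idea can be illustrated with a small Python sketch that rolls per-second sums up into per-minute sums and processes each persisted record only once. It is a toy model of the approach, not the DAS Spark extension: the class name is invented, records are keyed by epoch second, and late records older than the checkpoint are ignored.

```python
from collections import defaultdict


class IncrementalMinuteSums:
    """Roll per-second sums up into per-minute sums, remembering a
    checkpoint so that earlier records are never recomputed."""

    def __init__(self):
        self.checkpoint = -1                   # last second already rolled up
        self.minute_sums = defaultdict(float)  # minute index -> running sum

    def roll_up(self, second_sums):
        """`second_sums` maps epoch second -> persisted sum for that second."""
        new_seconds = sorted(s for s in second_sums if s > self.checkpoint)
        for second in new_seconds:
            self.minute_sums[second // 60] += second_sums[second]
            self.checkpoint = second
        return dict(self.minute_sums)


store = {0: 1.0, 1: 2.0, 61: 5.0}   # per-second sums, e.g. persisted by CEP
job = IncrementalMinuteSums()
print(job.roll_up(store))           # first run processes all three records

store[62] = 4.0                     # a new second arrives later
print(job.roll_up(store))           # second run only touches second 62
```

The same pattern extends upward: per-minute results can be rolled into hours and days, each level keeping its own checkpoint.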

Page 52: Discover Data That Matters- Deep dive into WSO2 Analytics

Communicate: Dashboards

● The idea is to give the overall picture at a glance (e.g. a car dashboard)
● Support for personalization: you can build your own dashboard
● Also the entry point for drill-down
● How to build?
○ Dashboard via Google Gadget and content via HTML5 + JavaScript
○ Use WSO2 User Engagement Server to build a dashboard (or JSP/PHP)
○ Use charting libraries like Vega or D3

Page 53: Discover Data That Matters- Deep dive into WSO2 Analytics

Gadget Generation Wizard

● Start with data in tabular format
● Map each column to a dimension in your plot like X, Y, color, point size, etc.
● Also do drill-downs
● Create a chart with a few clicks

Page 54: Discover Data That Matters- Deep dive into WSO2 Analytics

Analytics for Products

Core:

• Analytics for Products distributions:
– ESB Analytics
– IoTS Analytics
– IS Analytics
– etc.

Page 55: Discover Data That Matters- Deep dive into WSO2 Analytics

WSO2 Smart Analytics Solutions

• Banking and Finance
• eCommerce and Digital Marketing
• Fleet Management
• Smart Energy Analytics
• Social Media Analytics
• System and Network Monitoring
• QoS Enablement
• Healthcare

Page 56: Discover Data That Matters- Deep dive into WSO2 Analytics

Minimum High Availability Deployment

All you need is 2 Nodes

Page 57: Discover Data That Matters- Deep dive into WSO2 Analytics

Deployment for Scalable Data Analytics

Scale based on your needs!

Page 58: Discover Data That Matters- Deep dive into WSO2 Analytics

Key Differentiations

• Realtime analytics at its best
– Rich set of realtime functions
– Sequence and pattern detection
• No code compilation - SQL-like language
• Incremental processing for everyday analytics
• Intelligent decision making with ML and more
• Rich sets of input & output connectors
• High performance and low infrastructure cost

Page 59: Discover Data That Matters- Deep dive into WSO2 Analytics

Thank You!