Introduction to WSO2 Data Analytics Platform
![Page 1: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/1.jpg)
An Introduction to the WSO2 Analytics Platform
Srinath Perera
VP Research, WSO2; Apache Member
(@srinath_perera)
![Page 2: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/2.jpg)
![Page 3: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/3.jpg)
![Page 4: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/4.jpg)
Collect Data
• One Sensor API to publish events: REST, Thrift, Java, JMS, Kafka
• Java clients, JavaScript clients*
• First, you define streams (think of a stream as an infinite table in an SQL database)
• Then, publish events via the Sensor API
![Page 5: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/5.jpg)
“Publish once, process any way you like”
![Page 6: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/6.jpg)
Collecting Data: Example
Java example: create and send events. Events are sent asynchronously. See the client given in http://goo.gl/vIJzqc for more info.

```java
// Create the agent and an asynchronous publisher
Agent agent = new Agent(agentConfiguration);
publisher = new AsyncDataPublisher("tcp://hostname:7612", .. );

// Define the stream
StreamDefinition definition = new StreamDefinition(STREAM_NAME, VERSION);
definition.addPayloadData("sid", STRING);
...
// Initialize the stream with the publisher
publisher.addStreamDefinition(definition);
...
// Send events
Event event = new Event();
event.setPayloadData(eventData);
publisher.publish(STREAM_NAME, VERSION, event);
```
![Page 7: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/7.jpg)
Data Collection Examples
• Collect data from inbuilt agents in WSO2 products, Tomcat, etc.
• Collect your log data via Logstash
• Collect JVM and JMX stats via an agent
• Ingest data from message queues such as JMS or Kafka
• Pull data from an RSS feed, or scrape a web page
• Write a custom agent to collect data from your system and push it to DAS
Photo credit http://www.torange.us/ CC license
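As an illustration of the last two bullets, here is a minimal sketch of a custom agent that collects JVM stats. It uses only the standard JDK `java.lang.management` MXBeans; the `JvmStatsAgent` class and the `Consumer` sink are hypothetical. In a real deployment the sink would wrap a DAS publisher (for example the `AsyncDataPublisher` from the Java example above), which is not modeled here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical custom agent: samples JVM stats via the standard platform MXBeans
// and hands each sample to a sink. The sink is an assumed stand-in for a DAS
// publisher so that the sketch stays self-contained.
class JvmStatsAgent {
    private final Consumer<Map<String, Object>> sink;

    JvmStatsAgent(Consumer<Map<String, Object>> sink) {
        this.sink = sink;
    }

    // Collects one sample, pushes it to the sink as a flat event payload, and returns it.
    Map<String, Object> sampleOnce() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        Map<String, Object> event = new LinkedHashMap<>();
        event.put("timestamp", System.currentTimeMillis());
        event.put("heapUsedBytes", memory.getHeapMemoryUsage().getUsed());
        event.put("nonHeapUsedBytes", memory.getNonHeapMemoryUsage().getUsed());
        event.put("threadCount", threads.getThreadCount());
        event.put("availableProcessors", Runtime.getRuntime().availableProcessors());
        sink.accept(event);
        return event;
    }
}
```

A scheduler (for example a `ScheduledExecutorService`) could call `sampleOnce()` periodically. The flat key/value payload corresponds to a stream whose attributes are those keys.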
![Page 8: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/8.jpg)
Analysis: Batch Analytics
• Batch analytics reads data from disk (or some other storage) and processes it record by record
• “MapReduce” is the most widely used technology for batch analytics
  – Apache Hadoop
  – Apache Spark: 30X faster and much more flexible
• Analytics (min, max, average, correlation, histograms; might join or group data in many ways)
• Key Performance Indicators (KPIs)
  – e.g. profit per square foot for retail
• Presented as a dashboard
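To make the listed aggregates concrete, here is a single-machine sketch over in-memory arrays. Hadoop or Spark would distribute the same per-record work across a cluster. The `BatchStats` class is hypothetical. `DoubleSummaryStatistics` is the standard JDK class, correlation is the Pearson coefficient, and the histogram uses fixed-width bins.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Single-machine sketch of common batch aggregates (not a distributed job).
class BatchStats {
    // count, sum, min, max and average in one record-by-record pass
    static DoubleSummaryStatistics summarize(double[] values) {
        return DoubleStream.of(values).summaryStatistics();
    }

    // Pearson correlation coefficient between two equally sized columns
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two columns of equal length >= 2");
        }
        double meanX = DoubleStream.of(x).average().orElse(0);
        double meanY = DoubleStream.of(y).average().orElse(0);
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    // Fixed-width histogram: bin i covers [min + i*width, min + (i+1)*width);
    // values outside the covered range are ignored.
    static int[] histogram(double[] values, double min, double width, int bins) {
        int[] counts = new int[bins];
        for (double v : values) {
            int bin = (int) Math.floor((v - min) / width);
            if (bin >= 0 && bin < bins) {
                counts[bin]++;
            }
        }
        return counts;
    }
}
```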
![Page 9: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/9.jpg)
SQL-like Queries: Spark SQL
• Since many people understand SQL, Hive made large-scale (Big Data) processing accessible to many
• Expressive, short, and sweet
• Define core operations that cover 90% of problems
• Let experts dig in when they like! (via user-defined functions)

```sql
-- getHour is a user-defined function (UDF)
insert overwrite table BusSpeed
select getHour(ts) as hour, avg(v) as avgV, busID
from BusStream
group by busID, getHour(ts);
```
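The BusSpeed query groups readings by bus and hour and averages the speed `v`. Here is a minimal in-memory equivalent in Java streams that shows what the query computes. `BusReading`, `BusHourKey` and `hourOf` are hypothetical stand-ins for the BusStream schema and the `getHour` UDF, and `ts` is assumed to be epoch milliseconds in UTC.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical row of BusStream: bus ID, timestamp (epoch millis), speed v
record BusReading(String busId, long ts, double v) {}

// Composite GROUP BY key (busID, hour)
record BusHourKey(String busId, int hour) {}

class BusSpeed {
    // Plays the role of the user-defined function getHour(ts), assuming UTC
    static int hourOf(long tsMillis) {
        return Instant.ofEpochMilli(tsMillis).atZone(ZoneOffset.UTC).getHour();
    }

    // select getHour(ts) as hour, avg(v) as avgV, busID ... group by busID, getHour(ts)
    static Map<BusHourKey, Double> averageSpeed(List<BusReading> readings) {
        return readings.stream().collect(Collectors.groupingBy(
                r -> new BusHourKey(r.busId(), hourOf(r.ts())),
                Collectors.averagingDouble(BusReading::v)));
    }
}
```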
![Page 10: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/10.jpg)
Spark SQL Query
Count entries where the username is not empty, group by username, and order by the count (top 10):

```sql
SELECT username, COUNT(*) AS count
FROM wikiData
WHERE username <> ''
GROUP BY username
ORDER BY count DESC
LIMIT 10
```
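For readers more at home in Java than SQL, here is the same filter / group / count / order / limit pipeline over an in-memory list of usernames. The list is a hypothetical stand-in for the wikiData table. The name tie-break is an added assumption that makes the output deterministic, since SQL leaves the order of equal counts unspecified.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Java-streams equivalent of the wikiData query (in-memory sketch).
class TopUsers {
    static List<Map.Entry<String, Long>> topUsers(List<String> usernames, int limit) {
        Map<String, Long> counts = usernames.stream()
                .filter(u -> u != null && !u.isEmpty())            // WHERE username <> ''
                .collect(Collectors.groupingBy(Function.identity(), // GROUP BY username
                        Collectors.counting()));                     // COUNT(*)
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()) // ORDER BY count DESC
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))            // tie-break by name
                .limit(limit)                                                                  // LIMIT
                .collect(Collectors.toList());
    }
}
```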
![Page 11: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/11.jpg)
Use Case: API Usage
• Looking at different API calls by country
• Designed to draw attention to which APIs are used and where
![Page 12: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/12.jpg)
The Value of Some Insights Degrades Fast!
For some use cases (e.g. stock markets, traffic, surveillance, patient monitoring) the value of insights degrades very quickly with time.
We need technology that can produce outputs fast:
• Static queries that need very fast output (alerts, realtime control)
• Dynamic and interactive queries (data exploration)
![Page 13: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/13.jpg)
Realtime Analytics: Complex Event Processing
![Page 14: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/14.jpg)
CEP Queries 1
Calculate the average temperature over a 1-minute sliding window, grouped by roomNo:

```
define stream TempStream (roomNo string, temp double);

from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert all events into AvgRoomTempStream;
```
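To show what a time-based sliding window computes, here is a minimal Java sketch of a per-room average over the last minute. It is a hypothetical illustration of the semantics, not how the CEP engine is implemented. It assumes events arrive in timestamp order and returns the updated average for the event's room on each arrival.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a time-based sliding window (like #window.time(1 min))
// with a per-room running average. Assumes events arrive in timestamp order.
class SlidingRoomAverage {
    private record Reading(String roomNo, double temp, long ts) {}

    private final long windowMillis;
    private final Deque<Reading> window = new ArrayDeque<>();
    private final Map<String, double[]> sumAndCount = new HashMap<>(); // roomNo -> {sum, count}

    SlidingRoomAverage(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Adds an event and returns the current average temperature for its room.
    double onEvent(String roomNo, double temp, long ts) {
        // Expire readings that have fallen out of the window
        while (!window.isEmpty() && window.peekFirst().ts() <= ts - windowMillis) {
            Reading old = window.pollFirst();
            double[] expired = sumAndCount.get(old.roomNo());
            expired[0] -= old.temp();
            expired[1] -= 1;
            if (expired[1] == 0) {
                sumAndCount.remove(old.roomNo());
            }
        }
        // Add the new reading to the window and to its room's running totals
        window.addLast(new Reading(roomNo, temp, ts));
        double[] acc = sumAndCount.computeIfAbsent(roomNo, k -> new double[2]);
        acc[0] += temp;
        acc[1] += 1;
        return acc[0] / acc[1];
    }
}
```

Keeping a running sum and count per room makes each update cheap. The average is never recomputed from scratch; each expiry simply subtracts the expired reading.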
![Page 15: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/15.jpg)
CEP Queries 2
Using data from a football game: the Kick stream records each kick of the ball by a player. Ball possession is a kick by me, followed by any number of kicks by me, followed by a kick by someone else.

```
from every k1 = KickStream, KickStream[playerid == k1.playerid]*, KickStream[playerid != k1.playerid]
select ..
insert into BallPosessionStream;
```
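The pattern can be read as "a run of kicks by one player, closed by a kick from another player." Here is a hypothetical Java sketch of that reading over an ordered list of kicking player IDs. It does not model the CEP engine's `every`/sequence machinery. The last run is still open because no other player has kicked yet, so the sketch does not report it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the ball-possession pattern over ordered kicks.
class BallPossession {
    record Possession(String playerId, int kicks) {}

    static List<Possession> possessions(List<String> kickPlayerIds) {
        List<Possession> result = new ArrayList<>();
        String current = null; // player of the run in progress (k1)
        int kicks = 0;
        for (String playerId : kickPlayerIds) {
            if (playerId.equals(current)) {
                kicks++; // KickStream[playerid == k1.playerid]*
            } else {
                if (current != null) {
                    // KickStream[playerid != k1.playerid] closes the possession
                    result.add(new Possession(current, kicks));
                }
                current = playerId; // this kick starts a new run
                kicks = 1;
            }
        }
        return result;
    }
}
```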
![Page 16: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/16.jpg)
People Tracking via BLE
• Track people through BLE via triangulation
• Higher-level logic via Complex Event Processing
• Traffic monitoring
• Smart retail
• Airport management
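The slide mentions triangulation. A common way to locate a person from BLE beacon signals is trilateration from estimated distances, and that is the technique this sketch uses instead. It converts RSSI to distance with the log-distance path-loss model and then solves for a 2D position from three beacons. `txPower` (RSSI at 1 m) and the path-loss exponent `n` are assumed calibration values, and the class is hypothetical.

```java
// Hypothetical BLE positioning sketch (trilateration, not necessarily the
// method used in the demo).
class BleTrilateration {
    // Log-distance path-loss model: d = 10^((txPower - rssi) / (10 * n)), in metres
    static double rssiToDistance(double rssi, double txPower, double n) {
        return Math.pow(10, (txPower - rssi) / (10 * n));
    }

    // Beacons at (x1,y1), (x2,y2), (x3,y3) with distances r1, r2, r3.
    // Subtracting the circle equations pairwise gives a linear system
    //   a*x + b*y = c   and   d*x + e*y = f,  solved with Cramer's rule.
    static double[] locate(double x1, double y1, double r1,
                           double x2, double y2, double r2,
                           double x3, double y3, double r3) {
        double a = 2 * (x2 - x1), b = 2 * (y2 - y1);
        double c = r1 * r1 - r2 * r2 - x1 * x1 + x2 * x2 - y1 * y1 + y2 * y2;
        double d = 2 * (x3 - x2), e = 2 * (y3 - y2);
        double f = r2 * r2 - r3 * r3 - x2 * x2 + x3 * x3 - y2 * y2 + y3 * y3;
        double det = a * e - b * d;
        if (Math.abs(det) < 1e-12) {
            throw new IllegalArgumentException("beacons are collinear");
        }
        return new double[] {(c * e - b * f) / det, (a * f - c * d) / det};
    }
}
```

Real RSSI readings are noisy, so the resulting positions would typically be smoothed before the higher-level CEP logic consumes them.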
![Page 17: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/17.jpg)
Realtime Soccer Analysis
Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM
![Page 18: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/18.jpg)
Scaling CEP Queries on top of Storm
▪ Accepts CEP queries with hints about how to partition streams
▪ Partitions the streams, builds an Apache Storm topology that runs CEP nodes as Storm bolts, and runs it. See http://goo.gl/pP3kdX for more info.
![Page 19: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/19.jpg)
CEP Queries on Storm
• @dist(parallel='4') asks the platform to run the query with 4 nodes
• Use a partition definition to break up the data so that the partitions can run in parallel

```
define partition on TempStream.region {
  @dist(parallel='4')
  from TempStream[temp > 33]
  insert into HighTempStream;
}

from HighTempStream#window.time(1 hour)
select max(temp) as max
insert into HourlyMaxTempStream;
```
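One way to picture what the partition definition plus `@dist(parallel='4')` implies is to route each event by its region key to one of N parallel workers. All events for a region then land on the same worker, and that worker applies the `temp > 33` filter. The hash-based routing below is an assumption made for illustration. It does not describe WSO2's actual Storm integration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of key-partitioned parallel filtering.
class PartitionedFilter {
    record TempEvent(String region, double temp) {}

    // Same region -> same worker index in [0, parallel)
    static int partitionFor(String region, int parallel) {
        return Math.floorMod(region.hashCode(), parallel);
    }

    // Routes each event to the worker owning its region; that worker keeps
    // the event only if temp > threshold (from TempStream[temp > 33]).
    static List<List<TempEvent>> run(List<TempEvent> events, int parallel, double threshold) {
        List<List<TempEvent>> highTempPerWorker = new ArrayList<>();
        for (int i = 0; i < parallel; i++) {
            highTempPerWorker.add(new ArrayList<>());
        }
        for (TempEvent e : events) {
            List<TempEvent> worker = highTempPerWorker.get(partitionFor(e.region(), parallel));
            if (e.temp() > threshold) {
                worker.add(e);
            }
        }
        return highTempPerWorker;
    }
}
```

The second query in the slide (the hourly max) is not partitioned, so its input would have to be merged back from all workers.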
![Page 20: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/20.jpg)
Interactive Analytics
The best way to explore data is by asking ad-hoc questions.
Interactive analytics (search) lets you query the system and receive fast results (<10 s).
Shows data in context (e.g. by grouping events from the same transaction together).
Built using Lucene-based indexes.
SparkSQL> SELECT * FROM TWITTER_DATA
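Fast ad-hoc search rests on an inverted index, the core data structure behind Lucene. Each term maps to the set of event IDs that contain it, so a query touches only the postings for its terms and never scans every event. Here is a toy version with AND semantics. It is a hypothetical illustration and not Lucene's API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical; not Lucene's API).
class InvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int eventId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(eventId);
        }
    }

    // IDs of events containing all query terms (AND semantics), in ascending order
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> ids = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids); // intersect postings lists
            }
        }
        return result == null ? Set.of() : result;
    }

    // Lower-case and split on non-word characters, dropping empty tokens
    private static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(s -> !s.isEmpty())
                .toList();
    }
}
```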
![Page 21: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/21.jpg)
Predictive Analytics
Can you “write a program to drive a car”?
Machine learning:
• Takes in lots of examples and builds a program that matches those examples
• We call that program a “model”
Lots of tools:
– R (statistical language)
– scikit-learn (Python)
– Apache Spark’s MLBase and Apache Mahout (Java)
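Here is a minimal illustration of "learning a program from examples": ordinary least squares for a single feature. `fit` looks at (x, y) examples and returns a model, a tiny program that maps a new x to a predicted y. The `LinearModel` name is hypothetical. The tools listed above offer far richer models.

```java
// Minimal "model from examples": one-feature ordinary least squares (hypothetical names).
class LinearModel {
    final double slope;
    final double intercept;

    private LinearModel(double slope, double intercept) {
        this.slope = slope;
        this.intercept = intercept;
    }

    // Learns slope and intercept that minimize squared error on the examples
    static LinearModel fit(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n, meanY = sumY / n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new LinearModel(slope, meanY - slope * meanX);
    }

    // The learned "program": apply it to unseen inputs
    double predict(double x) {
        return slope * x + intercept;
    }
}
```

For example, fitting the points (0, 1), (1, 3), (2, 5), (3, 7) recovers slope 2 and intercept 1, so the model predicts 21 for x = 10.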
![Page 22: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/22.jpg)
Predictive Analytics in DAS
• Building models
  – With the WSO2 Machine Learner product via a wizard (powered by MLlib)
  – Build models using R and export them as PMML
• Built models can be used with both WSO2 CEP and ESB
![Page 23: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/23.jpg)
Using the Model
Within CEP:

```
from InputStream#ml:predict('/../diabetes-model', 'double')
select *
insert into PredictionStream;
```

Within ESB:

```xml
<predict>
  <model storage-location="../downloaded-ml-model"/>
  <features>
    <feature name="SI2" expression="$body/features/SI2"/>
    ..
  </features>
  <predictionOutput property="result"/>
</predict>
```
![Page 24: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/24.jpg)
WSO2 Machine Learner
• Upload or select data
• Explore the data
• Train a machine learning model
![Page 25: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/25.jpg)
WSO2 Machine Learner
• Compare results
• Understand why
• Iterate
![Page 26: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/26.jpg)
Supported Algorithms
• Deep-learning-based classification (H2O’s Stacked Autoencoders Classifier)
• Classification and regression algorithms: Decision Trees, Linear Regression, Lasso Regression, SVM, Naïve Bayes
• K-Means clustering for unsupervised learning on your data
• Anomaly detection using the K-Means algorithm to identify fraud, network penetration, and other difficult scenarios
• Recommendation engine (collaborative filtering algorithm)
![Page 27: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/27.jpg)
Predict wait time in the Airport
• Predicting the time to go through the airport
• Real-time updates and events for passengers
• Helps the airport manage by allocating resources
![Page 28: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/28.jpg)
Predict Promising Customers
• A typical website can get millions of users
• Only a very small fraction converts
• For each user, we know what they access, where they work, their country, browser, OS, etc.
• The problem is to predict which users will convert
• Used logistic regression, random forest, survival modeling, etc.
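To show how one of the listed techniques works, here is a minimal logistic regression trained with plain batch gradient descent on a single, already standardized numeric feature. Think of it as a hypothetical engagement score per user. Real conversion models would use many features (pages accessed, country, browser, OS, ...) and a library implementation. The learning rate and iteration count here are arbitrary choices.

```java
// Hypothetical one-feature logistic regression trained by batch gradient descent.
class ConversionModel {
    private double w = 0, b = 0;

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Estimated probability that a user with feature value x converts
    double predict(double x) {
        return sigmoid(w * x + b);
    }

    // converted[i] is 1 if user i converted, else 0
    void train(double[] x, int[] converted, double learningRate, int iterations) {
        int n = x.length;
        for (int it = 0; it < iterations; it++) {
            double gradW = 0, gradB = 0;
            for (int i = 0; i < n; i++) {
                double error = predict(x[i]) - converted[i]; // gradient of the log-loss w.r.t. the logit
                gradW += error * x[i];
                gradB += error;
            }
            w -= learningRate * gradW / n;
            b -= learningRate * gradB / n;
        }
    }
}
```

Because only a small fraction converts, a real model would usually rank users by `predict(x)` rather than rely on a fixed 0.5 cut-off.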
![Page 29: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/29.jpg)
Predict the Super Bowl
• Predicted 7 of the 11 games
• Done with the Random Forest algorithm
• Even the games we missed are instructive
See Yuda’s post: Predicting the Super Bowl with Machine Learning
![Page 30: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/30.jpg)
Anomaly Detection: Markov Models
• Can model the probability of a sequence
• Given a sequence, can estimate its likelihood and use that to detect anomalies
• Implemented with WSO2 CEP
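Here is a minimal first-order Markov sketch of the idea. It learns P(next | current) from normal sequences, scores a new sequence by multiplying its transition probabilities, and flags the sequence when the likelihood falls below a threshold. The class and the state names in the usage note are hypothetical. The sketch applies no smoothing, so an unseen transition gets probability 0.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical first-order Markov anomaly detector (no smoothing).
class MarkovAnomalyDetector {
    private final Map<String, Map<String, Integer>> transitions = new HashMap<>();
    private final Map<String, Integer> outgoing = new HashMap<>();

    // Counts each observed transition in a normal sequence
    void train(List<String> sequence) {
        for (int i = 1; i < sequence.size(); i++) {
            String from = sequence.get(i - 1), to = sequence.get(i);
            transitions.computeIfAbsent(from, k -> new HashMap<>()).merge(to, 1, Integer::sum);
            outgoing.merge(from, 1, Integer::sum);
        }
    }

    // Maximum-likelihood estimate of P(to | from); 0 if never seen
    double transitionProbability(String from, String to) {
        Integer total = outgoing.get(from);
        if (total == null) {
            return 0.0;
        }
        return transitions.get(from).getOrDefault(to, 0) / (double) total;
    }

    double likelihood(List<String> sequence) {
        double p = 1.0;
        for (int i = 1; i < sequence.size(); i++) {
            p *= transitionProbability(sequence.get(i - 1), sequence.get(i));
        }
        return p;
    }

    boolean isAnomalous(List<String> sequence, double threshold) {
        return likelihood(sequence) < threshold;
    }
}
```

After training on `login, browse, browse, checkout` and `login, browse, checkout`, the sequence `login, checkout` has likelihood 0 and is flagged. Because raw likelihood shrinks with sequence length, a practical detector would normalize it (e.g. per transition) before comparing against a threshold.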
![Page 31: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/31.jpg)
Anomaly Detection: Clustering
• Use clustering to identify normal behavior as clusters
• Consider points far away from all clusters as anomalies
• A point is considered away from a cluster if it is outside the 99th-percentile line for that cluster
• Included in WSO2 ML
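Here is a minimal sketch of the percentile rule above. It assumes the centroids come from an earlier K-Means run on normal data (the clustering itself is not shown). For each cluster, it takes the 99th-percentile distance of that cluster's own training points (nearest-rank method) as the boundary. A new point is an anomaly if it lies beyond the boundary of its nearest cluster. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical clustering-based anomaly detector (centroids assumed precomputed).
class ClusterAnomalyDetector {
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum); // Euclidean distance
    }

    static int nearest(double[][] centroids, double[] p) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (distance(centroids[c], p) < distance(centroids[best], p)) {
                best = c;
            }
        }
        return best;
    }

    // Per-cluster boundary: nearest-rank percentile of member distances
    static double[] thresholds(double[][] centroids, List<double[]> training, double percentile) {
        List<List<Double>> perCluster = new ArrayList<>();
        for (int c = 0; c < centroids.length; c++) {
            perCluster.add(new ArrayList<>());
        }
        for (double[] p : training) {
            int c = nearest(centroids, p);
            perCluster.get(c).add(distance(centroids[c], p));
        }
        double[] result = new double[centroids.length];
        for (int c = 0; c < centroids.length; c++) {
            double[] d = perCluster.get(c).stream().mapToDouble(Double::doubleValue).sorted().toArray();
            if (d.length == 0) {
                continue; // empty cluster: threshold stays 0.0
            }
            int rank = (int) Math.ceil(percentile / 100.0 * d.length);
            result[c] = d[Math.max(rank, 1) - 1];
        }
        return result;
    }

    static boolean isAnomaly(double[][] centroids, double[] thresholds, double[] p) {
        int c = nearest(centroids, p);
        return distance(centroids[c], p) > thresholds[c];
    }
}
```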
![Page 32: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/32.jpg)
Communicate: Dashboards
• Dashboards give an “overall idea” at a glance (e.g. a car dashboard)
  – Boring when everything is good!
• Build your own dashboard
  – WSO2 DAS supports a gadget generation wizard
  – You can write your own gadgets using D3 and JavaScript
![Page 33: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/33.jpg)
Gadget Generation Wizard
• Starts with data in tabular format
• Map each column to a dimension in your plot, like X, Y, color, point size, etc.
• Create a chart with a few clicks
Powered by the VizGrammar library, which uses Vega underneath (see https://github.com/wso2/VizGrammar)
![Page 34: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/34.jpg)
Communicate: Alerts
▪ Done with CEP queries
▪ Last mile:
  – Email, SMS
  – Push notifications to a UI
  – Pager
  – Trigger a physical alarm
![Page 35: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/35.jpg)
Real-Life Use Cases
▪ Cisco (OEMs the platform with Cisco solutions: health, smart parking)
▪ Experian (digital marketing), see video
▪ Pacific Controls (smart city platform, vehicle tracking, building monitoring), see video
▪ Throttling and anomaly detection (by a group of telco companies)
▪ API Analytics (13+ customers)
“No battle plan survives contact with the enemy.” (Helmuth von Moltke)
![Page 36: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/36.jpg)
Key Differentiators
• Open source, under the Apache 2 license
• Publish data once, analyze it any way you like
• Flexible packaging, from a single node to a scalable cluster
• Rich, extensible, SQL-like configuration language
• Compact, easy-to-learn syntax addressing complex requirements, such as time windows, patterns, and sequences, which would be complex to develop in a programming language such as Java
• Rich set of data connectors, which can be easily extended
• Events only need to be published once from applications to the platform, and can be consumed by batch or realtime pipelines
• Performance on a single node satisfies 90% of use cases
• Part of the overall WSO2 platform
![Page 37: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/37.jpg)
More Information▪Introducing WSO2 Analytics Platform: Note for Architects, https://iwringer.wordpress.com/2015/03/18/introducing-wso2-analytics-platform-note-for-architects/
▪WSO2 Data Analytics Server, http://wso2.com/products/data-analytics-server/
▪WSO2 Complex Event Processor, http://wso2.com/products/complex-event-processor/
▪WSO2 Machine Learner, http://wso2.com/products/machine-learner/
![Page 38: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/38.jpg)
![Page 39: Introduction to WSO2 Data Analytics Platform](https://reader038.vdocuments.us/reader038/viewer/2022103010/587137c61a28abf0568b61a1/html5/thumbnails/39.jpg)
Thank You