Batch and Interactive Analytics: From Data to Insight
Anjana Fernando, Senior Technical Lead
Agenda
๏ Batch and Interactive Processing Defined
๏ Technologies used for Batch/Interactive Analytics
๏ WSO2 Analytics Architecture
๏ Solutions
๏ Demo
Let’s Break It Down...
๏ Batch Analytics:
Batch analytics is where data is first stored and later read back to perform relatively time-consuming processing tasks.
๏ Interactive Analytics:
Interactive analytics is where a stored data set is queried in an ad-hoc manner to find useful information quickly.
Source: http://themarketingblog.ecornell.com/
Where Can We Use It?
๏ Service Statistics Generation
    ๏ Extracting KPIs: average response time, maximum latency, etc.
๏ Log Analysis
    ๏ Efficiently store and analyse logs to support comprehensive search operations
๏ Activity Monitoring
    ๏ Trace a workflow of events throughout a system. Useful for finding failed transactions, performance issues, etc.
๏ Solving Optimization Problems
    ๏ Analysing large amounts of past data to optimize parameters for an existing algorithm
Source: http://www.axentas.com/
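As a rough illustration of the service statistics use case, the sketch below derives the KPIs named above (average response time and maximum latency) from a handful of stored events. The event tuples, service names, and the `service_kpis` helper are hypothetical and are not part of any WSO2 API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stored events: (service name, response time in ms).
events = [
    ("OrderService", 120),
    ("OrderService", 80),
    ("UserService", 40),
    ("OrderService", 310),
    ("UserService", 60),
]

def service_kpis(events):
    """Group response times by service and derive simple KPIs."""
    by_service = defaultdict(list)
    for service, response_ms in events:
        by_service[service].append(response_ms)
    return {
        service: {
            "request_count": len(times),
            "avg_response_ms": mean(times),
            "max_latency_ms": max(times),
        }
        for service, times in by_service.items()
    }

kpis = service_kpis(events)
for service, stats in sorted(kpis.items()):
    print(service, stats)
```

In a batch setting the same grouping would typically run as a scheduled query over the full stored data set rather than over an in-memory list.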
Batch Analytics Technologies
Interactive Analytics (Indexing) Technologies
๏ Solr / SolrCloud
๏ ElasticSearch
๏ WSO2 DAS
WSO2 Analytics Platform
WSO2 DAS Architecture
Data Model
Data is published according to a strongly typed data stream definition:
{
    'name': 'stream.name',
    'version': '1.0.0',
    'nickName': 'stream nickname',
    'description': 'description of the stream',
    'metaData': [
        {'name': 'meta_data_1', 'type': 'STRING'}
    ],
    'correlationData': [
        {'name': 'correlation_data_1', 'type': 'STRING'}
    ],
    'payloadData': [
        {'name': 'payload_data_1', 'type': 'BOOL'},
        {'name': 'payload_data_2', 'type': 'LONG'}
    ]
}
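To show what "strongly typed" means in practice, here is a minimal sketch that checks an event against the stream definition above. The event layout (one dict per section), the `TYPE_MAP` mapping, and the `validate_event` helper are assumptions made for illustration; they are not the DAS publisher API.

```python
# Stream definition mirroring the example above, as a Python dict.
stream_definition = {
    "name": "stream.name",
    "version": "1.0.0",
    "metaData": [{"name": "meta_data_1", "type": "STRING"}],
    "correlationData": [{"name": "correlation_data_1", "type": "STRING"}],
    "payloadData": [
        {"name": "payload_data_1", "type": "BOOL"},
        {"name": "payload_data_2", "type": "LONG"},
    ],
}

# Assumed mapping from stream attribute types to Python types.
TYPE_MAP = {"STRING": str, "BOOL": bool, "LONG": int}

def type_matches(value, type_name):
    expected = TYPE_MAP[type_name]
    # bool is a subclass of int in Python, so a bool must not pass as LONG.
    if expected is int and isinstance(value, bool):
        return False
    return isinstance(value, expected)

def validate_event(definition, event):
    """Return a list of problems; an empty list means the event conforms."""
    errors = []
    for section in ("metaData", "correlationData", "payloadData"):
        values = event.get(section, {})
        for attr in definition.get(section, []):
            name = attr["name"]
            if name not in values:
                errors.append(f"{section}.{name}: missing")
            elif not type_matches(values[name], attr["type"]):
                errors.append(f"{section}.{name}: expected {attr['type']}")
    return errors

good_event = {
    "metaData": {"meta_data_1": "node-1"},
    "correlationData": {"correlation_data_1": "abc-123"},
    "payloadData": {"payload_data_1": True, "payload_data_2": 42},
}
bad_event = {
    "metaData": {"meta_data_1": "node-1"},
    "correlationData": {"correlation_data_1": "abc-123"},
    "payloadData": {"payload_data_1": True, "payload_data_2": "42"},
}

print(validate_event(stream_definition, good_event))
print(validate_event(stream_definition, bad_event))
```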
WSO2 DAS - Batch Processing
๏ Powered by Apache Spark, reported as 10-100x faster than Hadoop MapReduce
๏ Parallel, distributed, with optimized in-memory processing
๏ Can run on top of Hadoop YARN, Mesos, or in Standalone mode
๏ Scalable script-based analytics written using an easy-to-learn, SQL-like query language powered by Spark SQL
๏ Interactive built-in web interface (Spark Console) for ad-hoc query execution
๏ HA/FO supported scheduled query script execution
๏ Run Spark on a single node, on a Spark-embedded Carbon server cluster, or connect to an external Spark cluster
๏ Custom UDF support
e.g.:-
INSERT INTO TABLE UserTable SELECT userName, COUNT(DISTINCT orderID), SUM(quantity) FROM PhoneSalesTable WHERE version = "1.0.0" GROUP BY userName;
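To make the effect of the example query concrete, the sketch below runs the same aggregation on an in-memory SQLite database as a stand-in for Spark SQL. The table and source column names come from the query above; the sample rows and the `orderCount` and `totalQuantity` column names are assumptions, and SQLite's `INSERT INTO ... SELECT` replaces the `INSERT INTO TABLE` form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PhoneSalesTable (
        userName TEXT, orderID TEXT, quantity INTEGER, version TEXT
    );
    CREATE TABLE UserTable (
        userName TEXT, orderCount INTEGER, totalQuantity INTEGER
    );
""")

# Hypothetical sales events; bob's last row has a different stream version.
conn.executemany(
    "INSERT INTO PhoneSalesTable VALUES (?, ?, ?, ?)",
    [
        ("alice", "o1", 2, "1.0.0"),
        ("alice", "o1", 1, "1.0.0"),
        ("alice", "o2", 4, "1.0.0"),
        ("bob", "o3", 5, "1.0.0"),
        ("bob", "o4", 9, "0.9.0"),
    ],
)

# Per user: number of distinct orders and total quantity, version 1.0.0 only.
conn.execute("""
    INSERT INTO UserTable
    SELECT userName, COUNT(DISTINCT orderID), SUM(quantity)
    FROM PhoneSalesTable
    WHERE version = '1.0.0'
    GROUP BY userName
""")

summary = conn.execute("SELECT * FROM UserTable ORDER BY userName").fetchall()
print(summary)
```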
Spark vs Hadoop MapReduce
๏ Hadoop MapReduce
    ๏ Supports only Map/Reduce, fine for single-pass computations
    ๏ High processing latency and inefficiencies from persisting intermediate results to disk
    ๏ Hard to implement iterative algorithms
๏ Spark
    ๏ Resilient Distributed Dataset (RDD) based
    ๏ Supports more than just Map and Reduce functions
    ๏ Intermediate results kept in memory
    ๏ Lazy evaluation of data operations, allowing more optimization
    ๏ Allows the developer to implement complex data operations in a DAG pattern
    ๏ In-Memory/Persisted mode operation, switch when required
    ๏ Simpler API
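The lazy evaluation point can be illustrated with a toy model in plain Python. The `LazyDataset` class below is hypothetical and is not Spark's API; it only records transformations and runs the whole chain when `collect()` is called, which is the behaviour that gives an engine room to optimize the pipeline before executing it.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on collect()."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = tuple(ops)

    def map(self, fn):
        # Returns a new dataset with one more recorded step; nothing runs yet.
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    def collect(self):
        # Only here is the recorded chain of operations actually executed.
        items = iter(self._source)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

calls = []

def square(x):
    calls.append(x)  # track when the function is really invoked
    return x * x

pipeline = LazyDataset(range(1, 6)).map(square).filter(lambda v: v % 2 == 1)
calls_before_collect = len(calls)  # still zero: only the plan exists
result = pipeline.collect()

print(calls_before_collect, result)
```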
WSO2 DAS - Interactive Analytics Features
๏ Full text data indexing support powered by Apache Lucene
๏ Drill-down search support
๏ Distributed data indexing
    ๏ Designed to support scalability
๏ Near real-time data indexing and retrieval
    ๏ Data indexed immediately as received
๏ Distributed indexing implementation for scalability
    ๏ Index sharding with Lucene indices
    ๏ Data storage scalability achieved with underlying database, e.g. HBase, Cassandra, RDBMS, etc.
e.g.:-
log:"ERROR" AND (ip:"192.168.4.33" OR ip:"192.168.4.34") AND type:"HTTPD"
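The example query can be read as set operations over an inverted index. The following is a minimal sketch of that idea in plain Python, not Lucene itself: the records, the `build_index` helper, and the `term` lookup are all hypothetical, and each field is matched as an exact term rather than as analysed text.

```python
from collections import defaultdict

# Hypothetical indexed log records, keyed by record id.
records = {
    1: {"log": "ERROR", "ip": "192.168.4.33", "type": "HTTPD"},
    2: {"log": "ERROR", "ip": "192.168.4.35", "type": "HTTPD"},
    3: {"log": "INFO", "ip": "192.168.4.34", "type": "HTTPD"},
    4: {"log": "ERROR", "ip": "192.168.4.34", "type": "HTTPD"},
    5: {"log": "ERROR", "ip": "192.168.4.33", "type": "SSHD"},
}

def build_index(records):
    """Map each (field, value) term to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for field, value in fields.items():
            index[(field, value)].add(rec_id)
    return index

index = build_index(records)

def term(field, value):
    return index.get((field, value), set())

# log:"ERROR" AND (ip:"192.168.4.33" OR ip:"192.168.4.34") AND type:"HTTPD"
matches = (
    term("log", "ERROR")
    & (term("ip", "192.168.4.33") | term("ip", "192.168.4.34"))
    & term("type", "HTTPD")
)

print(sorted(matches))
```

Because each term resolves to a precomputed set of ids, the query never scans the stored records, which is what makes ad-hoc lookups fast.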
WSO2 DAS - Mixing Real-time / Batch Processing
WSO2 DAS - Alerts
๏ Detecting conditions can be done via CEP Queries
๏ Key is the "Last Mile"
    ๏ Email
    ๏ SMS
    ๏ Push notifications to a UI
    ๏ Pager
    ๏ Trigger physical Alarm
๏ How?
    ๏ Batch Analytics: Using WSO2's custom Analytics Provider for Spark SQL to directly send records as events to an event stream
    ๏ Select Email sender "Output Adaptor" from DAS, or send from DAS to ESB -> ESB Connectors
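The split between detecting a condition and delivering it over the "last mile" can be sketched as follows. Everything here is hypothetical: the latency threshold, the event fields, and the channel functions, which only return tuples instead of calling a real output adaptor or ESB connector.

```python
def detect_high_latency(event, threshold_ms=500):
    """Condition check; in DAS this role is played by a CEP query."""
    return event["response_ms"] > threshold_ms

# Placeholder "last mile" channels: each returns what it would have sent.
def email_channel(message):
    return ("email", message)

def sms_channel(message):
    return ("sms", message)

def dispatch_alerts(events, condition, channels):
    """Send one message per channel for every event matching the condition."""
    sent = []
    for event in events:
        if condition(event):
            message = f"ALERT: {event['service']} took {event['response_ms']} ms"
            for channel in channels:
                sent.append(channel(message))
    return sent

events = [
    {"service": "OrderService", "response_ms": 120},
    {"service": "PaymentService", "response_ms": 900},
]

sent = dispatch_alerts(events, detect_high_latency, [email_channel, sms_channel])
print(sent)
```

Keeping detection and delivery separate means a new channel can be added without touching the detection logic.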
Solutions Supported with Batch/Interactive Analytics: Service Statistics Monitoring
Solutions Supported with Batch/Interactive Analytics: Activity Monitoring
● Activity monitoring tracks events from multiple nodes in a flow to understand a specific activity
○ e.g.:-
■ A client initiates a web services request, which travels through multiple ESBs and application servers and returns back. This flow is uniquely identified and visualized in DAS
○ Used for tracing messages and finding performance hotspots in the flow
○ Implemented using a correlation-ID-based mechanism and indexing
○ Upcoming: Mediator level tracing and profiling in WSO2 ESB 5.0
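The correlation-ID mechanism described for activity monitoring can be sketched as grouping events by their shared ID and ordering them by time. The event fields, node names, and helper functions below are hypothetical and do not reflect the actual DAS implementation.

```python
from collections import defaultdict

# Hypothetical events published by different nodes for two client requests.
events = [
    {"correlation_id": "a1", "node": "ESB-1", "timestamp_ms": 1000},
    {"correlation_id": "b2", "node": "ESB-1", "timestamp_ms": 1005},
    {"correlation_id": "a1", "node": "AppServer-1", "timestamp_ms": 1040},
    {"correlation_id": "a1", "node": "ESB-2", "timestamp_ms": 1400},
    {"correlation_id": "b2", "node": "AppServer-1", "timestamp_ms": 1030},
]

def group_activities(events):
    """Collect events per correlation id, ordered by timestamp."""
    activities = defaultdict(list)
    for event in events:
        activities[event["correlation_id"]].append(event)
    for flow in activities.values():
        flow.sort(key=lambda e: e["timestamp_ms"])
    return dict(activities)

def slowest_hop(flow):
    """Return (duration_ms, from_node, to_node) for the slowest step, or None."""
    hops = [
        (b["timestamp_ms"] - a["timestamp_ms"], a["node"], b["node"])
        for a, b in zip(flow, flow[1:])
    ]
    return max(hops) if hops else None

activities = group_activities(events)
path_a1 = [e["node"] for e in activities["a1"]]
print(path_a1, slowest_hop(activities["a1"]))
```

Here the slowest hop in flow `a1` points at the step between the application server and the second ESB, which is the kind of performance hotspot the tracing is meant to surface.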
Solutions Supported with Batch/Interactive Analytics: Log Analysis
● Log analysis toolbox
● Log event indexing
○ Uses the new DAS v3.x indexing support
○ Event attributes can be indexed to be searched by server, cluster, and log type, and the log message itself can be indexed for full text search
● Custom search queries using Lucene queries and regular expressions
● Logstash adaptor for log publishing
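As a rough sketch of how a log line might become searchable attributes, the example below parses lines into server, cluster, log type, and message fields, then filters them by attribute and by regular expression. The log line format, `LOG_PATTERN`, and both helpers are assumptions for illustration; they are not the log analysis toolbox's format or API.

```python
import re

# Assumed line format: "[server] [cluster] LEVEL message".
LOG_PATTERN = re.compile(
    r"^\[(?P<server>[^\]]+)\]\s+\[(?P<cluster>[^\]]+)\]\s+"
    r"(?P<log_type>[A-Z]+)\s+(?P<message>.*)$"
)

def parse_log_line(line):
    """Return a dict of event attributes, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def search(events, log_type=None, message_regex=None):
    """Filter parsed events by log type and/or a regex over the message."""
    pattern = re.compile(message_regex) if message_regex else None
    return [
        e for e in events
        if (log_type is None or e["log_type"] == log_type)
        and (pattern is None or pattern.search(e["message"]))
    ]

lines = [
    "[esb-node-1] [esb-cluster] ERROR Connection timed out to backend",
    "[as-node-2] [as-cluster] INFO Deployment completed",
    "not a structured line",
]

log_events = [e for e in map(parse_log_line, lines) if e is not None]
hits = search(log_events, log_type="ERROR", message_regex=r"timed\s+out")
print(hits)
```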
Demo
Contact us!