How to apply machine learning to real-time processing Kai Waehner MILAN 25-26 NOVEMBER 2016 Technology Evangelist [email protected] LinkedIn @KaiWaehner www.kai-waehner.de

Upload: codemotion

Post on 07-Jan-2017


Page 1: How to Apply Big Data Analytics and Machine Learning to Real Time Processing - Kai Waehner - Codemotion Milan 2016

How to apply machine learning to real-time processing

Kai Waehner

MILAN 25-26 NOVEMBER 2016

[email protected] @KaiWaehner www.kai-waehner.de

Page 2

© Copyright 2000-2016 TIBCO Software Inc.

Apply Big Data Analytics to Real Time Processing

Page 3

Analyze and Act on Critical Business Moments

Page 4

Agenda

1) Machine Learning and Big Data Analytics

2) Building an Analytic Model

3) Real Time Processing

4) Live Demo

Page 6

Machine Learning

… allows computers to find hidden insights without being explicitly programmed where to look.

Page 7

Real World Examples of Machine Learning

• Spam Detection
• Search Results + Product Recommendation
• Picture Detection (Friends, Locations, Products)
• The Next Disruption: Google Beats Go Champion

Machine Learning is already present in daily life…

Now, every enterprise is beginning to leverage it!

Page 8

Analytics Maturity Model

A good Big Data Analytics platform can provide value to the organization across the full spectrum of use cases.

[Figure: analytics maturity chart. Labels: Immediate, Long-Term, Competitive Advantage, Value to the Organization, Analytics Maturity. Stages: Measure, Diagnose, Predict, Optimize, Alert, Automate. Capabilities: Self-service Dashboards, Visual Analytics, Advanced Analytics, Event Processing.]

Page 11

Agenda

1) Machine Learning and Big Data Analytics

2) Building an Analytic Model

3) Real Time Processing

4) Live Demo

Page 12

Analytical Pipeline

Page 14

Analytical Pipeline

Page 15

Variety of Data in Enterprises

[Figure: enterprise data sources connected through a central DATA FABRIC. Labels recoverable from the figure:]

• Information Services (join, transform, reusable, parameterized, dynamic query for in-memory use)
• Direct Query (dynamically query and retrieve data for visualization and analysis)
• Connectivity: JDBC/ODBC, ODBC, OLE DB, SqlClient, direct connection, drag-and-drop, custom GUI-driven data access via SDK
• Databases: MySQL, SQL Server, Oracle, PostgreSQL, Teradata, Teradata Aster, Netezza, Hadoop, MS SSAS, RDBMS, etc.
• Applications and services: Siebel eBusiness, Oracle E-Business, SAP BW, SAP R/3, Salesforce (SFDC), OBIEE, Web Services
• Local data sources: Access, Excel, STDF, XML, flat files, spreadsheets

Page 16

Data Acquisition

“Smart Recommendation Engine”

Page 17

Analytical Pipeline

Page 18

Raw transactions (one row per purchase):

   cust_id  dept  sku    dollar  gift   date
1  104      C     12003    2.40  FALSE  2016-10-17
2  105      A     12005   62.85  FALSE  2016-10-17
3  102      C     12007   69.23  TRUE   2016-10-17
4  104      B     12004    9.33  FALSE  2016-10-18
5  105      C     12010   14.16  TRUE   2016-10-18
6  101      B     12003   90.43  FALSE  2016-10-19
7  103      C     12005   90.97  FALSE  2016-10-19
n  …        …     …           …  …      …

Transformed (one row per customer, spend per department):

   cust_id  A      B      C      total   # orders  first_date  last_date
1  100      21.76  23.67   0.00   45.43  2         2016-10-19  2016-10-20
2  101       0.01  74.65   0.00   74.66  3         2016-10-19  2016-10-20
3  102       0.00  60.92  50.29  111.21  6         2016-10-17  2016-10-20
4  103       0.00   0.00  52.30   52.30  2         2016-10-19  2016-10-20
5  104      31.34   9.33   2.40   43.06  4         2016-10-17  2016-10-20
6  105      62.85   0.00  56.00  118.85  3         2016-10-17  2016-10-20


Data Munging - Transformations
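
The transformation on this slide (transaction rows pivoted into one summary row per customer) can be sketched in plain Python. This is only an illustration, not the tooling used in the talk: the function name `summarize_by_customer` and the record layout are assumptions, and only the seven rows visible on the slide are used, so the totals differ from the slide's summary table, which also covers the elided rows.

```python
# The seven transaction rows visible on the slide:
# (cust_id, dept, sku, dollar, gift, date)
TRANSACTIONS = [
    (104, "C", 12003, 2.40, False, "2016-10-17"),
    (105, "A", 12005, 62.85, False, "2016-10-17"),
    (102, "C", 12007, 69.23, True, "2016-10-17"),
    (104, "B", 12004, 9.33, False, "2016-10-18"),
    (105, "C", 12010, 14.16, True, "2016-10-18"),
    (101, "B", 12003, 90.43, False, "2016-10-19"),
    (103, "C", 12005, 90.97, False, "2016-10-19"),
]


def summarize_by_customer(rows, depts=("A", "B", "C")):
    """Pivot transaction rows into one summary record per customer.

    Hypothetical helper: returns {cust_id: {dept spend..., total, orders,
    first_date, last_date}}.
    """
    summary = {}
    for cust_id, dept, _sku, dollar, _gift, date in rows:
        rec = summary.setdefault(cust_id, {
            **{d: 0.0 for d in depts},
            "total": 0.0,
            "orders": 0,
            "first_date": date,
            "last_date": date,
        })
        rec[dept] += dollar          # spend per department
        rec["total"] += dollar       # spend across all departments
        rec["orders"] += 1
        # ISO-formatted dates sort correctly as plain strings.
        rec["first_date"] = min(rec["first_date"], date)
        rec["last_date"] = max(rec["last_date"], date)
    return summary


summary = summarize_by_customer(TRANSACTIONS)
```

Each value in `summary` corresponds to one row of the per-customer table: spend per department, total spend, number of orders, and first and last order date.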

Page 19

Analytical Pipeline

Page 20

“The greatest value of a picture is when it forces us to notice what we never expected to see”

John W. Tukey, 1977


Exploratory Data Analysis

Page 21

Visual Analytics - Interactive Brush-Linked


… and “Inline Data Wrangling” → Ad-hoc data preparation instead of just ETL

Page 23

Analytical Pipeline

Page 24

Which picture represents a model?

A model is a simplification of the truth that helps you with decision making.

Page 25

Model Building

Page 26

Model Building

Page 27

Employees who write longer emails earn higher salaries!


Model Building

Page 28

Model Improvement

Page 29

Managers

Staff


Model Improvement

Page 30

Analytical Pipeline

Page 31

Model Validation

How is the IQ of a kid related to the IQ of his / her mum?
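
A question like this one is typically answered with a regression model that is then checked on data it was not fitted on. The sketch below is an assumption about how such a validation could look, not the method shown in the talk: it fits a least-squares line on part of the data and measures the error on a held-out part. The data points are SYNTHETIC and exist only to make the example runnable; `fit_line` and `rmse` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on (xs, ys)."""
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


# SYNTHETIC illustration data (not from the talk): (mother_iq, child_iq)
DATA = [(85, 92), (90, 94), (95, 99), (100, 100), (105, 104),
        (110, 105), (115, 110), (120, 111), (125, 116), (130, 117)]

# Hold out every third point for validation; fit on the rest.
train = [p for i, p in enumerate(DATA) if i % 3 != 2]
test = [p for i, p in enumerate(DATA) if i % 3 == 2]

train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

slope, intercept = fit_line(train_x, train_y)
train_rmse = rmse(train_x, train_y, slope, intercept)
test_rmse = rmse(test_x, test_y, slope, intercept)
```

If `test_rmse` is much larger than `train_rmse`, the model has probably learned noise in the training data rather than a relationship that generalizes.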

Page 32

Frameworks and Tooling

Page 33

Advanced Analytics and Big Data Tools (for Data Scientists)

Many more ….

Page 34

“…as a next-generation data discovery capability that automatically finds and explains insights from advanced analytics to business users or citizen data scientists”

Smart Data Discovery (for the Business User)

Leverage Machine Learning without the help of a Data Scientist

Page 35

Smart Visual Analytics vs. Data Science Tools

Live Demo

Page 36

Agenda

1) Machine Learning and Big Data Analytics

2) Building an Analytic Model

3) Real Time Processing

4) Live Demo

Page 38

Traditional Data Processing: “Request – Response”

Store

Analyze

Act

Page 39

The New Era: Streaming Analytics

Act & Monitor

Analyze

Store

Page 40

Streaming Analytics - Processing Pipeline

Stream Ingest
• APIs
• Adapters / Channels
• Integration
• Messaging

Stream Preprocessing
• Transformation
• Aggregation
• Enrichment
• Filtering

Stream Analytics & Processing
• Contextual Rules
• Windowing
• Patterns
• Analytics
• Deep ML
• …

Stream Outcomes
• Process Management
• Analytics (Real Time)
• Applications & APIs
• Analytics / DW
• Reporting

Further labels on the figure: Index / Search, Normalization

Applying an Analytic Model is just a piece of the puzzle!
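
The stages on this slide can be sketched as a chain of Python generators, where each event flows through preprocessing, windowing and a rule as soon as it arrives. This is a minimal illustration under assumptions, not the API of any streaming product: the sensor events, field names and thresholds are hypothetical.

```python
from collections import deque


def preprocess(events, min_temp=-50.0):
    """Stream preprocessing: drop implausible readings, enrich the rest."""
    for event in events:
        if event["temp"] < min_temp:      # filtering
            continue
        yield {**event, "unit": "C"}      # enrichment


def sliding_average(events, size=3):
    """Windowing: attach the mean of the last `size` readings to each event."""
    window = deque(maxlen=size)
    for event in events:
        window.append(event["temp"])
        yield {**event, "avg": sum(window) / len(window)}


def alerts(events, threshold=80.0):
    """Contextual rule: emit an alert when the windowed average is too high."""
    for event in events:
        if event["avg"] > threshold:
            yield f"ALERT sensor={event['sensor']} avg={event['avg']:.1f}"


def run(source):
    """Wire the stages together: ingest -> preprocess -> window -> rule."""
    return list(alerts(sliding_average(preprocess(source))))


# Hypothetical sample stream.
SAMPLE = [
    {"sensor": "s1", "temp": 70.0},
    {"sensor": "s1", "temp": -999.0},   # sensor glitch, filtered out
    {"sensor": "s1", "temp": 85.0},
    {"sensor": "s1", "temp": 95.0},
]
```

Calling `run(SAMPLE)` yields a single alert, raised once the average of the three valid readings exceeds the threshold. In a real deployment the source would be an unbounded stream (for example a message queue) rather than a list.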

Page 41

Frameworks and Products

(not a complete list!)

[Figure: matrix of vendor logos arranged by OPEN SOURCE vs. CLOSED SOURCE and FRAMEWORK vs. PRODUCT. Recoverable label: Microsoft Azure Stream Analytics.]

Page 42

Comparison of Stream Processing Frameworks and Products

Slide Deck and Video Recording: http://www.kai-waehner.de/blog/2016/11/15/streaming-analytics-comparison-open-source-frameworks-products-cloud-services/

Page 43

Apache Storm – Hello World

http://wpcertification.blogspot.ch/2014/02/helloworld-apache-storm-word-counter.html

Page 44

Visual Coding for Streaming Analytics

• Streaming Operators
• Connectivity
• Visual Development
• Testing & Simulation
• Mature Tooling / Support
• Middleware Integration

Page 45

Live Visual Analytics UI

Dynamic aggregation

Live visualization

Ad-hoc continuous query

Alerts

Action

Page 46

How to apply analytic models to real time processing without redevelopment?

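[Figure: analytic models built with different tools are deployed to Stream Processing.]

Stream Processing

• H2O.ai
• Open Source R
• TERR
• Spark ML
• MATLAB
• SAS
• PMML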
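
The idea behind "without redevelopment" is that the trained model is exported as data and the stream processor only loads and evaluates it, so no scoring logic is rewritten by hand. The sketch below illustrates that hand-off under loud assumptions: it uses a small JSON document as a stand-in for a real interchange format such as PMML, and the coefficients, feature names and `load_scorer` helper are all hypothetical.

```python
import json
import math

# --- Offline (model building side) -----------------------------------
# Hypothetical logistic regression coefficients; in practice these would
# come out of a training tool, not be written by hand.
trained = {
    "intercept": -4.0,
    "weights": {"rework_count": 1.5, "temp_dev": 0.8},
}
model_document = json.dumps(trained)   # stand-in for an exported model file


# --- Online (stream processing side) ---------------------------------
def load_scorer(document):
    """Turn an exported model document into a scoring function."""
    model = json.loads(document)

    def score(event):
        # Linear combination of the event's features, then the logistic link.
        z = model["intercept"] + sum(
            weight * event[name] for name, weight in model["weights"].items()
        )
        return 1.0 / (1.0 + math.exp(-z))

    return score


score = load_scorer(model_document)
probability = score({"rework_count": 2, "temp_dev": 1.25})
```

Retraining then only means shipping a new model document; the streaming code that calls `score` stays unchanged.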

Page 47

TIBCO StreamBase Connector for H2O.ai

Page 48

Agenda

1) Machine Learning and Big Data Analytics

2) Building an Analytic Model

3) Real Time Processing

4) Live Demo

Page 49

Scenario: Predictive Scrapping of Parts in an Assembly Line

Goal: Scrap parts as early as possible automatically to reduce costs in a manufacturing process.

Question: When to scrap a part in Station 1 instead of doing re-work or sending it to Station 2?

          Cost Before   Station 1   Station 2   Total Cost
Cost      9€            7€          13€         29€ (or more)
Decision                Scrap?      Scrap?
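
One way to turn the question into a rule is to compare the expected outcome of continuing with the cost of stopping. The sketch below is an assumption, not the decision logic from the demo: it reads the slide's figures as 9€ spent before Station 1, 7€ at Station 1 and 13€ at Station 2, and it introduces a hypothetical `part_value` (the value of a finished good part) and a model-predicted defect probability `p_defect`.

```python
COST_BEFORE = 9.0      # EUR spent before the part reaches Station 1
COST_STATION_1 = 7.0   # EUR to process the part at Station 1
COST_STATION_2 = 13.0  # EUR to process the part at Station 2


def total_cost():
    """Cost of a part that passes through every step once."""
    return COST_BEFORE + COST_STATION_1 + COST_STATION_2


def should_scrap_before_station_2(p_defect, part_value):
    """Scrap if sending the part on is expected to lose money.

    Continuing costs COST_STATION_2 and pays off `part_value` only if the
    part turns out good. Money already spent is sunk and is ignored.
    """
    expected_gain = (1.0 - p_defect) * part_value - COST_STATION_2
    return expected_gain < 0.0
```

With a hypothetical `part_value` of 40€, the rule scraps a part once the predicted defect probability exceeds 1 - 13/40 = 0.675. Re-work would add further cost terms, which is why the slide says "or more".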

Page 50

Fast Data Architecture for Predictive Maintenance

[Figure: TIBCO Fast Data Platform architecture. Labels recoverable from the figure:]

• Inputs: CSV (batch), JSON (real time), XML (real time), internal data
• Streaming Analytics (StreamBase): aggregate, rules, analytics, correlate, action
• Live Datamart: continuous query processing, alerts, manual action, escalation
• Operational Analytics: Operations Live UI
• Historical Analysis (Data Scientists): Spotfire, R/TERR, H2O
• Storage and transport: Flume, HDFS, Hadoop (Cloudera), Oracle RDBMS
• Formats: Avro, Parquet, …, PMML

Page 51

TIBCO Spotfire with H2O Integration

Data Discovery / Data Mining (“Are parts that repeat a station more likely scrap parts?”)

Page 52

TIBCO Live Datamart

Operational Intelligence (“Monitor the manufacturing process and change rules in real time!”)

Live Datamart Desktop Client

Page 53

TIBCO Live Datamart

Operational Intelligence (“Monitor the manufacturing process and change rules in real time!”)

Live Datamart Web API

Page 54

TIBCO Spotfire + StreamBase + H2O.ai + Live Datamart

Live Demo

Page 55

TIBCO Accelerator for Apache Spark

The TIBCO Accelerator for Spark is a TIBCO-engineered, lightweight, open-source fast start for systems to stream data into Spark, discover patterns in Spark with Spotfire, and operationalize the insights on Big Data.

FUNCTIONAL COMPONENTS

1. Fast Data Preparation for IoT
   Dozens of enterprise and IoT data preparation adapters: MQTT, databases; inbound creation of HDFS, Parquet, HBase, Avro…

2. Spotfire Model Discovery Template
   Use Spotfire to explore the Spark data lake, create a predictive model, train it in H2O, and deploy it to Streaming Analytics.

3. Operationalize Predictive Models
   ZooKeeper deployment to StreamBase nodes living in the Spark cluster via H2O, PMML, TERR models.

4. Streaming Analytics for Automation
   Automate action based on predictive models: make offers to customers, stop fraudulent transactions, alert.

5. Monitor & Retrain Model
   Monitor the behavior of the model and retrain when necessary.

6. Drag & Drop for Business Solution Developers
   Code-free development environment for work with H2O, HDFS, Avro, TERR.

Page 56

Key Take-Aways

• Insights are hidden in Historical Data on Big Data Platforms

• Machine Learning and Big Data Analytics find these Insights by building Analytics Models

• Event Processing uses these Models (without Redevelopment) to take Action in Real Time

Page 57

Questions? Please contact me!

Kai Wähner
Technology Evangelist at TIBCO

[email protected] @KaiWaehner www.kai-waehner.de LinkedIn