How to Leverage Machine Learning (R, Hadoop, Spark, H2O) for Real Time Processing - Kai Waehner - ...

Upload: codemotion

Post on 05-Apr-2017

Kai Wähner, Technology Evangelist

[email protected]

LinkedIn

@KaiWaehner

www.kai-waehner.de

jDays - Gothenburg, Sweden (March 2017)

Advanced Analytics and Machine Learning with R, Spark, H2O and TensorFlow for Real Time Processing

© Copyright 2000-2017 TIBCO Software Inc.

Apply Big Data Analytics to Real Time Processing

Agenda

1) Machine Learning and Big Data Analytics
2) Building an Analytic Model
3) Applying an Analytic Model in Real Time

Machine Learning

… allows computers to find hidden insights without being explicitly programmed where to look.

Real World Examples of Machine Learning

• Spam Detection
• Search Results + Product Recommendation
• Picture Detection (Friends, Locations, Products)

Machine Learning is already present in daily life…

Now, every enterprise is beginning to leverage it!

The Next Disruption: Google Beats Go Champion

From Insight to Action - Closed Loop for Big Data Analytics

[Figure: closed loop between Insight and Action, connected by events]

Analytical Pipeline

1. Data Access

2. Data Preparation

3. Exploratory Data Analysis

4. Model Building

5. Model Validation

6. Model Execution

7. Deployment

Variety of Data in Enterprises

[Figure: data access options and supported sources]

Access methods:
• Custom GUI-driven data access via SDK
• Local data sources (drag-and-drop)
• Information Services (join, transform, reusable, parameterized, dynamic query for in-memory use)
• Direct connection
• Direct Query (dynamically query and retrieve data for visualization and analysis)
• Data Fabric

Sources and connectors named on the slide:
• Files and services: Access, Excel, STDF, flat files, spreadsheets, XML, web services
• Connectors: JDBC/ODBC, ODBC, OLE DB, SqlClient
• Databases: MySQL, SQL Server, Oracle, PostgreSQL, Teradata, Teradata Aster, Netezza, MS SSAS, OBIEE, Hadoop, other RDBMS, etc.
• Business applications: Siebel eBusiness, Oracle E-Business, SAP BW, SAP R/3, Salesforce (SFDC)

Data Preparation

http://www.slideshare.net/odsc/feature-engineering

Data Preparation

Visual Analytics - Interactive Brush-Linked

Model Building

A model is a simplification of the truth that helps you with decision making.

Cross-Validation Procedure

https://genome.tugraz.at/proclassify/help/pages/XV.html
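
The cross-validation procedure named on this slide can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the tooling shown in the talk: the helper names (`k_fold_indices`, `cross_validate`), the toy "predict the mean" model, and the sample data are all assumptions made for the example.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, k, fit, predict, error):
    """Return the validation error of each fold.

    Every fold is held out once for validation while the model is
    trained on the remaining k - 1 folds.
    """
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [predict(model, xs[i]) for i in held_out]
        scores.append(error(preds, [ys[i] for i in held_out]))
    return scores


# Hypothetical toy model: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return mean(train_y)


def predict_mean(model, x):
    return model


def mean_absolute_error(preds, actual):
    return mean(abs(p - a) for p, a in zip(preds, actual))


if __name__ == "__main__":
    xs = list(range(10))
    ys = [2.0 * x + 1.0 for x in xs]
    scores = cross_validate(xs, ys, 5, fit_mean, predict_mean, mean_absolute_error)
    print("per-fold error:", scores)
    print("average error:", mean(scores))
```

Averaging the per-fold errors gives a more stable estimate of how the model behaves on unseen data than a single train/test split would.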

Execution via Code / Scripting

Execution within the Visual Analytics Tooling

Customer Churn with Random Forest Algorithm:

Select variables for the model

Frameworks and Tooling

Advanced Analytics and Big Data Tools for Data Scientists

Many more …

Portable Format for Analytics (PFA)

Demystify Data Science for the Business Analyst

Leverage Machine Learning without the help of a Data Scientist

Development of Analytic Models with R, TensorFlow, Apache Spark, RapidMiner, TIBCO Spotfire

Live Demo

Streaming Analytics - Processing Pipeline

Stream Ingest
• APIs
• Adapters / Channels
• Integration
• Messaging

Stream Preprocessing
• Transformation
• Aggregation
• Enrichment
• Filtering
• Index / Search
• Normalization

Stream Analytics & Processing
• Contextual Rules
• Windowing
• Patterns
• Analytics
• Deep ML
• …

Stream Outcomes
• Process Management
• Analytics (Real Time)
• Applications & APIs
• Analytics / DW Reporting

Applying an Analytic Model is just a piece of the puzzle!
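
To make the filtering, windowing, and aggregation steps of this pipeline more concrete, here is a minimal Python sketch. It is not how StreamBase or any other product on these slides implements them: the class name `SlidingWindowAverage`, the function `process_stream`, the event layout `(timestamp, value)`, and the threshold rule are assumptions chosen for illustration.

```python
from collections import deque


class SlidingWindowAverage:
    """Time-based sliding window that keeps a running average of values."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Add an event, evict expired ones, and return the current average."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have fallen out of the window.
        while self.events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


def process_stream(events, window_seconds, threshold):
    """Filter incomplete readings, aggregate over a window, and collect alerts."""
    window = SlidingWindowAverage(window_seconds)
    alerts = []
    for timestamp, value in events:
        if value is None:  # filtering: drop incomplete events
            continue
        average = window.add(timestamp, value)  # windowed aggregation
        if average > threshold:  # simple rule applied to the aggregate
            alerts.append((timestamp, average))
    return alerts


if __name__ == "__main__":
    readings = [(0, 10.0), (1, None), (2, 20.0), (3, 60.0), (10, 5.0)]
    print(process_stream(readings, window_seconds=5, threshold=25.0))
```

In a real deployment the rule on the aggregate would typically be replaced or complemented by a call to a trained analytic model, which is the topic of the following slides.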

Frameworks and Products

(not a complete list!)

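[Figure: matrix of streaming tools, open source vs. closed source on one axis and framework vs. product on the other. Only one logo label survives in the transcript: Microsoft Azure Stream Analytics]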

How to apply analytic models to real time processing without redevelopment?

Stream Processing can apply models built with:
• H2O.ai
• Open Source R
• TERR
• Spark ML
• MATLAB
• SAS
• PMML

Apache Spark ML and Spark Streaming with PMML Models

https://github.com/jpmml/jpmml-spark
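
Because PMML is an XML format, a model exported from one tool can be scored elsewhere without being rewritten. The following Python sketch illustrates the idea with the standard library only. It is deliberately limited to a single linear `RegressionTable` and is not a substitute for a full evaluator such as the JPMML project linked above. The embedded model, its field names (`temperature`, `pressure`, `defect_score`), and its coefficients are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML document; a real one would be exported by a modeling tool.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header description="Illustrative linear model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="temperature" optype="continuous" dataType="double"/>
    <DataField name="pressure" optype="continuous" dataType="double"/>
    <DataField name="defect_score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="temperature"/>
      <MiningField name="pressure"/>
      <MiningField name="defect_score" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="temperature" coefficient="0.25"/>
      <NumericPredictor name="pressure" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_3"}


def load_linear_model(pmml_text):
    """Extract the intercept and coefficients from a PMML RegressionModel."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionTable found")
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, event):
    """Apply the linear model to one incoming event (a dict of field values)."""
    intercept, coefficients = model
    return intercept + sum(c * event[name] for name, c in coefficients.items())


if __name__ == "__main__":
    model = load_linear_model(PMML_DOC)  # load once at startup
    print(score(model, {"temperature": 4.0, "pressure": 2.0}))  # score per event
```

The model is parsed once and then applied to every incoming event, which mirrors the split between building a model offline and executing it in the stream.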

TIBCO StreamBase Connector for R and TERR

TIBCO StreamBase Connector for H2O.ai

TIBCO StreamBase Connector for PMML

Scenario: Predictive Scrapping of Parts in an Assembly Line

[Figure: assembly line with Station 1 and Station 2, each followed by a "Scrap?" decision]

Cost Before: 9€, 7€, 13€

Total Cost: 29€ (or more)
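
The economics behind scrapping early can be sketched with the cost figures from this scenario. The Python example below assumes that 9€, 7€, and 13€ are the costs added at three successive production steps (the slide does not label them further), and the decision rule comparing expected losses is an illustrative assumption, not the rule used in the demo.

```python
# Assumption: costs added at successive production steps, in euros.
STEP_COSTS = [9, 7, 13]


def sunk_cost(step_costs, completed_steps):
    """Cost already spent on a part after the given number of steps."""
    return sum(step_costs[:completed_steps])


def avoidable_cost(step_costs, completed_steps):
    """Cost that is saved if the part is scrapped after the given number of steps."""
    return sum(step_costs[completed_steps:])


def should_scrap(step_costs, completed_steps, defect_probability):
    """Hypothetical rule: scrap when the expected waste of continuing with a
    defective part exceeds the expected loss of discarding a good part."""
    expected_waste = defect_probability * avoidable_cost(step_costs, completed_steps)
    expected_loss = (1 - defect_probability) * sunk_cost(step_costs, completed_steps)
    return expected_waste > expected_loss


if __name__ == "__main__":
    print("total cost:", sum(STEP_COSTS))
    print("saved by scrapping after step 1:", avoidable_cost(STEP_COSTS, 1))
    print("scrap at 40% defect probability:", should_scrap(STEP_COSTS, 1, 0.4))
    print("scrap at 20% defect probability:", should_scrap(STEP_COSTS, 1, 0.2))
```

The defect probability would come from the analytic model applied to the sensor data of each part, so the earlier a reliable prediction is available, the larger the share of the total cost that can be avoided.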

Example: Predictive Analytics for Manufacturing (“scrap parts as early as possible”)

Fast Data Architecture for Predictive Maintenance

[Figure: architecture diagram of the TIBCO Fast Data Platform]

• Data sources: CSV (batch), JSON (real time), XML (real time), Oracle RDBMS, internal data
• Historical analysis (data scientists): Flume, HDFS, Hadoop (Cloudera), Spotfire, R/TERR, H2O
• Exchange formats: Avro, Parquet, …, PMML
• Streaming analytics (StreamBase): aggregate, rules, analytics, correlate, action
• Operational analytics (Live Datamart, LiveUI for operations): continuous query processing, alerts, manual action, escalation

TIBCO Spotfire with H2O Integration

Example: Predictive Analytics for Manufacturing (“scrap parts as early as possible”)

TIBCO StreamBase / Live Datamart + H2O.ai

Live Demo

From Insight to Action - Closed Loop for Big Data Analytics

[Figure: closed loop between Insight and Action]

Insight: Access, Wrangle, Analyze, Model

Action: Monitor, Predict, Decide, Act

Key Take-Aways

• Insights are hidden in Historical Data on Big Data Platforms

• Machine Learning and Big Data Analytics find these Insights by building Analytic Models

• Event Processing uses these Models (without Redevelopment) to take Action in Real Time

Questions? Please contact me!

Kai Wähner, Technology Evangelist

[email protected]
@KaiWaehner
www.kai-waehner.de
LinkedIn