how to leverage machine learning (r, hadoop, spark, h2o) for real time processing - kai waehner - ...
TRANSCRIPT
Kai Wähner, Technology Evangelist
@KaiWaehner
www.kai-waehner.de
jDays - Gothenburg, Sweden (March 2017)
Advanced Analytics and Machine Learning with R, Spark, H2O and TensorFlow for Real Time Processing
© Copyright 2000-2017 TIBCO Software Inc.
Agenda
1) Machine Learning and Big Data Analytics
2) Building an Analytic Model
3) Applying an Analytic Model in Real Time
Machine Learning
… allows computers to find hidden insights without being explicitly programmed where to look.
Real World Examples of Machine Learning
Spam Detection
Search Results + Product Recommendation
Picture Detection (Friends, Locations, Products)
Machine Learning is already present in daily life…
Now, every enterprise is beginning to leverage it!
The Next Disruption: Google Beats Go Champion
From Insight to Action - Closed Loop for Big Data Analytics
[Figure: events feed a closed loop between Insight and Action]
Agenda
1) Machine Learning and Big Data Analytics
2) Building an Analytic Model
3) Applying an Analytic Model in Real Time
Analytical Pipeline
1. Data Access
2. Data Preparation
3. Exploratory Data Analysis
4. Model Building
5. Model Validation
6. Model Execution
7. Deployment
Variety of Data in Enterprises
[Figure: enterprise data sources and access options, connected through a data fabric]
Access options:
Custom GUI-driven data access via SDK
Local data sources (drag-and-drop)
Information Services (join, transform, reusable, parameterized, dynamic query for in-memory use)
Direct connection
Direct Query (dynamically query and retrieve data for visualization and analysis)
Connectivity: JDBC/ODBC, ODBC, OLE DB, SqlClient
Sources shown: Access, Excel, STDF, flat files, spreadsheets, XML, web services, RDBMS (MySQL, SQL Server, Oracle, PostgreSQL, Teradata, Teradata Aster, Netezza), Hadoop, Salesforce (SFDC), Siebel eBusiness, Oracle E-Business, SAP BW, SAP R/3, OBIEE, MS SSAS, etc.
Data Preparation
http://www.slideshare.net/odsc/feature-engineering
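Data preparation usually means turning raw records into clean numeric features before a model is built. The following is a minimal sketch in plain Python, not taken from the talk: the record fields `plan` and `monthly_minutes` and all values are hypothetical, and the three steps shown (mean imputation, min-max scaling, one-hot encoding) are only common examples of feature engineering.

```python
from typing import Dict, List

# Hypothetical raw records; field names and values are illustrative only.
RAW = [
    {"plan": "basic", "monthly_minutes": 120.0},
    {"plan": "premium", "monthly_minutes": 480.0},
    {"plan": "basic", "monthly_minutes": None},  # missing value
]


def prepare(records: List[Dict]) -> List[List[float]]:
    """Impute missing minutes with the mean, min-max scale them, one-hot encode the plan."""
    known = [r["monthly_minutes"] for r in records if r["monthly_minutes"] is not None]
    mean = sum(known) / len(known)
    lo, hi = min(known), max(known)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    plans = sorted({r["plan"] for r in records})

    rows = []
    for r in records:
        minutes = r["monthly_minutes"] if r["monthly_minutes"] is not None else mean
        scaled = (minutes - lo) / span
        one_hot = [1.0 if r["plan"] == p else 0.0 for p in plans]
        rows.append([scaled] + one_hot)
    return rows


FEATURES = prepare(RAW)
```

Each output row holds the scaled minutes followed by one column per plan, in alphabetical order of plan name.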
Model Building
A model is a simplification of the truth that helps you with decision making.
Cross-Validation Procedure
https://genome.tugraz.at/proclassify/help/pages/XV.html
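The cross-validation procedure can be sketched as follows: split the data into k folds, hold out one fold at a time for testing, train on the rest, and average the held-out accuracy. This is a minimal illustration under assumptions, not the procedure of any specific tool: the "model" is a toy majority-class predictor, folds are contiguous, and no shuffling is done (real workflows usually shuffle or stratify first).

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def majority_class(labels: Sequence[int]) -> int:
    """Toy 'model': always predict the most frequent training label."""
    return max(set(labels), key=list(labels).count)


def cross_validate(labels: Sequence[int], k: int) -> float:
    """Average held-out accuracy of the toy model across k folds."""
    accuracies = []
    for train, test in k_fold_indices(len(labels), k):
        prediction = majority_class([labels[i] for i in train])
        hits = sum(1 for i in test if labels[i] == prediction)
        accuracies.append(hits / len(test))
    return sum(accuracies) / k
```

Swapping `majority_class` for a real training routine keeps the fold logic unchanged, which is the point of the procedure: every sample is used for testing exactly once.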
Execution within the Visual Analytics Tooling
Customer Churn with Random Forest Algorithm:
Select variables for the model
Advanced Analytics and Big Data Tools for Data Scientists
Many more …
Portable Format for Analytics (PFA)
Demystify Data Science for the Business Analyst
Leverage Machine Learning without the help of a Data Scientist
Development of Analytic Models with R, TensorFlow, Apache Spark, RapidMiner, TIBCO Spotfire
Live Demo
Agenda
1) Machine Learning and Big Data Analytics
2) Building an Analytic Model
3) Applying an Analytic Model in Real Time
Analytical Pipeline
1. Data Access
2. Data Preparation
3. Exploratory Data Analysis
4. Model Building
5. Model Validation
6. Model Execution
7. Deployment
Streaming Analytics - Processing Pipeline
Stream Ingest: APIs, Adapters / Channels, Integration, Messaging
Stream Preprocessing: Transformation, Aggregation, Enrichment, Filtering, Normalization
Stream Analytics & Processing:
• Contextual Rules
• Windowing
• Patterns
• Analytics
• Deep ML
• …
Stream Outcomes: Process Management, Analytics (Real Time), Applications & APIs, Analytics / DW Reporting, Index / Search
Applying an Analytic Model is just a piece of the puzzle!
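Two of the stream-processing steps named in the pipeline, windowing and contextual rules, can be illustrated with a few lines of plain Python. This is only a sketch under assumptions: it uses a count-based sliding window and a simple threshold rule, and the function names are hypothetical rather than part of any streaming product's API.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_average(events: Iterable[float], size: int) -> Iterator[float]:
    """Yield the mean of the last `size` events once the window is full."""
    window: Deque[float] = deque(maxlen=size)
    for value in events:
        window.append(value)  # the oldest value drops out automatically
        if len(window) == size:
            yield sum(window) / size


def alerts(events: Iterable[float], size: int, threshold: float) -> Iterator[Tuple[int, float]]:
    """Rule on top of the window: emit (window_index, mean) when the mean exceeds threshold."""
    for index, mean in enumerate(sliding_average(events, size)):
        if mean > threshold:
            yield index, mean
```

For example, `list(alerts([10, 10, 10, 40, 40], 3, 15.0))` flags the second and third windows. Because both functions are generators, they process events one at a time and never need the full stream in memory.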
Frameworks and Products
(not a complete list!)
[Figure: quadrant of streaming frameworks and products. Axes: open source vs. closed source, framework vs. product. Legible entry: Microsoft Azure Stream Analytics]
How to apply analytic models to real time processing without redevelopment?
Stream Processing applies models built with:
H2O.ai
Open Source R
TERR
Spark ML
MATLAB
SAS
PMML
Apache Spark ML and Spark Streaming with PMML Models
https://github.com/jpmml/jpmml-spark
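The idea behind applying a model "without redevelopment" is that the trained model is exported as data and a generic scoring engine evaluates it on each event. The sketch below illustrates that idea in plain Python under loud assumptions: the JSON layout is a hypothetical stand-in and is NOT the PMML schema, the feature names `temperature` and `vibration` are invented, and a real deployment would use a PMML evaluator such as the JPMML project linked above instead of this hand-written scorer.

```python
import json
import math
from typing import Dict, Iterable, Iterator

# Hypothetical exported model: a logistic regression described as data.
# This JSON layout is an illustrative stand-in, NOT the PMML schema.
MODEL_JSON = """
{
  "type": "logistic_regression",
  "intercept": -1.0,
  "coefficients": {"temperature": 0.5, "vibration": 2.0}
}
"""


def load_model(text: str) -> Dict:
    """Parse the exported model and reject types this scorer cannot evaluate."""
    model = json.loads(text)
    if model["type"] != "logistic_regression":
        raise ValueError("unsupported model type: " + model["type"])
    return model


def score(model: Dict, event: Dict[str, float]) -> float:
    """Return the predicted probability for one event."""
    z = model["intercept"] + sum(
        weight * event[name] for name, weight in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(model: Dict, events: Iterable[Dict[str, float]]) -> Iterator[float]:
    """Apply the same exported model to every incoming event."""
    for event in events:
        yield score(model, event)
```

Retraining only changes the exported description, so the streaming code stays as it is. That separation is what lets data scientists and stream-processing developers work independently.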
Scenario: Predictive Scrapping of Parts in an Assembly Line
Cost Before: 9€
Station 1: 7€
Station 2: 13€
Total Cost: 29€ (or more)
Scrap? (decision at each station)
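The arithmetic behind the scenario can be made explicit. The sketch below uses the three costs from the slide (9€, 7€, 13€) and shows how much is saved by scrapping a defective part early. The decision rule in `should_scrap` is an assumption added for illustration, not something stated in the talk: it scraps when the expected cost wasted on a defective part exceeds the expected loss of throwing away a good one.

```python
from typing import Sequence

# Costs from the slide: 9€ before the line, 7€ at Station 1, 13€ at Station 2.
STAGE_COSTS = (9.0, 7.0, 13.0)


def sunk_cost(stage: int, costs: Sequence[float] = STAGE_COSTS) -> float:
    """Cost already spent after `stage` steps have been completed."""
    return sum(costs[:stage])


def avoided_cost(stage: int, costs: Sequence[float] = STAGE_COSTS) -> float:
    """Cost saved by scrapping a defective part after `stage` completed steps."""
    return sum(costs[stage:])


def should_scrap(p_defect: float, stage: int, costs: Sequence[float] = STAGE_COSTS) -> bool:
    """Assumed rule: scrap if expected waste on a bad part exceeds expected loss of a good one."""
    expected_waste_if_continue = p_defect * avoided_cost(stage, costs)
    expected_loss_if_scrap = (1.0 - p_defect) * sunk_cost(stage, costs)
    return expected_waste_if_continue > expected_loss_if_scrap
```

With these numbers, scrapping before Station 1 (`stage=1`) avoids 20€ of the 29€ total, while scrapping before Station 2 (`stage=2`) avoids only 13€. Under the assumed rule, a predicted defect probability of 0.4 justifies scrapping before Station 1 but not before Station 2, which is why the prediction is most valuable early in the line.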
Example: Predictive Analytics for Manufacturing (“scrap parts as early as possible”)
Fast Data Architecture for Predictive Maintenance
[Figure: TIBCO Fast Data Platform architecture]
Inputs: CSV (batch), JSON (real time), XML (real time)
Streaming Analytics (StreamBase): Aggregate, Rules, Analytics, Correlate, leading to Action
Operational Analytics: Live Datamart, Operations Live UI, continuous query processing, alerts, manual action, escalation
Historical Analysis (data scientists): Flume, HDFS, Hadoop (Cloudera), Spotfire, R/TERR, H2O
Other elements shown: Oracle RDBMS, internal data, Avro, Parquet, …, PMML
TIBCO Spotfire with H2O Integration
From Insight to Action - Closed Loop for Big Data Analytics
Insight: ACCESS → WRANGLE → ANALYZE → MODEL
Action: MONITOR → PREDICT → DECIDE → ACT
Key Take-Aways
• Insights are hidden in Historical Data on Big Data Platforms
• Machine Learning and Big Data Analytics find these Insights by building Analytic Models
• Event Processing uses these Models (without Redevelopment) to take Action in Real Time
Questions? Please contact me!
Kai Wähner, Technology Evangelist
[email protected]
@KaiWaehner
www.kai-waehner.de
LinkedIn