QuickML: Machine Learning for Developers Benjamin De Boe BENELUX SYMPOSIUM 2020

Post on 20-May-2020

Page 1: QuickML: Machine Learning for Developers...2020/02/11  · Machine Learning Agility The problem for Doug Despite high versatility for mathematical modelling, languages like Python

QuickML: Machine Learning for Developers

Benjamin De Boe
BENELUX SYMPOSIUM 2020


IntegratedML: Machine Learning for Developers

Benjamin De Boe
BENELUX SYMPOSIUM 2020


Outline

Integrated intro to ML

Intro to IntegratedML

The Belgian Connection

Analytics with InterSystems IRIS

Wrapping up


Integrated intro to ML

Introducing IntegratedML


9 | © InterSystems Corporation. All rights reserved. |

Quick intro to ML

Traditional Programming: Data + Program → Computer → Output

Machine Learning: Data + Output → Computer → Program
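The contrast between the two diagrams can be sketched in a few lines of Python. This is only a minimal illustration: the fever scenario, the sample readings and the function names are all made up for the example. In the first function a person writes the rule; in the second, the rule is derived from data plus known outputs.

```python
from __future__ import annotations


# Traditional programming: a person writes the program (the rule) by hand.
def handwritten_rule(temperature_c: float) -> bool:
    """Flag a fever using a threshold chosen by the programmer."""
    return temperature_c >= 38.0


# Machine learning: the computer derives the program from data plus known outputs.
def learn_threshold(samples: list[tuple[float, bool]]) -> float:
    """Pick the candidate threshold that misclassifies the fewest labelled samples."""
    candidates = sorted(value for value, _ in samples)

    def errors(threshold: float) -> int:
        return sum((value >= threshold) != label for value, label in samples)

    return min(candidates, key=errors)


# Hypothetical labelled data: (temperature, has_fever)
training_data = [(36.5, False), (37.0, False), (37.4, False),
                 (38.1, True), (38.6, True), (39.2, True)]

threshold = learn_threshold(training_data)


def learned_rule(temperature_c: float) -> bool:
    """The 'program' produced by training: same shape as the hand-written rule."""
    return temperature_c >= threshold


print(threshold, learned_rule(38.3), handwritten_rule(38.3))
```

Both rules have the same shape; the difference is where the threshold comes from.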


The Machine Learning Process

Data Acquisition → Data Preparation → Feature Engineering → Model Selection → Model Training → Parameter Tuning → Model Deployment

Data Science requires understanding of data, business problem and ML techniques
• Specialist language & frameworks
• Inherently iterative, with lots of trial & error

Model Training step is where actual ML algorithms come in
• Requires significant compute resources


Challenges for Application Developers

• Increased demand for AI & predictive models

• Steep learning curve for ML

• Operationalization of ML models


Automating The Machine Learning Process

Data Acquisition → Data Preparation → Feature Engineering → Model Selection → Model Training → Parameter Tuning → Model Deployment

AutoML

Can we automatically test all options the data scientist would choose from?
• Make educated guesses to limit the options
• Just add more compute resources

Select from candidate models based on predefined metrics
• Optimize for accuracy or (runtime) speed
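The idea of selecting from candidate models based on a predefined metric can be sketched as follows. This is a toy illustration, not how any particular AutoML engine works: the two candidate "models", the data and every name in the block are assumptions made for the example, and accuracy on a held-out set stands in for the predefined metric.

```python
from __future__ import annotations


def train_majority(rows):
    """Candidate 1: always predict the most common label in the training data."""
    positives = sum(label for _, label in rows)
    majority = positives * 2 >= len(rows)
    return lambda value: majority


def train_threshold(rows):
    """Candidate 2: predict True at or above the best-separating threshold."""
    def errors(threshold):
        return sum((value >= threshold) != label for value, label in rows)

    best = min(sorted(value for value, _ in rows), key=errors)
    return lambda value: value >= best


def accuracy(model, rows):
    """Predefined selection metric: share of rows predicted correctly."""
    return sum(model(value) == label for value, label in rows) / len(rows)


def auto_select(candidates, train_rows, holdout_rows):
    """Train every candidate, score each on held-out data, keep the best."""
    scored = []
    for name, trainer in candidates.items():
        model = trainer(train_rows)
        scored.append((accuracy(model, holdout_rows), name, model))
    best_score, best_name, best_model = max(scored, key=lambda item: item[0])
    return best_name, best_score, best_model


# Hypothetical data: (feature value, label)
train_rows = [(1.0, False), (2.0, False), (3.0, False), (4.0, True), (5.0, True)]
holdout_rows = [(1.5, False), (4.5, True), (3.5, False), (6.0, True)]
candidates = {"majority": train_majority, "threshold": train_threshold}

best_name, best_score, best_model = auto_select(candidates, train_rows, holdout_rows)
print(best_name, best_score)
```

Swapping `accuracy` for a timing function would optimize for runtime speed instead, which is the other option the slide mentions.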


Challenges for Application Developers

• Increased demand for AI & predictive models

• Steep learning curve for ML

• Operationalization of ML models


Introducing IntegratedML


IntegratedML | Overview

All-SQL environment
• No need to learn ML methodology or frameworks
• Simple DDL and query functions

Turnkey tool - AutoML
• Includes automated feature engineering & model selection
• Single interface for multiple AutoML “providers”

Easy Operationalization
• Universal interface for model invocation
• Takes care of all the plumbing


IntegratedML | Providers

ML Automation suites that take care of
• Feature Engineering: null imputation, one-hot-encoding, date transformations, …
• Model Selection: based on target & input field types
• Model Building: actual ML algorithms
• Parameter Tuning: play-rinse-repeat

IntegratedML will package multiple AutoML engines users can choose from
• Homegrown implementation
• Open source packages
• Technology partnerships
• Extensible through gateway infrastructure
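Two of the feature engineering steps named above, null imputation and one-hot encoding, can be sketched in plain Python. This is an illustrative sketch only: the column names and values are hypothetical, mean imputation is just one possible strategy, and nothing here reflects how a specific AutoML provider implements these steps.

```python
from __future__ import annotations

from statistics import mean


def impute_nulls(values):
    """Replace missing numeric values (None) with the mean of the known ones."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Turn a categorical column into one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


# Hypothetical input columns
ages = [30, None, 50]
wards = ["ICU", "ER", "ICU"]

imputed_ages = impute_nulls(ages)
ward_columns, ward_matrix = one_hot(wards)

print(imputed_ages)
print(ward_columns, ward_matrix)
```

After these steps every column is numeric and complete, which is what most ML algorithms expect as input.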


IntegratedML | Concepts

Model: Problem definition
• Predicted field
• Input fields

Training Run: Model build activity
• AutoML Provider
• Training data (ref)
• Training logs

Trained Model: Runnable model
• AutoML Provider
• Model info

Validation Metric: Model quality metric
• Metric type & value

Validation Run: Model quality testing
• Validation data (ref)
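One way to picture how these five concepts relate is as a small set of Python data classes. This is purely an illustration and not the IRIS schema: the class layout, attribute names, one-to-many relationships, input field names and metric value are all assumptions made for the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ValidationMetric:
    metric_type: str  # e.g. "precision"
    value: float


@dataclass
class ValidationRun:
    validation_data: str  # reference to the validation data
    metrics: list[ValidationMetric] = field(default_factory=list)


@dataclass
class TrainedModel:
    provider: str
    model_info: dict
    validation_runs: list[ValidationRun] = field(default_factory=list)


@dataclass
class TrainingRun:
    provider: str
    training_data: str  # reference to the training data
    logs: list[str] = field(default_factory=list)
    trained_model: TrainedModel | None = None


@dataclass
class Model:
    name: str
    predicted_field: str
    input_fields: list[str]
    training_runs: list[TrainingRun] = field(default_factory=list)


# A model is only a problem definition until a training run produces a trained model.
model = Model("PainAlert", "IsAnomaly", ["HeartRate", "Temperature"])  # hypothetical inputs
run = TrainingRun(provider="H2O", training_data="EHR.WardPatientHistory")
run.trained_model = TrainedModel(provider="H2O", model_info={})
model.training_runs.append(run)

validation = ValidationRun(validation_data="EHR.NewPatients")
validation.metrics.append(ValidationMetric("precision", 0.9))  # placeholder value
run.trained_model.validation_runs.append(validation)
```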



Syntax | CREATE MODEL

Define models as first-class citizens in IRIS SQL using simple DDL

CREATE MODEL PainAlert PREDICTING (IsAnomaly BOOLEAN)
FROM EHR.WardPatient USING {"provider": "H2O"}

CREATE captures problem statement as predicted and input fields and their data types
• Either enumerate inputs in WITH clause or use FROM clause
• Model is metadata only, not runnable yet
• Register defaults through USING clause


Syntax | TRAIN MODEL

Build runnable models by selecting a dataset on which to train a defined model

TRAIN MODEL PainAlert FROM EHR.WardPatientHistory
WHERE DateAdmitted < '07/01/2019'

TRAIN sends selected data to AutoML provider for actual model building
• ML work happens in the background, Training Run object tracks progress
• Existing and new Trained Models are retained to support ModelOps

Optionally refine ML process through USING clause


Syntax | PREDICT()

Get predictions, probabilities and accuracy through straightforward SQL functions

SELECT PREDICT(PainAlert), EpisodeID, PatientID
FROM EHR.WardPatients

Automatically maps model inputs to columns included in the FROM clause
• Optionally refine inputs through WITH clause
• Leverages latest / default Trained Model for PainAlert
• Operationalization can’t get much easier

Use PROBABILITY() for additional detail


Syntax | VALIDATE MODEL

Capture model accuracy metrics

VALIDATE MODEL PainAlert FROM EHR.NewPatients

Transparently segregates training and test sets (configurable)
• Actual metrics are dependent on model type
  ― e.g. precision, recall and F-Measure for categorization models

Test results available for querying from %ML.ValidationMetrics table
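For reference, the precision, recall and F-Measure metrics mentioned for categorization models can be computed as in the sketch below. It uses the standard textbook definitions for a binary classifier; the label lists are made up, and the function is an illustration rather than what VALIDATE MODEL runs internally.

```python
from __future__ import annotations


def precision_recall_f1(actual, predicted):
    """Compute precision, recall and F1 for boolean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # true positives
    fp = sum(1 for a, p in pairs if not a and p)    # false positives
    fn = sum(1 for a, p in pairs if a and not p)    # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0  # how many alerts were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # how many real cases were caught
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical ground truth and model output
actual = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]

precision, recall, f1 = precision_recall_f1(actual, predicted)
print(precision, recall, f1)
```

F1 is the harmonic mean of precision and recall, so it only scores high when both are high.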


IntegratedML | Summary

Simplest Syntax

Perfect Plumbing

Intelligence Inside


Differentiator | SQL Syntax

Google BigQuery ML

WITH game_to_predict AS (
  SELECT * FROM bqml_tutorial.wide_games
  WHERE game_id='f1063e80-23c7-486b' )
SELECT
  truth.game_id AS game_id,
  total_three_points_att,
  predicted_total_three_points_att
FROM (
  SELECT game_id, predicted_label AS predicted_total_three_points_att
  FROM ML.PREDICT(MODEL 'bqml_tutorial.ncaa_model', TABLE game_to_predict) ) AS predict
JOIN (
  SELECT game_id, total_three_points_att
  FROM game_to_predict) AS truth
ON predict.game_id = truth.game_id

IntegratedML

SELECT game_id, total_three_points_att,
  PREDICT(ncaa_model) AS predicted_total_three_points_att
FROM game_to_predict


Demo


The Belgian Connection


Natural Language Processing & ML

NLP for Machine Learning
Leverage NLP for Feature Engineering to turn text input into numeric features
• Bag-of-words & TF-IDF
• POS & other statistics
• Word embeddings (usage)

Examples
• Scikit-learn: CountVectorizer, TfidfVectorizer
• iKnow: Text Categorization framework
• IRIS & IntegratedML: NLP-Fx

Machine Learning for NLP
Leverage ML to build NLP tools by training the NLP tool on a reference corpus
• Sentiment detection
• Word embeddings (training)

Examples
• SpaCy
• BERT
• …
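To make "turning text input into numeric features" concrete, here is a minimal bag-of-words and TF-IDF sketch using only the Python standard library. It applies the plain textbook formula (term frequency times log of inverse document frequency) to whitespace tokens. Library implementations such as scikit-learn's TfidfVectorizer apply smoothing and normalization, so their numbers will differ. The sample sentences are invented.

```python
from __future__ import annotations

import math
from collections import Counter


def bag_of_words(document):
    """Count how often each lower-cased, whitespace-separated token occurs."""
    return Counter(document.lower().split())


def tf_idf(documents):
    """Weight each term by its frequency in a document and its rarity overall."""
    bags = [bag_of_words(d) for d in documents]
    n_docs = len(bags)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for bag in bags for term in bag)

    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


# Hypothetical free-text notes
notes = [
    "patient reports chest pain",
    "patient reports no pain",
    "patient discharged",
]

weights = tf_idf(notes)
print(weights[0])
```

A term that appears in every document, like "patient" here, gets weight zero, while a term unique to one document gets the highest weight.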


Analytics with InterSystems IRIS


The Analytics Landscape – Activities

Information Portal
• Dashboards
• Reporting
• Decision Support

Analytics Workbench
• Data Exploration
• Ad-hoc Analysis
• Data Engineering

Data Science Lab
• Machine Learning
• Data Science
• Advanced Analytics

AI Hub
• Model Deployment
• Artificial Intelligence
• Augmented Apps


The Analytics Landscape – Roles

Information Portal | Analytics Workbench | Data Science Lab | AI Hub

Roles: Full-Stack Developer, Data Scientist, Manager, Business Analyst, Data Modeler, Data Engineer, Data Geek, Enterprise Architect, DevOps Engineer


The Analytics Landscape – Data

Information Portal | Analytics Workbench | Data Science Lab | AI Hub

Data: OLAP, Columnar, Time Series, Relational, JSON, Sensor, Streaming


Analytics with InterSystems IRIS

Information Portal | Analytics Workbench | Data Science Lab | AI Hub

Integration: External, Embedded

Components: NLP, IntegratedML, AI Connectors, PMML Support, Analytic Workflows, BI Connectors, ML Toolkit, Spark Connector, Gateways, UIMA, NLP-Fx, Reporting, BI, Search, Interoperability

Status: Available, Imminent, Roadmap

Data: OLAP, Columnar, Time Series, Relational, JSON, Sensor, Streaming

Roles: Full-Stack Developer, Data Scientist, Manager, Business Analyst, Data Modeler, Data Engineer, Data Geek, Enterprise Architect, DevOps Engineer


Analytics with InterSystems IRIS

Open Analytics Platform

Fastest Path to Possible

All. Your. Data.


Analytics with InterSystems IRIS

Your Analytics: Our Open Analytics Platform strategy allows leveraging tools your analysts, engineers and scientists know and love at every slice of the spectrum.

Our Speed: Embedded technologies, optimized connectors and pre-canned integrations enable the Fastest Path to Possible, both in system performance and user productivity.

Your Data: A proven & fast multi-model database at the core of the platform ensures all of your enterprise data can be stored in its natural form, maximizing the potential for insights.


Use Case: Augmented Transactions

Operationalization

Data Scientists know which tools to use to build accurate fraud detection models, but such exotic software may not fit in a production environment. At the same time, portability should not be an excuse to sacrifice performance.


PMML Runtime | Overview


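PMML Runtime | Usage

Leverage PMML for Model Deployment
• Data scientists can stick to their preferred technology
• Simple APIs to load and invoke PMML models
• Optimized ObjectScript code generation
• Save directly from SparkML with extension function

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.RFormula
import com.intersystems.spark._

// Read the training data from InterSystems IRIS
val training = spark.read.iris("SELECT hasSepsis, f1, f2, f3 FROM Patient")
// Build and fit a SparkML pipeline: formula-based features, then logistic regression
val formula = new RFormula().setFormula("hasSepsis ~ f1 + f2 + f3")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)
val pipeline = new Pipeline().setStages(Array(formula, lr))
val model = pipeline.fit(training)

// Save the fitted model to IRIS using the extension function
import com.intersystems.spark.ml._
model.iscSave("Models.SepsisModel", training.schema)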


PMML Runtime | with IntegratedML

Leverage PMML for Model Deployment
• Data scientists can still stick to their preferred technology

Easily operationalize for use in SQL with IntegratedML:

CREATE MODEL FromSKLearn PREDICTING (Species)
FROM DataMining.IrisDataset USING { "provider": "PMML" }

TRAIN MODEL FromSKLearn
USING { "pmml_file": "/tmp/sklearn-export.pmml" }

VALIDATE MODEL FromSKLearn

SELECT PREDICT(FromSKLearn), Species FROM DataMining.IrisDataset


Use Case: Machine Learning Agility

The problem for Doug

Despite high versatility for mathematical modelling, languages like Python and R lack a strong orchestration capability tying modelling work into business processes.

This includes:
• initiating modelling tasks based on business events
• more direct interaction with source data
• adding process management to script execution


ML Toolkit | Project Overview

Available from

• Easily embed ML code into IRIS-based applications and workflows
• Offers native connectivity to Python & R for InterSystems IRIS
  ― Core dev: ObjectScript API
  ― Business Process dev: Interoperability Adapters
• Includes exhaustive set of fully-documented showcases implementing concrete use cases for ML
• Emerging community with regular webinars and demos


Wrapping up


Challenges for Application Developers

• Increased demand for AI & predictive models

• Steep learning curve for ML

• Operationalization of ML models


IntegratedML | Summary

Simplest Syntax

Perfect Plumbing

Intelligence Inside


IntegratedML: Machine Learning for Developers

Productivity


Thank You.