quickml: machine learning for developers...2020/02/11 · machine learning agility the problem for...
TRANSCRIPT
QuickML: Machine Learning for Developers
Benjamin De Boe, BENELUX SYMPOSIUM 2020
IntegratedML: Machine Learning for Developers
Outline
Integrated intro to ML
Intro to IntegratedML
The Belgian Connection
Analytics with InterSystems IRIS
Wrapping up
Integrated intro to ML
Introducing IntegratedML
9 | © InterSystems Corporation. All rights reserved. |
Quick intro to ML
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
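The contrast between traditional programming and machine learning can be made concrete with a toy Python sketch. In the first function a developer hard-codes the rule; in the second, the rule (here just a single threshold) is derived from labeled examples. The function names, the temperature data and the midpoint "learner" are illustrative assumptions, not something taken from the talk.

```python
# Traditional programming: the developer writes the rule (the "program") by hand.
def is_fever_rule(temp_c):
    return temp_c >= 38.0  # hand-picked threshold


# Machine learning: data plus known outputs go in, a "program" comes out.
def learn_threshold(temps, labels):
    """Toy learner: put the threshold midway between the two class means."""
    pos = [t for t, y in zip(temps, labels) if y]
    neg = [t for t, y in zip(temps, labels) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


# Hypothetical training data: body temperatures with known fever labels.
temps = [36.5, 36.9, 37.1, 38.4, 39.0, 38.8]
labels = [False, False, False, True, True, True]

threshold = learn_threshold(temps, labels)


def is_fever_learned(temp_c):
    # The learned "program": same shape as the rule, but the number came from data.
    return temp_c >= threshold
```

Real ML algorithms fit far richer models than one threshold, but the direction of the arrows is the same: outputs are an input to training, and the program is the result.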
The Machine Learning Process
Data Acquisition
Data Preparation
Feature Engineering
Model Selection
Model Training
Parameter Tuning
Model Deployment
Data Science requires understanding of data, business problem and ML techniques
• Specialist language & frameworks
• Inherently iterative, with lots of trial & error
Model Training step is where actual ML algorithms come in
• Requires significant compute resources
Challenges for Application Developers
• Increased demand for AI & predictive models
• Steep learning curve for ML
• Operationalization of ML models
Automating The Machine Learning Process: AutoML
Data Acquisition
Data Preparation
Feature Engineering
Model Selection
Model Training
Parameter Tuning
Model Deployment
Can we automatically test all options the data scientist would choose from?
• Make educated guesses to limit the options
• Just add more compute resources
Select from candidate models based on predefined metrics
• Optimize for accuracy or (runtime) speed
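Selecting from candidate models based on a predefined metric can be sketched in a few lines of Python. This is a minimal illustration, not how any particular AutoML engine works: the candidate models, the hold-out data and the `select_best` helper are all hypothetical.

```python
# Each candidate "model" is simply a function from a feature value to a label.
def always_false(x):
    return False


def threshold_at(t):
    return lambda x: x >= t


candidates = {
    "always_false": always_false,
    "threshold_37": threshold_at(37.0),
    "threshold_38": threshold_at(38.0),
}


def accuracy(model, xs, ys):
    """Fraction of examples for which the model's prediction matches the label."""
    hits = sum(1 for x, y in zip(xs, ys) if model(x) == y)
    return hits / len(xs)


def select_best(candidates, xs, ys, metric=accuracy):
    """Score every candidate with the metric and return the best name plus all scores."""
    scores = {name: metric(model, xs, ys) for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical hold-out data used only for scoring the candidates.
xs = [36.5, 36.9, 37.1, 38.4, 39.0, 38.8]
ys = [False, False, False, True, True, True]

best_name, scores = select_best(candidates, xs, ys)
```

Swapping `accuracy` for a metric that penalizes slow models would be one way to express the "accuracy or (runtime) speed" trade-off.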
Introducing IntegratedML
IntegratedML | Overview
All-SQL environment
• No need to learn ML methodology or frameworks
• Simple DDL and query functions
Turnkey tool - AutoML
• Includes automated feature engineering & model selection
• Single interface for multiple AutoML “providers”
Easy Operationalization
• Universal interface for model invocation
• Takes care of all the plumbing
IntegratedML | Providers
ML Automation suites that take care of
• Feature Engineering: null imputation, one-hot-encoding, date transformations, …
• Model Selection: based on target & input field types
• Model Building: actual ML algorithms
• Parameter Tuning: play-rinse-repeat
IntegratedML will package multiple AutoML engines users can choose from
• Homegrown implementation
• Open source packages
• Technology partnerships
• Extensible through gateway infrastructure
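Two of the feature engineering steps an AutoML provider automates, null imputation and one-hot encoding, look roughly like this when written by hand. This is a plain-Python sketch under simple assumptions (mean imputation, alphabetically ordered categories); the helper names and sample values are hypothetical and not taken from any provider.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def one_hot(values):
    """Turn each category into a 0/1 vector, one column per distinct category."""
    categories = sorted(set(values))
    vectors = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, vectors


# Hypothetical input columns with a missing age and a categorical ward field.
ages = [30, None, 50]
wards = ["ICU", "ER", "ICU"]

imputed_ages = impute_mean(ages)
ward_columns, ward_vectors = one_hot(wards)
```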
IntegratedML | Concepts
Model: Problem definition
• Predicted field
• Input fields
Training Run: Model build activity
• AutoML Provider
• Training data (ref)
• Training logs
Trained Model: Runnable model
• AutoML Provider
• Model info
Validation Metric: Model quality metric
• Metric type & value
Validation Run: Model quality testing
• Validation data (ref)
Syntax | CREATE MODEL
Define models as first-class citizens in IRIS SQL using simple DDL
CREATE MODEL PainAlert PREDICTING (IsAnomaly BOOLEAN)
FROM EHR.WardPatient
USING {"provider": "H2O"}
CREATE captures problem statement as predicted and input fields and their data types
• Either enumerate inputs in WITH clause or use FROM clause
• Model is metadata only, not runnable yet
• Register defaults through USING clause
Syntax | TRAIN MODEL
Build runnable models by selecting a dataset on which to train a defined model
TRAIN MODEL PainAlert FROM EHR.WardPatientHistory
WHERE DateAdmitted < '07/01/2019'
TRAIN sends selected data to AutoML provider for actual model building
• ML work happens in the background, Training Run object tracks progress
• Existing and new Trained Models are retained to support ModelOps
Optionally refine ML process through USING clause
Syntax | PREDICT()
Get predictions, probabilities and accuracy through straightforward SQL functions
SELECT PREDICT(PainAlert), EpisodeID, PatientID
FROM EHR.WardPatients
Automatically maps model inputs to columns included in the FROM clause
• Optionally refine inputs through WITH clause
• Leverages latest / default Trained Model for PainAlert
• Operationalization can’t get much easier
Use PROBABILITY() for additional detail
Syntax | VALIDATE MODEL
Capture model accuracy metrics
VALIDATE MODEL PainAlert FROM EHR.NewPatients
Transparently segregates training and test sets (configurable)
• Actual metrics are dependent on model type
― e.g. precision, recall and F-Measure for categorization models
Test results available for querying from %ML.ValidationMetrics table
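For reference, the precision, recall and F-Measure metrics named on this slide can be computed from actual and predicted labels as follows. This is a generic Python sketch of the standard binary-classification definitions, not the code IntegratedML runs; the function name and sample labels are hypothetical.

```python
def precision_recall_f1(actual, predicted):
    """Standard binary metrics from paired actual/predicted boolean labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)        # true positives
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)    # false positives
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical validation set: 2 true positives, 1 false positive, 1 false negative.
actual = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]

precision, recall, f1 = precision_recall_f1(actual, predicted)
```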
IntegratedML | Summary
Simplest Syntax
Perfect Plumbing
Intelligence Inside
Differentiator | SQL Syntax
Google BigQuery ML
WITH game_to_predict AS (
  SELECT * FROM bqml_tutorial.wide_games
  WHERE game_id='f1063e80-23c7-486b' )
SELECT
  truth.game_id AS game_id,
  total_three_points_att,
  predicted_total_three_points_att
FROM (
  SELECT game_id, predicted_label AS predicted_total_three_points_att
  FROM ML.PREDICT(MODEL 'bqml_tutorial.ncaa_model', TABLE game_to_predict) ) AS predict
JOIN (
  SELECT game_id, total_three_points_att
  FROM game_to_predict ) AS truth
ON predict.game_id = truth.game_id
IntegratedML
SELECT game_id, total_three_points_att, PREDICT(ncaa_model) AS predicted_total_three_points_att
FROM game_to_predict
Demo
The Belgian Connection
Natural Language Processing & ML
NLP for Machine Learning
Leverage NLP for Feature Engineering to turn text input into numeric features
• Bag-of-words & TF-IDF
• POS & other statistics
• Word embeddings (usage)
Examples
• Scikit-learn: CountVectorizer, TfidfVectorizer
• iKnow: Text Categorization framework
• IRIS & IntegratedML: NLP-Fx
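To show what turning text into numeric features means in practice, here is a small bag-of-words and TF-IDF sketch using only the Python standard library. It uses the plain textbook formula (term frequency times log of inverse document frequency), which is an assumption made for brevity; scikit-learn's vectorizers apply smoothing and normalization and will produce different numbers. The helper names and sample notes are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def tf_idf(documents):
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in documents]
    vocabulary = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)  # the bag-of-words for this document
        vectors.append([
            (counts[t] / len(doc)) * math.log(n_docs / doc_freq[t])
            for t in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical free-text notes to be used as model input.
notes = ["patient reports pain", "patient reports no pain", "patient sleeping"]
vocabulary, vectors = tf_idf(notes)
```

A term that occurs in every document, such as "patient" here, gets weight zero, while a term unique to one note gets the highest weight in that note's vector.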
Machine Learning for NLP
Leverage ML to build NLP tools by training the NLP tool on a reference corpus
• Sentiment detection
• Word embeddings (training)
Examples
• spaCy
• BERT
• …
Analytics with InterSystems IRIS
The Analytics Landscape – Activities
Information Portal
• Dashboards
• Reporting
• Decision Support
Analytics Workbench
• Data Exploration
• Ad-hoc Analysis
• Data Engineering
Data Science Lab
• Machine Learning
• Data Science
• Advanced Analytics
AI Hub
• Model Deployment
• Artificial Intelligence
• Augmented Apps
The Analytics Landscape – Roles
Roles: Manager, Business Analyst, Data Modeler, Data Engineer, Data Geek, Enterprise Architect, Full-Stack Developer, Data Scientist, DevOps Engineer
The Analytics Landscape – Data
Data: OLAP, Columnar, Time Series, Relational, JSON, Sensor, Streaming
Analytics with InterSystems IRIS
[Diagram: Embedded and External capabilities across the Information Portal, Analytics Workbench, Data Science Lab and AI Hub, each marked Available, Imminent or Roadmap]
Capabilities: Reporting, BI, Search, Interoperability, NLP, IntegratedML, AI Connectors, PMML Support, Analytic Workflows, BI Connectors, ML Toolkit, Spark Connector, Gateways, UIMA, NLP-Fx
Open Analytics Platform
Fastest Path to Possible
All. Your. Data.
Your Analytics: Our Open Analytics Platform strategy allows leveraging tools your analysts, engineers and scientists know and love at every slice of the spectrum.
Our Speed: Embedded technologies, optimized connectors and pre-canned integrations enable the Fastest Path to Possible, both in system performance and user productivity.
Your Data: A proven & fast multi-model database at the core of the platform ensures all of your enterprise data can be stored in its natural form, maximizing the potential for insights.
Use Case: Augmented Transactions
Operationalization
Data Scientists know which tools to use to build accurate fraud detection models, but such exotic software may not fit in a production environment. At the same time, portability should not be an excuse to sacrifice performance.
PMML Runtime | Overview
PMML Runtime | Usage
Leverage PMML for Model Deployment
• Data scientists can stick to their preferred technology
• Simple APIs to load and invoke PMML models
• Optimized ObjectScript code generation
• Save directly from SparkML with extension function
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.RFormula
import com.intersystems.spark._
// Read the training set from InterSystems IRIS through the Spark connector
val training = spark.read.iris("SELECT hasSepsis, f1, f2, f3 FROM Patient")
// Build and fit a standard SparkML pipeline
val formula = new RFormula().setFormula("hasSepsis ~ f1 + f2 + f3")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)
val pipeline = new Pipeline().setStages(Array(formula, lr))
val model = pipeline.fit(training)
// Save the fitted model to IRIS with the extension function
import com.intersystems.spark.ml._
model.iscSave("Models.SepsisModel", training.schema)
PMML Runtime | with IntegratedML
Leverage PMML for Model Deployment
• Data scientists can still stick to their preferred technology
Easily operationalize for use in SQL with IntegratedML:
CREATE MODEL FromSKLearn PREDICTING (Species)
FROM DataMining.IrisDataset
USING { "provider": "PMML" }
TRAIN MODEL FromSKLearn
USING { "pmml_file": "/tmp/sklearn-export.pmml" }
VALIDATE MODEL FromSKLearn
SELECT PREDICT(FromSKLearn), Species FROM DataMining.IrisDataset
Use Case: Machine Learning Agility
The problem for Doug
Despite high versatility for mathematical modelling, languages like Python and R lack a strong orchestration capability tying modelling work into business processes.
This includes:
• initiating modelling tasks based on business events
• more direct interaction with source data
• adding process management to script execution
ML Toolkit | Project Overview
Available from
• Easily embed ML code into IRIS-based applications and workflows
• Offers native connectivity to Python & R for InterSystems IRIS
― Core dev: ObjectScript API
― Business Process dev: Interoperability Adapters
• Includes exhaustive set of fully-documented showcases implementing concrete use cases for ML
• Emerging community with regular webinars and demos
Wrapping up
Challenges for Application Developers
• Increased demand for AI & predictive models
• Steep learning curve for ML
• Operationalization of ML models
IntegratedML | Summary
Simplest Syntax
Perfect Plumbing
Intelligence Inside
Productivity
Thank You.