API, WhizzML and Apps
Posted on 15-Apr-2017
BigML, Inc
Automation
API, WhizzML and Predictive Applications
Poul Petersen (@pejpgrep), CIO, BigML, Inc (@bigmlcom)
ML Crash Course - API/WhizzML/Predictive Apps
BigML Architecture
• Tools
• REST API
• Distributed Machine Learning Backend: Source Server, Dataset Server, Model Server, Prediction Server, Sample Server, WhizzML Server, Evaluation Server
• Web-based Frontend
• Visualizations
• Smart Infrastructure (auto-deployable, auto-scalable)
The Need for an ML API
• Workflow Automation - reduce drudgery
• Abstraction - reuse code
• Composability - powerful combinations of APIs
• Integration - Dashboard or UI component
• Automate deployment
• Repeatable results
Predictive Applications
[Workflow diagram: Define ML Problem; Collect & Format Data; ETL; Explore; Model & Evaluate; Goal Met? If no: feature engineer, tune algorithm, or conclude it is Not Possible. If yes: Automate, then Consume & Monitor (Predict, Score, Label; watch for Drift & Anomaly).]
BigML API Endpoint
https://bigml.io/[dev/][andromeda/]{resource}/{id}?{auth}
• {resource} is one of: source, dataset, model, ensemble, prediction, batchprediction, evaluation, …
• Path elements:
  • /andromeda specifies the API version (optional)
  • /dev specifies development mode (optional)
  • if neither is specified, the latest API in production mode is used
• {id} is required for PUT and DELETE
• {auth} contains the URL parameters username and api_key; api_key can be an alternative key
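A minimal Python sketch of how the pieces of the endpoint fit together. `endpoint_url` is a hypothetical helper, not part of BigML's bindings; it assumes /dev comes before /andromeda, as in the slide's dev/andromeda example, and it joins the auth parameters with `&` via `urllib.parse.urlencode`, so check BigML's API documentation for the exact query format it expects.

```python
from urllib.parse import urlencode

BASE_URL = "https://bigml.io"


def endpoint_url(resource, resource_id=None, username="", api_key="",
                 version=None, dev=False):
    """Build a BigML-style endpoint URL (hypothetical helper).

    resource    -- e.g. "source", "dataset", "model", "evaluation"
    resource_id -- required for PUT and DELETE, optional otherwise
    version     -- optional API version path element, e.g. "andromeda"
    dev         -- True to address development mode
    """
    parts = [BASE_URL]
    if dev:
        parts.append("dev")        # development mode
    if version:
        parts.append(version)      # pin a specific API version
    parts.append(resource)
    if resource_id:
        parts.append(resource_id)  # a specific resource rather than the list
    auth = urlencode({"username": username, "api_key": api_key})
    return "/".join(parts) + "?" + auth
```

For example, `endpoint_url("model", "0123abcd", "alice", "SECRET", version="andromeda", dev=True)` addresses one model in development mode on the andromeda version; the id and credentials are placeholders.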
BigML API Endpoint
https://bigml.io/... exchanges {JSON} documents.
Operation (HTTP method): semantics
• CREATE (POST): creates a new resource; returns a JSON document including a unique identifier.
• RETRIEVE (GET): retrieves either a specific resource or a list of resources.
• UPDATE (PUT): updates a resource; only certain fields are putable.
• DELETE (DELETE): deletes a resource.
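The operation-to-method mapping in the table can be illustrated with the Python standard library alone. This sketch only prepares `urllib.request.Request` objects without sending them; `build_request` is a hypothetical helper, the dataset-creation body `{"source": ...}` and the URLs are illustrative assumptions, and real calls also need the {auth} parameters.

```python
import json
from urllib.request import Request

# Operation -> HTTP method, as in the table above.
HTTP_METHODS = {
    "CREATE": "POST",
    "RETRIEVE": "GET",
    "UPDATE": "PUT",
    "DELETE": "DELETE",
}


def build_request(operation, url, body=None):
    """Prepare (but do not send) an HTTP request for a CRUD operation.

    A body, when given, is serialized to JSON and labeled as such.
    """
    method = HTTP_METHODS[operation]
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)
```

Sending a prepared request would then be a matter of passing it to `urllib.request.urlopen`, which is left out here so the sketch stays offline.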
BigML Bindings: https://github.com/bigmlcom/io
Python Binding Overview
Operation (HTTP method): binding method
• CREATE (POST): api.create_<resource>(from, {opts})
• RETRIEVE (GET): api.get_<resource>(id, {opts}) or api.list_<resource>({opts})
• UPDATE (PUT): api.update_<resource>(id, {opts})
• DELETE (DELETE): api.delete_<resource>(id)
• Where <resource> is one of: source, dataset, model, ensemble, evaluation, etc.
• id is a resource identifier or resource dict
• from is a resource identifier, dict, or string depending on context
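Because every binding method follows the `<operation>_<resource>` naming pattern, a workflow can select the method by name. `call_binding` is a hypothetical helper sketched under that assumption; with the real bindings, `api` would be an instance of `bigml.api.BigML`, which by default reads credentials from the `BIGML_USERNAME` and `BIGML_API_KEY` environment variables.

```python
def call_binding(api, operation, resource, *args):
    """Call api.<operation>_<resource>(*args), e.g. api.create_dataset(source).

    operation -- "create", "get", "list", "update" or "delete"
    resource  -- "source", "dataset", "model", "ensemble", "evaluation", ...
    """
    method_name = f"{operation}_{resource}"
    method = getattr(api, method_name, None)
    if not callable(method):
        raise AttributeError(f"{type(api).__name__} has no method {method_name}")
    return method(*args)


# Chaining calls through the same helper, for example:
#   dataset = call_binding(api, "create", "dataset", source)
#   model = call_binding(api, "create", "model", dataset)
```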
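Diabetes Anomalies
[Workflow diagram: DIABETES SOURCE → DIABETES DATASET, split into TRAIN SET and TEST SET; an ANOMALY DETECTOR and a FILTER produce a CLEAN DATASET; one model is trained on all the data (ALL MODEL) and one on the clean data; their ALL EVALUATION and CLEAN EVALUATION are then compared (COMPARE EVALUATIONS).]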
WhizzML
A Domain-Specific Language (DSL) for automating Machine Learning workflows.
• Complete programming language
• Machine Learning operations are first-class citizens
• Server-side execution abstracts infrastructure
• API First! - Everything is composable
• Shareable
WhizzML vs API
WhizzML vs. API / Bindings:
• Executes server-side vs. requires local execution
• Zero latency vs. every API call has latency
• Parallelization built-in vs. manual parallelization
• Sharing built-in vs. manual sharing
• Code-agnostic workflows vs. code-specific workflows
• Workflows can be UI-integrated vs. workflows external to the UI
WhizzML vs Flatline
WhizzML vs. Flatline:
• Concerned with resources vs. concerned with datasets
• Turing complete vs. more specific to features
• Optimized for parallelization vs. optimized for speed
Simple Workflow
SOURCE → DATASET → MODEL
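The same three-step workflow, written against the Python bindings from the binding overview. This is a sketch: it assumes `api` is a `bigml.api.BigML` instance and uses `api.ok()` to wait for each resource to finish before the next step consumes it; `wait_until_ready` and `simple_workflow` are hypothetical helper names, and the CSV path in the usage comment is a placeholder.

```python
def wait_until_ready(api, resource):
    """Block until the resource finishes (via api.ok); raise if it failed."""
    if not api.ok(resource):
        raise RuntimeError(f"BigML resource failed: {resource.get('resource')}")
    return resource


def simple_workflow(api, data_path):
    """SOURCE -> DATASET -> MODEL using the bindings' create_* methods."""
    source = wait_until_ready(api, api.create_source(data_path))
    dataset = wait_until_ready(api, api.create_dataset(source))
    return wait_until_ready(api, api.create_model(dataset))


# Usage sketch (needs the `bigml` package and credentials in the
# BIGML_USERNAME / BIGML_API_KEY environment variables):
#   from bigml.api import BigML
#   model = simple_workflow(BigML(), "diabetes.csv")
```

Each step runs on the client and waits on the server, which is the per-call latency the WhizzML vs API comparison refers to.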
Redfin Workflow
Sold Homes → Model Predicts Sale Price → Compare List Price to Prediction
Redfin Workflow
[Workflow diagram: DATASET → FILTER → SOLD HOMES → NEW FEATURES → MODEL; DATASET → FILTER → FOR SALE HOMES → NEW FEATURES → BATCH PREDICTION (using the MODEL) → DEALS DATASET]
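The final "compare list to prediction" step reduces to a filter over the for-sale homes. In this sketch the function `deals`, the field name `list_price`, the `predict` callable (standing in for the model's batch prediction) and the optional `margin` are all hypothetical; the slides do not define a deal beyond comparing the two prices.

```python
def deals(for_sale_homes, predict, margin=0.0):
    """Return for-sale homes whose predicted sale price beats the list price.

    for_sale_homes -- list of dicts with a "list_price" field (hypothetical name)
    predict        -- callable mapping a home dict to a predicted sale price
    margin         -- minimum fraction by which the prediction must exceed
                      the list price (0.05 means 5 percent)
    """
    result = []
    for home in for_sale_homes:
        predicted = predict(home)
        if predicted > home["list_price"] * (1 + margin):
            # Keep the original fields and record the prediction alongside them
            result.append(dict(home, predicted_price=predicted))
    return result
```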
WhizzML Resources
[Diagram: a LIBRARY and a SCRIPT run as an EXECUTION that takes CITY 1 SOLD HOMES and CITY 1 FOR SALE HOMES and produces the CITY 1 DEALS DATASET]
WhizzML Resources
[Diagram: the same LIBRARY and SCRIPT run as a new EXECUTION that takes CITY 2 SOLD HOMES and CITY 2 FOR SALE HOMES and produces the CITY 2 DEALS DATASET]
Scriptify
• "Reifies" a resource into a WhizzML script.
• Rapid prototyping meets automation.
WhizzML FE
[Figure: Worth More vs. Worth Less]
WhizzML FE
LATITUDE    LONGITUDE     REFERENCE LATITUDE   REFERENCE LONGITUDE   DISTANCE (m)
44.583      -123.296775   44.5638              -123.2794             700
44.604414   -123.296129   44.5638              -123.2794             30.4
44.600108   -123.29707    44.5638              -123.2794             19.38
44.603077   -123.295004   44.5638              -123.2794             37.8
44.589587   -123.301154   44.5638              -123.2794             23.39
Flatline!
WhizzML FE
Haversine formula: https://en.wikipedia.org/wiki/Haversine_formula
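The distance feature in the table can be computed with the Haversine formula. The slides do this with Flatline inside BigML; the Python below is only a sketch of the same arithmetic, assuming a spherical Earth with a mean radius of 6,371 km, and `haversine_m` is a hypothetical name.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical model)


def haversine_m(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    # hav(d / R) = hav(dphi) + cos(phi1) * cos(phi2) * hav(dlambda)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))
```

For instance, `haversine_m(44.583, -123.296775, 44.5638, -123.2794)` gives the distance in metres from the first row's coordinates to the reference point.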
WhizzML FE
[Diagram: a Haversine LIBRARY used by a SCRIPT]
WhizzML FE: Fix Missing Values in a “Meaningful” Way
[Workflow diagram: Original Dataset → Filter Zeros → Clean Dataset → Model insulin; the model is used to Predict insulin, giving an Amended Dataset; Select insulin → Fixed Dataset]
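A plain-Python sketch of the same steps for a diabetes-style dataset in which a zero insulin value means "missing". The swapped-in technique is an ordinary least-squares line on one predictor, standing in for the BigML model the slide trains; `fix_zero_insulin` is a hypothetical name and the predictor field `glucose` is an assumption.

```python
def fix_zero_insulin(rows, predictor="glucose", target="insulin"):
    """Replace zero (i.e. missing) target values with model predictions.

    A one-variable least-squares line stands in for a BigML model.
    Returns new row dicts; the input rows are not modified.
    """
    # Filter Zeros -> clean dataset used for training
    clean = [r for r in rows if r[target] != 0]
    if len(clean) < 2:
        raise ValueError("need at least two non-zero rows to fit a line")
    xs = [r[predictor] for r in clean]
    ys = [r[target] for r in clean]
    # Model insulin: ordinary least squares, target = a + b * predictor
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    # Predict insulin only where it was zero; real values stay intact
    fixed = []
    for r in rows:
        new_row = dict(r)
        if new_row[target] == 0:
            new_row[target] = a + b * new_row[predictor]
        fixed.append(new_row)
    return fixed
```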
WhizzML Workflow Types
• Optimization: Model or Ensemble, Best-First Features, SMACdown
• Algorithms: Stacked Generalization, Gradient Boosting, Cross Validation
• Transformations: Flatline Wrappers, Remove Anomalies
• Domain Specific: Application Workflow, Repetitive Tasks
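Best-First Features
• Evaluate each single feature {F1}, {F2}, {F3}, {F4}, … {Fn} and CHOOSE BEST: S = {Fa}
• Evaluate S plus each remaining feature, S+{F1}, S+{F2}, S+{F3}, S+{F4}, … S+{Fn-1}, and CHOOSE BEST: S = {Fa, Fb}
• Repeat with the remaining features: CHOOSE BEST: S = {Fa, Fb, Fc}, and so on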
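The greedy search above, sketched in Python. `best_first_features` is a hypothetical name and `score` is any callable that evaluates a feature subset (for example, a model evaluation on held-out data); the stopping rule, halting when no remaining feature improves the score, is an assumption, since the slide only shows the first rounds.

```python
def best_first_features(features, score, max_features=None):
    """Greedy forward selection: grow S one best feature at a time.

    features     -- candidate field names
    score        -- callable(list_of_features) -> number, higher is better
    max_features -- optional cap on the size of S
    Returns (selected_features, best_score).
    """
    selected, remaining = [], list(features)
    best_score = float("-inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate S + {F} for every remaining F and keep the best one
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # no improvement: stop adding features
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score
```

Each round costs one evaluation per remaining feature, which is why running the rounds server-side in parallel is attractive.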
Model Selection
[Workflow diagram: SOURCE → DATASET, split into TRAINING and TEST; a MODEL, an ENSEMBLE and a LOGISTIC REGRESSION are trained on TRAINING, each gets an EVALUATION on TEST, then CHOOSE the best]
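A sketch of the selection loop: every candidate trains on the same training split and is evaluated on the same test split, and the highest score wins. `train_test_split` and `choose_best` are hypothetical helpers, and the trainer and evaluation callables are placeholders for creating BigML models and evaluations; the same loop covers the Model Tuning slide if each ensemble size N is treated as a separate candidate.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle deterministically and split rows into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def choose_best(rows, trainers, evaluate, test_fraction=0.2, seed=42):
    """Train every candidate on one split and keep the best evaluation.

    trainers -- dict of name -> callable(training_rows) -> model
    evaluate -- callable(model, test_rows) -> score, higher is better
    Returns (best_name, scores_by_name).
    """
    training, test = train_test_split(rows, test_fraction, seed)
    scores = {name: evaluate(train(training), test)
              for name, train in trainers.items()}
    best = max(scores, key=scores.get)
    return best, scores
```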
Model Tuning
[Workflow diagram: SOURCE → DATASET, split into TRAINING and TEST; ENSEMBLEs with N=10, N=20 and N=1000 are trained on TRAINING, each gets an EVALUATION on TEST, then CHOOSE the best]
SMACdown
• How many models?
• How many nodes?
• Missing splits or not?
• Number of random candidates?
• Balance the objective?
SMACdown can tell you!
Path to Automatic ML
[Timeline: time (2011 to Spring 2016) vs. Automation]
A. Programmable Infrastructure: Sauron - automatic deployment and auto-scaling
B. REST API: Wintermute - distributed Machine Learning framework
C. Data Generation and Filtering: Flatline - DSL for transformation and new field generation
D. Workflow Automation: WhizzML - DSL for programmable workflows
E. Automatic Model Selection: SMACdown - automatic parameter optimization
Higher Level Algorithms
• Stacked Generalization
• Boosting
  • Adaboost
  • Logitboost
  • Martingale Boosting
  • Gradient Boosting
Stacked Generalization
[Workflow diagram: SOURCE → DATASET; a MODEL, an ENSEMBLE and a LOGISTIC REGRESSION each feed a BATCH PREDICTION that produces an EXTENDED DATASET; a final LOGISTIC REGRESSION is trained on the extended data]
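A compact sketch of two-level stacking in plain Python. `extend_dataset` plays the role of the batch predictions that append prediction fields, and `pick_best_field` is a deliberately simple meta learner standing in for the slide's final logistic regression; all names are hypothetical. Fitting the meta model on rows the base models did not train on is the usual precaution against leaking their training fit.

```python
def extend_dataset(rows, base_models):
    """Add one "pred_<name>" field per base model to copies of the rows."""
    extended = []
    for row in rows:
        new_row = dict(row)
        for name, model in base_models.items():
            new_row["pred_" + name] = model(row)
        extended.append(new_row)
    return extended


def pick_best_field(extended_rows, pred_fields, target):
    """Toy meta learner: use the prediction field with the lowest squared error."""
    errors = {f: sum((r[f] - r[target]) ** 2 for r in extended_rows)
              for f in pred_fields}
    best = min(errors, key=errors.get)
    return lambda row: row[best]


def stacked_generalization(level0_rows, level1_rows, base_trainers,
                           meta_trainer, target):
    """Fit base models on level0_rows and the meta model on level1_rows.

    base_trainers -- dict of name -> callable(rows) -> model(row)
    meta_trainer  -- callable(extended_rows, pred_fields, target) -> model(row)
    """
    base_models = {name: train(level0_rows)
                   for name, train in base_trainers.items()}
    pred_fields = ["pred_" + name for name in base_models]
    meta_model = meta_trainer(extend_dataset(level1_rows, base_models),
                              pred_fields, target)

    def predict(row):
        # Extend the new row with base predictions, then apply the meta model
        return meta_model(extend_dataset([row], base_models)[0])

    return predict
```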
Why WhizzML
• Automation is critical to fulfilling the promise of ML
• WhizzML can create workflows that:
  • Automate repetitive tasks
  • Automate model tuning and feature selection
  • Combine ML models into more powerful algorithms
  • Create shareable and re-usable executions