API, WhizzML and Apps

Post on 15-Apr-2017

Category:

Data & Analytics

BigML, Inc 1

Automation

Poul Petersen @pejpgrep CIO, BigML, Inc @bigmlcom

API, WhizzML and Predictive Applications

BigML, Inc 2ML Crash Course - API/WhizzML/Predictive Apps

BigML Architecture

Tools

REST API

Distributed Machine Learning Backend

Source Server

Dataset Server

Model Server

Prediction Server

Sample Server

WhizzML Server

Evaluation Server

Web-based Frontend

Visualizations

Smart Infrastructure (auto-deployable, auto-scalable)

The Need for a ML API

• Workflow Automation - reduce drudgery

• Abstraction - reuse code

• Composability - powerful combinations of APIs

• Integration - Dashboard or UI component

• Automate deployment

• Repeatable results

Predictive Applications

Flowchart: Define ML Problem · Collect & Format Data · ETL · Explore · Model & Evaluate · Goal Met? (yes / no) · feature engineer · tune algorithm · Not Possible · Collect & Format Data · Model · Automate · Consume & Monitor · Predict / Score / Label · Drift & Anomaly

BigML API Endpoint

https://bigml.io/{prefix}/{resource}/{id}?{auth}

• {resource} is one of: source, dataset, model, ensemble, prediction, batchprediction, evaluation
• {prefix} is one of: andromeda, dev, dev/andromeda

• Path elements:
  • /andromeda specifies the API version (optional)
  • /dev specifies development mode
  • if not specified, then latest API in production mode

• {id} is required for PUT and DELETE
• {auth} contains URL parameters username and api_key
• api_key can be an alternative key
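
As a rough illustration of how the path elements and the {auth} parameters combine, the following Python sketch assembles an endpoint URL. The function name resource_url, its parameters, and the sample credentials and id are assumptions made for this example and are not part of the BigML API. Placing dev before the version follows the dev/andromeda form listed above.

```python
from urllib.parse import urlencode

BASE_URL = "https://bigml.io"


def resource_url(resource, username, api_key, resource_id=None,
                 version=None, dev=False):
    """Build an endpoint URL from the path elements described on the slide.

    Hypothetical helper for illustration only.
    """
    parts = [BASE_URL]
    if dev:
        parts.append("dev")        # /dev selects development mode
    if version:
        parts.append(version)      # e.g. "andromeda"; optional API version
    parts.append(resource)         # source, dataset, model, ...
    if resource_id:
        parts.append(resource_id)  # required for PUT and DELETE
    # {auth}: username and api_key travel as URL parameters.
    auth = urlencode({"username": username, "api_key": api_key})
    return "/".join(parts) + "?" + auth


# Sample values below are placeholders, not real credentials or ids.
print(resource_url("source", "alice", "0123abcd"))
print(resource_url("model", "alice", "0123abcd",
                   resource_id="abc123", version="andromeda", dev=True))
```

Leaving out both version and dev gives the shortest form, which the slide describes as the latest API in production mode.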

BigML API Endpoint

https://bigml.io/... (JSON request in, JSON response out)

Operation   HTTP Method   Semantics
CREATE      POST          Creates a new resource. Returns a JSON document including a unique identifier.
RETRIEVE    GET           Retrieves either a specific resource or a list of resources.
UPDATE      PUT           Updates a resource. Only certain fields are putable.
DELETE      DELETE        Deletes a resource.
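
The mapping from operation to HTTP method can be sketched with Python's standard library alone. The helper build_request, the sample URL, and the payload are assumptions for illustration. The request object is only constructed, never sent, so no account is needed to run it.

```python
import json
from urllib.request import Request

# Operation -> HTTP method, as listed in the table above.
HTTP_METHODS = {
    "create": "POST",
    "retrieve": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, url, payload=None):
    """Return an unsent urllib Request for one of the four operations."""
    method = HTTP_METHODS[operation]
    data = None
    headers = {}
    if payload is not None:
        # Request bodies are JSON documents.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


# Placeholder URL and payload; the id "source/abc123" is made up.
req = build_request(
    "create",
    "https://bigml.io/dataset?username=alice&api_key=0123abcd",
    {"source": "source/abc123"},
)
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would actually issue the call; that step is left out here on purpose.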

BigML Bindings

https://github.com/bigmlcom/io

Python Binding Overview

Operation   HTTP Method   Binding Method
CREATE      POST          api.create_<resource>(from, {opts})
RETRIEVE    GET           api.get_<resource>(id, {opts})
                          api.list_<resource>({opts})
UPDATE      PUT           api.update_<resource>(id, {opts})
DELETE      DELETE        api.delete_<resource>(id)

• Where <resource> is one of: source, dataset, model, ensemble, evaluation, etc.
• id is a resource identifier or resource dict
• from is a resource identifier, dict, or string depending on context
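
A minimal sketch of how the create_<resource> calls chain together, assuming the api object is a connection from the BigML Python bindings (bigml.api.BigML). The function name simple_workflow, the file name, and the input field are hypothetical. The api object is passed in as a parameter so the sketch can be read, and exercised with a stand-in object, without network access.

```python
def simple_workflow(api, data_path, input_data):
    """Chain create calls: source -> dataset -> model -> prediction.

    `api` is assumed to behave like bigml.api.BigML: each create_<resource>
    accepts the previous resource, and api.ok() waits until it is finished.
    """
    source = api.create_source(data_path)   # `from` is a file path here
    api.ok(source)
    dataset = api.create_dataset(source)    # `from` is a resource dict
    api.ok(dataset)
    model = api.create_model(dataset)
    api.ok(model)
    return api.create_prediction(model, input_data)


# Usage with the real bindings (not executed here; requires credentials):
#   from bigml.api import BigML
#   api = BigML()  # reads BIGML_USERNAME and BIGML_API_KEY from the environment
#   prediction = simple_workflow(api, "diabetes.csv", {"plasma glucose": 120})
```

The file name diabetes.csv and the field name in the commented usage are placeholders; substitute the names from your own data.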

Diabetes Anomalies

DIABETES SOURCE

DIABETES DATASET

TRAIN SET

TEST SET

ALL MODEL

CLEAN DATASET

FILTER

ALL MODEL

ALL EVALUATION

CLEAN EVALUATION

COMPARE EVALUATIONS

ANOMALY DETECTOR

WhizzML

• Complete programming language

• Machine Learning operations are first-class citizens

• Server-side execution abstracts infrastructure

• API First! - Everything is composable

• Shareable

A Domain-Specific Language (DSL) for automating Machine Learning workflows.

WhizzML vs API

WhizzML                           API / Bindings
Executes server-side              Requires local execution
Zero latency                      Every API call has latency
Parallelization built-in          Manual parallelization
Sharing built-in                  Manual sharing
Code agnostic workflows           Code specific workflows
Workflows can be UI integrated    Workflows external to UI

WhizzML vs Flatline

WhizzML                          Flatline
Concerned with resources         Concerned with datasets
Turing complete                  More specific to features
Optimized for parallelization    Optimized for speed

Simple Workflow

SOURCE DATASET MODEL

Redfin Workflow

Model Predicts Sale Price

Sold Homes

Compare List to Prediction

Redfin Workflow

MODEL

FILTER SOLD HOMES

BATCH PREDICTION

NEW FEATURES

DATASET DEALS DATASET

FILTER FORSALE HOMES NEW FEATURES

WhizzML Resources

LIBRARY

CITY 1 SOLD HOMES

CITY 1 DEALS DATASET

EXECUTION

CITY 1 FORSALE HOMES

SCRIPT

WhizzML Resources

LIBRARY

CITY 2 SOLD HOMES

CITY 2 DEALS DATASET

EXECUTION

CITY 2 FORSALE HOMES

SCRIPT

Scriptify

• "Reifies" a resource into a WhizzML script.

• Rapid prototyping meets automation.

WhizzML FE

Worth More

Worth Less

WhizzML FE

LATITUDE    LONGITUDE     REFERENCE LATITUDE   REFERENCE LONGITUDE
44.583      -123.296775   44.5638              -123.2794
44.604414   -123.296129   44.5638              -123.2794
44.600108   -123.29707    44.5638              -123.2794
44.603077   -123.295004   44.5638              -123.2794
44.589587   -123.301154   44.5638              -123.2794

Distance (m): 700, 30.4, 19.38, 37.8, 23.39

Flatline!

WhizzML FE

https://en.wikipedia.org/wiki/Haversine_formula
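
For reference, the haversine formula itself is short. The sketch below is plain Python rather than the Flatline or WhizzML code used in the talk, and the Earth radius of 6,371,000 m is an assumed mean value. It returns the great-circle distance in meters between two latitude/longitude pairs given in degrees.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius, in meters


def haversine(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Great-circle distance between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# One latitude/longitude pair and the reference point from the table.
print(haversine(44.583, -123.296775, 44.5638, -123.2794))
```

In the workflow from the talk, the same computation would add a distance field to every row of a dataset instead of being called one pair at a time.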

WhizzML FE

LIBRARY

SCRIPT

Haversine

WhizzML FE

Fix Missing Values in a “Meaningful” Way

Filter Zeros

Model insulin

Predict insulin

Select insulin

Fixed Dataset

Amended Dataset

Original Dataset

Clean Dataset

WhizzML Workflow Types

Optimization
• Model or Ensemble
• Best-First Features
• SMACdown

Algorithms
• Stacked Generalization
• Gradient boosting
• Cross Validation

Transformations
• Flatline Wrappers
• Remove Anomalies

Domain Specific
• Application Workflow
• Repetitive Tasks

Best-First Features

{F1} {F2} {F3} {F4} {Fn}

CHOOSE BEST S = {Fa}

S+{F1} S+{F2} S+{F3} S+{F4} S+{Fn-1}

CHOOSE BEST S = {Fa, Fb}

S+{F1} S+{F2} S+{F3} S+{F4} S+{Fn-1}

CHOOSE BEST S = {Fa, Fb, Fc}
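
The selection loop in the diagram can be sketched as a greedy forward search. This is plain Python, not the WhizzML script from the talk. The score callable stands in for "build a model on these features and evaluate it", and stopping as soon as no candidate improves the score is an assumption, since the slide does not state a stopping rule. The feature names and weights in the toy example are made up.

```python
def best_first_features(features, score, max_features=None):
    """Greedy forward selection: grow S by the single best feature per round."""
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score S + {F} for every remaining candidate F.
        scored = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # assumed stopping rule: no candidate improves on S
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Toy scoring function: each feature contributes a fixed, made-up amount.
WEIGHTS = {"glucose": 0.5, "bmi": 0.3, "age": 0.1, "noise": -0.2}


def toy_score(subset):
    return sum(WEIGHTS[f] for f in subset)


print(best_first_features(list(WEIGHTS), toy_score))
```

With a real scorer each call to score trains and evaluates a model, so the number of evaluations grows roughly with the square of the number of features; max_features caps that cost.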

Model Selection

ENSEMBLE LOGISTIC REGRESSION

EVALUATION

SOURCE DATASET

TRAINING

TEST

MODEL

EVALUATION EVALUATION

CHOOSE

Model Tuning

ENSEMBLE N=20

EVALUATION

SOURCE DATASET

TRAINING

TEST

EVALUATION EVALUATION

ENSEMBLE N=10

ENSEMBLE N=1000

CHOOSE
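
The tuning diagram reduces to a small loop: build one model per candidate setting, evaluate each, and choose the best. The sketch below is a generic Python illustration. The names choose_best, train, and evaluate are hypothetical, and the scores are placeholders chosen so the example runs offline, not real evaluation results.

```python
def choose_best(candidates, train, evaluate):
    """Train one model per candidate configuration and keep the best scorer."""
    if not candidates:
        raise ValueError("at least one candidate configuration is required")
    results = []
    for params in candidates:
        model = train(params)      # e.g. build an ensemble on the TRAINING split
        score = evaluate(model)    # e.g. evaluate it on the TEST split
        results.append((score, params))
    best_score, best_params = max(results, key=lambda pair: pair[0])
    return best_params, best_score


# Placeholder scores keyed by ensemble size (N = 20, 10, 1000 as in the diagram).
PLACEHOLDER_SCORES = {10: 0.81, 20: 0.84, 1000: 0.83}
candidates = [{"number_of_models": n} for n in (20, 10, 1000)]
best_params, best_score = choose_best(
    candidates,
    train=lambda params: params,  # stand-in: the "model" is just its settings
    evaluate=lambda model: PLACEHOLDER_SCORES[model["number_of_models"]],
)
print(best_params, best_score)
```

The same loop covers the model selection diagram if the candidates are different algorithms (model, ensemble, logistic regression) rather than different settings of one algorithm.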

SMACdown

• How many models?
• How many nodes?
• Missing splits or not?
• Number of random candidates?
• Balance the objective?

SMACdown can tell you!

Path to Automatic ML

Chart axes: time (2011 to Spring 2016) vs. Automation

A. Programmable Infrastructure: Sauron • Automatic deployment and auto-scaling
B. REST API: Wintermute • Distributed Machine Learning Framework
C. Data Generation and Filtering: Flatline • DSL for transformation and new field generation
D. Workflow Automation: WhizzML • DSL for programmable workflows
E. Automatic Model Selection: SMACdown • Automatic parameter optimization

Higher Level Algorithms

• Stacked Generalization

• Boosting

• Adaboost

• Logitboost

• Martingale Boosting

• Gradient Boosting

Stacked Generalization

ENSEMBLE LOGISTIC REGRESSION

SOURCE DATASET

MODEL

BATCH PREDICTION

BATCH PREDICTION

BATCH PREDICTION

EXTENDED DATASET

EXTENDED DATASET

EXTENDED DATASET

LOGISTIC REGRESSION
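
The diagram can be read as: fit several base models, append each model's predictions to the dataset as new columns (the extended dataset), then fit a final model on the extended data. The Python sketch below follows that reading with toy learners. All names are hypothetical, the base learners are simple thresholds and the metalearner is a lookup table rather than a logistic regression, and predictions are appended for the same rows the base models were fit on, which a careful implementation would replace with held-out predictions.

```python
def fit_stack(base_fitters, meta_fitter, rows, labels):
    """Fit base models, extend each row with their predictions, fit a metalearner."""
    base_models = [fit(rows, labels) for fit in base_fitters]

    def extend(row):
        # Original fields plus one new column per base model prediction.
        return list(row) + [model(row) for model in base_models]

    meta_model = meta_fitter([extend(row) for row in rows], labels)
    return lambda row: meta_model(extend(row))


def fit_threshold(column):
    """Toy base learner: predict 1 when a column exceeds its training mean."""
    def fit(rows, labels):
        cut = sum(row[column] for row in rows) / len(rows)
        return lambda row: 1 if row[column] > cut else 0
    return fit


def fit_lookup(n_base):
    """Toy metalearner: most common label per combination of base predictions."""
    def fit(rows, labels):
        votes = {}
        for row, label in zip(rows, labels):
            votes.setdefault(tuple(row[-n_base:]), []).append(label)
        table = {key: max(set(v), key=v.count) for key, v in votes.items()}
        default = max(set(labels), key=labels.count)
        return lambda row: table.get(tuple(row[-n_base:]), default)
    return fit


# Made-up training data: two numeric fields and a binary label.
rows = [[1, 10], [2, 20], [8, 80], [9, 90]]
labels = [0, 0, 1, 1]
predict = fit_stack([fit_threshold(0), fit_threshold(1)], fit_lookup(2), rows, labels)
print(predict([1.5, 15]), predict([8.5, 85]))
```

Because fitters are passed in as callables, swapping the toy learners for real models only requires wrapping each one in a function that returns a predict callable.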

Why WhizzML

• Automation is critical to fulfilling the promise of ML
• WhizzML can create workflows that:
  • Automate repetitive tasks.
  • Automate model tuning and feature selection.
  • Combine ML models into more powerful algorithms.
  • Create shareable and re-usable executions.
