L7. A developers’ overview of the world of predictive APIs

A developers’ overview of the world of predictive APIs (proprietary, open source, hybrid)
Louis Dorard, PAPIs.io

Upload: machine-learning-valencia

Posted on 18-Jan-2017


Page 1: L7. A developers’ overview of the world of predictive APIs

A developers’ overview of the world of predictive APIs (proprietary, open source, hybrid)

Louis Dorard, PAPIs.io

Page 2

Predictive APIs basics

Types of Predictive APIs

Deploy own model on 3rd party platform

Open source PAPI frameworks

What’s missing / almost there

Page 3

Predictive APIs basics

Page 5

The two phases of machine learning:

• TRAIN a model

• PREDICT with a model

Page 6

The two methods of predictive APIs:

• TRAIN a model

• PREDICT with a model

Page 7

The two methods of predictive APIs:

• model = create_model(dataset)

• predicted_output = create_prediction(model, new_input)

Page 8

The two methods of predictive APIs:

• model = create_model(‘training.csv’)

• predicted_output = create_prediction(model, new_input)
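The two methods can be made concrete with a small local sketch. It is not any provider's implementation: it assumes a CSV file with a header row, numeric input columns and the output in the last column, and it uses a 1-nearest-neighbour rule only because it fits in a few lines. `create_model` plays the TRAIN role and `create_prediction` the PREDICT role.

```python
import csv
import math


def create_model(csv_path):
    """TRAIN: read a CSV (header row, numeric inputs, output in the last
    column) and return a model. Here the "model" is just the stored
    training examples, used by a 1-nearest-neighbour rule (illustrative)."""
    examples = []
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            inputs = [float(value) for value in row[:-1]]
            examples.append((inputs, row[-1]))
    return {"examples": examples}


def create_prediction(model, new_input):
    """PREDICT: return the output of the training example closest to new_input."""
    _, output = min(
        model["examples"],
        key=lambda example: math.dist(example[0], new_input),
    )
    return output
```

For example, `model = create_model('training.csv')` followed by `create_prediction(model, [5.0, 1.2])` returns the output of the closest training row. Replacing the body of either function with a call to a hosted service would leave the calling code unchanged, which is the point of the two-method interface.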

Page 9

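from bigml.api import BigML

# create a model (credentials are read from the BIGML_USERNAME and BIGML_API_KEY environment variables)
api = BigML()
source = api.create_source('training_data.csv')
dataset = api.create_dataset(source)
model = api.create_model(dataset)

# make a prediction; new_input maps input field names to values (the field name below is hypothetical)
new_input = {"field name": "value"}
prediction = api.create_prediction(model, new_input)
print("Predicted output value: ", prediction['object']['output'])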

BigML Python

Page 10

# from http://scikit-learn.org/stable/tutorial/basic/tutorial.html

from sklearn import svm
model = svm.SVC(gamma=0.001, C=100.)

from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])

# predict() expects a 2-D array, hence the slice [-1:] rather than the index [-1]
model.predict(digits.data[-1:])

Scikit Python

Page 11

$ curl "https://bigml.io/model?$BIGML_AUTH" \
    -X POST \
    -H "content-type: application/json" \
    -d '{"dataset": "dataset/50ca447b3b56356ae0000029"}'

BigML REST http
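The same REST call can be built with Python's standard library. This sketch only constructs the request (it does not send it), so the URL, method, header and JSON body can be inspected. As with the curl command, it assumes `BIGML_AUTH` holds the query-string credentials (BigML's documentation uses the form `username=...;api_key=...`); the `alice` credentials below are hypothetical.

```python
import json
import os
import urllib.request


def build_create_model_request(dataset_id, auth=None):
    """Build (but do not send) the POST request asking BigML to create
    a model from an existing dataset, mirroring the curl command above."""
    if auth is None:
        auth = os.environ.get("BIGML_AUTH", "")
    body = json.dumps({"dataset": dataset_id}).encode("utf-8")
    return urllib.request.Request(
        url="https://bigml.io/model?" + auth,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_model_request(
    "dataset/50ca447b3b56356ae0000029",
    auth="username=alice;api_key=hypothetical",  # hypothetical credentials
)
# Sending it requires valid credentials and network access:
# with urllib.request.urlopen(request) as response:
#     model = json.load(response)
```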


Page 14

Types of Predictive APIs

Page 15

[Figure: types of predictive APIs arranged along an ABSTRACTION axis: Algorithmic, Generic, Generic for text, Problem-specific, Fixed-model]


Page 17

“Is this email important? — Yes/No”

Page 18

Generic Predictive APIs:
• Google Prediction API
• BigML
• Datagami
• PredicSis
• WolframCloud
• Amazon

Page 19

API              Amazon   Google   PredicSis   BigML
Accuracy         0.862    0.743    0.858       0.790
Training time    135 s    76 s     17 s        5 s
Test time        188 s    369 s    5 s         1 s

Source: louisdorard.com/blog/machine-learning-apis-comparison


Page 21

“Which project does this document relate to? — A/B/C”

Page 22

Text classification APIs:
• uClassify.com
• MonkeyLearn.com
• Cortical.io


Page 24

“Is this customer going to leave next month? — Yes/No”

Page 25

“Is this transaction fraudulent? — Yes/No”

Page 26

Problem-specific Predictive APIs:
• Churn: ChurnSpotter.io, Framed.io
• Fraud detection: SiftScience.com
• Lead scoring: Infer.com
• Personal assistant (Siri-like): Wit.ai


Page 28

“What is the sentiment of this tweet? — Positive/Neutral/Negative”

Page 29

“Is this email spam? — Yes/No”

Page 30

Fixed-model Predictive APIs:
• Text: Datumbox.com, Semantria.com
• Vision: Indico, AlchemyAPI.com
• Siri-like: maluuba.com
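What distinguishes fixed-model APIs from generic ones is the interface: the provider has already trained the model, so clients only get the PREDICT method. The toy sketch below shows that shape for the tweet-sentiment question above, with a hypothetical `predict_sentiment` function backed by a tiny hard-coded word list. The word list is only a stand-in for a provider's pre-trained model, not how the listed services work.

```python
# Hypothetical "pre-trained" model: fixed by the provider, not trainable by clients.
_POSITIVE = {"good", "great", "love", "excellent"}
_NEGATIVE = {"bad", "awful", "hate", "terrible"}


def predict_sentiment(text):
    """PREDICT only: return "Positive", "Neutral" or "Negative" for a text.
    There is no create_model counterpart; clients cannot retrain the model."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    score = sum(w in _POSITIVE for w in words) - sum(w in _NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"
```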


Page 32

[Figure: types of predictive APIs arranged along a CUSTOMIZATION axis: Algorithmic, Generic, Generic for text, Problem-specific, Fixed-model]

Page 33

Deploy your own model on 3rd party platform

Page 34

EXPERIMENT (find model) / DEPLOY (use model)

Page 35

Start with {popular, open source} ML library

Page 36

Experiment in automated environment

Page 42

Experiment on “ScienceCluster”:
• Distributed jobs
• Collaborative workspace
• Serialize chosen model

Deploy model as API on “ScienceOps”:
• Load balancing
• Auto scaling
• Monitoring (API calls, accuracy)
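The hand-off between the two platforms is the serialized model: the experiment side saves the chosen model, and the serving side loads it to answer predictions. A minimal sketch with Python's `pickle`, assuming a hypothetical `ThresholdModel` class stands in for whatever model the experiments selected (only unpickle files you trust):

```python
import pickle


class ThresholdModel:
    """Hypothetical stand-in for the model chosen during experimentation."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return "Yes" if x >= self.threshold else "No"


# Experiment side: keep the candidate with most correct answers on validation data.
validation = [(0.2, "No"), (0.4, "No"), (0.6, "Yes"), (0.9, "Yes")]
candidates = [ThresholdModel(t) for t in (0.1, 0.5, 0.8)]
best = max(candidates,
           key=lambda m: sum(m.predict(x) == y for x, y in validation))
payload = pickle.dumps(best)  # in practice, written to a file or artifact store

# Serving side: load the serialized model and use it to answer requests.
served_model = pickle.loads(payload)
print(served_model.threshold, served_model.predict(0.7))
```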

Page 43

Your API endpoints:
• one for serving predictions
• one for running ML experiments (i.e. training and evaluating models on given data)?
• one for deploying ML models?
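As a rough illustration of the first endpoint, and of training in its simplest form, here is a standard-library HTTP server with a hypothetical `/train` route and a `/predict` route. The route names, JSON shapes and the majority-class "model" (which ignores inputs) are assumptions made for brevity; a production service would add authentication, validation and persistence.

```python
import json
import threading
from collections import Counter
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical model state shared by the endpoints: a majority-class baseline.
MODEL = {"majority": None}
MODEL_LOCK = threading.Lock()


class PredictiveAPIHandler(BaseHTTPRequestHandler):
    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        data = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/train":
            # Body: {"outputs": [...]}; the "model" is the most frequent output.
            outputs = data.get("outputs") or []
            if not outputs:
                self._send_json(400, {"error": "no outputs given"})
                return
            with MODEL_LOCK:
                MODEL["majority"] = Counter(outputs).most_common(1)[0][0]
            self._send_json(200, {"status": "trained"})
        elif self.path == "/predict":
            # Body: {"input": {...}}; this baseline ignores the input.
            with MODEL_LOCK:
                majority = MODEL["majority"]
            if majority is None:
                self._send_json(409, {"error": "no model trained yet"})
            else:
                self._send_json(200, {"output": majority})
        else:
            self._send_json(404, {"error": "unknown endpoint"})

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_server(port=0):
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), PredictiveAPIHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After `start_server()`, POSTing `{"outputs": ["Yes", "No", "Yes"]}` to `/train` and then any JSON body to `/predict` returns `{"output": "Yes"}`.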

Page 44

Open source PAPI frameworks

Page 46

• “Open source prediction server”
• Based on Spark
• Exposes predictive models as (scalable & robust) APIs

Page 47

MLlib

Page 49

= PAPI+

Page 50

• Train command: “pio train”
• API endpoints:
  • Send (new) data
  • Send prediction queries
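The `pio train` command suggests the framework on these slides is Apache PredictionIO. Assuming so, and assuming a local event server and deployed engine on PredictionIO's usual default ports (7070 and 8000), the two endpoints can be called roughly as sketched below. The event fields follow the shape used in PredictionIO's recommendation template; the query fields (`user`, `num`) depend on the engine template and are assumptions, as is the access key. The sketch only builds the requests.

```python
import json
import urllib.parse
import urllib.request

EVENT_SERVER = "http://localhost:7070"   # assumed default event server address
ENGINE_SERVER = "http://localhost:8000"  # assumed default engine server address


def post_json(url, payload):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_event_request(access_key, event):
    # Send (new) data: one event per request to the event server.
    key = urllib.parse.quote(access_key)
    return post_json(f"{EVENT_SERVER}/events.json?accessKey={key}", event)


def build_query_request(query):
    # Send a prediction query to the deployed engine.
    return post_json(f"{ENGINE_SERVER}/queries.json", query)


event_req = build_event_request("HYPOTHETICAL_ACCESS_KEY", {
    "event": "rate",
    "entityType": "user", "entityId": "u1",
    "targetEntityType": "item", "targetEntityId": "i42",
    "properties": {"rating": 5},
})
query_req = build_query_request({"user": "u1", "num": 4})
# urllib.request.urlopen(event_req) and urlopen(query_req) would send them.
```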

Page 52

Deployment:
• Amazon CloudFormation template → cluster
• Manual up/down scaling

Page 56

What’s missing / almost there

Page 57

Open Source AutoML?!

• Spearmint: “Bayesian optimization” for tuning parameters → Whetlab → acquired by Twitter
• Auto-sklearn: “automated machine learning toolkit and drop-in replacement for a scikit-learn estimator”
• See automl.org and the AutoML challenge
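Spearmint automates hyperparameter choice with Bayesian optimization. As a much simpler stand-in that shows the same loop (propose parameters, evaluate, keep the best), here is plain random search over the two SVC parameters from the scikit-learn example, `gamma` and `C`. This is random search, not Bayesian optimization, and `validation_error` is a hypothetical toy objective whose minimum was placed at the slide's values; in practice it would train a model and measure its error on held-out data.

```python
import math
import random


def validation_error(gamma, C):
    """Hypothetical objective: in practice, train with (gamma, C) and return
    held-out error. Here, a smooth toy function minimised at gamma=1e-3, C=100."""
    return (math.log10(gamma) + 3) ** 2 + (math.log10(C) - 2) ** 2


def random_search(objective, n_trials=200, seed=0):
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        # Sample on a log scale, as is common for gamma and C.
        params = {"gamma": 10 ** rng.uniform(-6, 0), "C": 10 ** rng.uniform(-2, 4)}
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(validation_error)
print(best_params, best_score)
```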

Page 58

from sklearn import svm
model = svm.SVC(gamma=0.001, C=100.)

from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])

# predict() expects a 2-D array, hence the slice [-1:] rather than the index [-1]
model.predict(digits.data[-1:])

Scikit Python


Page 60

import autosklearn.classification
model = autosklearn.classification.AutoSklearnClassifier()

from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])

# predict() expects a 2-D array, hence the slice [-1:] rather than the index [-1]
model.predict(digits.data[-1:])

AutoML Scikit

Page 61

Open Source Auto Scaling?!

• ?


Page 63

Automated Benchmark?!

• Requirement:
  • train/test splits on local machine
  • compute evaluation on local machine
• Solutions:
  • adapt bigmler and use local evaluations?
  • use scikit-learn framework?
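The requirement above (split and evaluate locally, so every API is scored on the same test set) can be sketched as a small harness. Each provider is wrapped behind the two methods from earlier (`create_model`, `create_prediction`); the two toy providers are hypothetical placeholders for wrappers around real API clients. The harness measures accuracy, training time and test time, the three quantities in the comparison table.

```python
import random
import time


class MajorityProvider:
    """Hypothetical provider: always predicts the most frequent training label."""

    def create_model(self, rows):
        labels = [label for _, label in rows]
        return max(set(labels), key=labels.count)

    def create_prediction(self, model, new_input):
        return model


class ThresholdProvider:
    """Hypothetical provider: predicts 1 when the single input exceeds the training mean."""

    def create_model(self, rows):
        return sum(x for x, _ in rows) / len(rows)

    def create_prediction(self, model, new_input):
        return 1 if new_input > model else 0


def local_train_test_split(rows, test_fraction=0.25, seed=0):
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def benchmark(providers, rows):
    train, test = local_train_test_split(rows)  # same split for every provider
    results = {}
    for name, provider in providers.items():
        start = time.perf_counter()
        model = provider.create_model(train)
        training_time = time.perf_counter() - start

        start = time.perf_counter()
        predictions = [provider.create_prediction(model, x) for x, _ in test]
        test_time = time.perf_counter() - start

        # Evaluation is computed locally, not by the provider.
        accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
        results[name] = {"accuracy": accuracy,
                         "training_time": training_time,
                         "test_time": test_time}
    return results


data = [(x / 100, int(x >= 50)) for x in range(100)]
results = benchmark({"majority": MajorityProvider(),
                     "threshold": ThresholdProvider()}, data)
for name, scores in results.items():
    print(name, scores)
```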

Page 64

API standards?!

• Python de facto standard: scikit-learn
  • “Sparkit-learn aims to provide scikit-learn functionality and API on PySpark. The main goal of the library is to create an API that stays close to sklearn’s.”
• REST standard: PSI (Protocols & Structures for Inference)
  • Pretty similar to BigML API!
  • Implementation for scikit available
• Easy benchmarking! Ensembles!
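Why a shared interface makes ensembles easy: if every model, local or remote, exposes the same scikit-learn-style `fit`/`predict` methods, one generic wrapper can combine them. A minimal majority-vote sketch follows; `VotingEnsemble` and the two toy estimators are hypothetical and duck-typed on the `fit(X, y)` / `predict(X)` convention rather than inheriting from scikit-learn (which ships its own `VotingClassifier`).

```python
from collections import Counter


class ConstantEstimator:
    """Hypothetical toy estimator: always predicts one fixed label."""

    def __init__(self, label):
        self.label = label

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [self.label for _ in X]


class SignEstimator:
    """Hypothetical toy estimator: 1 if the first feature is positive, else 0."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [1 if row[0] > 0 else 0 for row in X]


class VotingEnsemble:
    """Majority vote over any objects with fit(X, y) and predict(X) methods."""

    def __init__(self, estimators):
        self.estimators = estimators

    def fit(self, X, y):
        for estimator in self.estimators:
            estimator.fit(X, y)
        return self

    def predict(self, X):
        all_predictions = [estimator.predict(X) for estimator in self.estimators]
        # zip(*...) groups the estimators' predictions row by row
        return [Counter(votes).most_common(1)[0][0] for votes in zip(*all_predictions)]


X = [[2.0], [-1.0], [0.5]]
y = [1, 0, 1]
ensemble = VotingEnsemble([ConstantEstimator(1), SignEstimator(), SignEstimator()]).fit(X, y)
print(ensemble.predict(X))
```

Because the ensemble only relies on the two methods, a wrapper around a remote predictive API that follows the same convention could be dropped into the list unchanged.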

Page 65

Getting started

• VM with Jupyter notebooks (Python & Bash)
• API wrappers preinstalled: BigML & Google Prediction API
• Notebook for easy setup of credentials
• Scikit-learn and Pandas preinstalled
• Open source VM provisioning script & notebooks
• Search public Snaps on terminal.com: “machine learning”

Page 66

@louisdorard

@papisdotio

louisdorard.com

papis.io