L7. A developers’ overview of the world of predictive APIs

Post on 18-Jan-2017


A developers’ overview of the world of predictive APIs (proprietary, open source, hybrid)

Louis Dorard, PAPIs.io

Predictive APIs basics

Types of Predictive APIs

Deploy own model on 3rd party platform

Open source PAPI frameworks

What’s missing / almost there

Predictive APIs basics

The two phases of machine learning:

• TRAIN a model

• PREDICT with a model

The two methods of predictive APIs:

• TRAIN a model

• PREDICT with a model

The two methods of predictive APIs:

• model = create_model(dataset), e.g. create_model(‘training.csv’)

• predicted_output = create_prediction(model, new_input)
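To make the two-method shape concrete, here is a minimal self-contained sketch in plain Python. It is not any vendor's API: `create_model` and `create_prediction` only reuse the names from the slide, the toy data is made up, and the 1-nearest-neighbour "model" is a stand-in chosen so the example runs without dependencies.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def create_model(dataset):
    """TRAIN: dataset is a list of (input_vector, output) pairs.

    A 1-nearest-neighbour "model" simply memorises the training examples.
    """
    if not dataset:
        raise ValueError("dataset must not be empty")
    return list(dataset)


def create_prediction(model, new_input):
    """PREDICT: return the output of the stored example closest to new_input."""
    _, output = min(model, key=lambda pair: dist(pair[0], new_input))
    return output


# Toy data (illustrative only): [hours_studied, hours_slept] -> exam result
training = [([1.0, 4.0], "fail"), ([2.0, 5.0], "fail"),
            ([8.0, 7.0], "pass"), ([9.0, 8.0], "pass")]
model = create_model(training)
predicted_output = create_prediction(model, [7.5, 6.0])
print(predicted_output)
```

Whatever sits behind the two calls (a hosted service or a local library), the caller only ever sees a model handle and a predicted output.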

from bigml.api import BigML

# create a model (credentials are read from the environment)
api = BigML()
source = api.create_source('training_data.csv')
dataset = api.create_dataset(source)
model = api.create_model(dataset)

# make a prediction
# new_input: dict mapping field names to values for the example to predict
prediction = api.create_prediction(model, new_input)
print("Predicted output value: ", prediction['object']['output'])

BigML Python

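# from http://scikit-learn.org/stable/tutorial/basic/tutorial.html

# create a model (support vector classifier with fixed parameters)
from sklearn import svm
model = svm.SVC(gamma=0.001, C=100.)

# train on every digit image except the last one
from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])

# predict the last image (sliced with [-1:] so the input stays 2-D)
model.predict(digits.data[-1:])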

Scikit Python

$ curl "https://bigml.io/model?$BIGML_AUTH" \
    -X POST \
    -H "content-type: application/json" \
    -d '{"dataset": "dataset/50ca447b3b56356ae0000029"}'

BigML REST http
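The same REST call can be prepared from Python with the standard library alone. This is a sketch mirroring the curl command above, not BigML's official bindings: the request is built but never sent, and the fallback credential string is a made-up placeholder used only when `BIGML_AUTH` is not set.

```python
import json
import os
import urllib.request

# BIGML_AUTH is expected to hold the query-string credentials, as in the
# curl example; the fallback value is a made-up placeholder.
auth = os.environ.get("BIGML_AUTH", "username=alice;api_key=PLACEHOLDER")

payload = {"dataset": "dataset/50ca447b3b56356ae0000029"}
request = urllib.request.Request(
    url="https://bigml.io/model?" + auth,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)


def send(req):
    """Send the request and decode the JSON body (needs network and real credentials)."""
    with urllib.request.urlopen(req) as response:
        return json.load(response)


print(request.get_method(), request.full_url.split("?")[0])
```

Calling `send(request)` with valid credentials would perform the same model creation as the curl command.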

Types of Predictive APIs

[Diagram: types of predictive APIs along an ABSTRACTION axis: Algorithmic, Generic, Generic for text, Problem-specific, Fixed-model]

“Is this email important? — Yes/No”

Generic Predictive APIs:

• Google Prediction API

• BigML

• Datagami

• PredicSis

• WolframCloud

• Amazon

                AMAZON   GOOGLE   PREDICSIS   BIGML

ACCURACY        0.862    0.743    0.858       0.790

TRAINING TIME   135s     76s      17s         5s

TEST TIME       188s     369s     5s          1s

louisdorard.com/blog/machine-learning-apis-comparison


“Which project does this document relate to? — A/B/C”

Text classification APIs:

• uClassify.com

• MonkeyLearn.com

• Cortical.io


“Is this customer going to leave next month? — Yes/No”

“Is this transaction fraudulent? — Yes/No”

Problem-specific Predictive APIs:

• Churn: ChurnSpotter.io, Framed.io

• Fraud detection: SiftScience.com

• Lead scoring: Infer.com

• Personal assistant (Siri): Wit.ai


“What is the sentiment of this tweet? — Positive/Neutral/Negative”

“Is this email spam? — Yes/No”

Fixed-model Predictive APIs:

• Text: Datumbox.com, Semantria.com

• Vision: Indico, AlchemyAPI.com

• Siri-like: maluuba.com

[Diagram: the same five types (Algorithmic, Generic, Generic for text, Problem-specific, Fixed-model), shown with an ABSTRACTION axis and a CUSTOMIZATION axis]

Deploy your own model on 3rd party platform

EXPERIMENT (find model) / DEPLOY (use model)

Start with {popular, open source} ML library

Experiment in automated environment

Experiment on “ScienceCluster”

• Distributed jobs

• Collaborative workspace

• Serialize chosen model

Deploy model as API on “ScienceOps”

• Load balancing

• Auto scaling

• Monitoring (API calls, accuracy)

Your API endpoints

• 1 for serving predictions

• 1 for running ML experiment (i.e. train and evaluate models on given data)?

• 1 for deploying ML models?
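The three endpoints listed for your own API can be sketched as three plain functions. Everything here is hypothetical (function names, the in-memory registry, the rule-based "model"), and the experiment endpoint only evaluates, leaving training out to keep the sketch short; a real service would sit behind an HTTP framework.

```python
import json

MODELS = {}  # model_id -> callable taking an input dict and returning an output


def deploy(model_id, predict_fn):
    """Deployment endpoint: register a trained model under an id."""
    MODELS[model_id] = predict_fn
    return {"deployed": model_id}


def experiment(predict_fn, test_set):
    """Experiment endpoint: evaluate a model on (input, expected) pairs."""
    correct = sum(predict_fn(x) == y for x, y in test_set)
    return {"accuracy": correct / len(test_set)}


def serve(request_body):
    """Serving endpoint: JSON request in, JSON prediction out."""
    request = json.loads(request_body)
    predict_fn = MODELS[request["model"]]
    return json.dumps({"output": predict_fn(request["input"])})


# Hypothetical rule-based "model" standing in for a trained one.
def is_important(email):
    return "yes" if email["from_boss"] else "no"


report = experiment(is_important, [({"from_boss": True}, "yes"),
                                   ({"from_boss": False}, "yes")])
deploy("important-email", is_important)
response = serve('{"model": "important-email", "input": {"from_boss": true}}')
print(report, response)
```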

Open source PAPI frameworks

• “Open source prediction server”

• Based on Spark

• Expose predictive models as (scalable & robust) APIs

[Diagram: MLlib + (image not recovered) = PAPI]

• Train command: “pio train”

• API endpoints:

  • Send (new) data

  • Send prediction queries

Deployment

• Amazon CloudFormation template → cluster

• Manual up/down scaling
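The two API endpoints (send data, send prediction queries) could be exercised along these lines. This sketch assumes the framework is PredictionIO, which the “pio train” command suggests but the slides do not state, running locally on its default ports; `ACCESS_KEY`, the ids and the query fields are placeholders, and the query format depends on the deployed engine. The requests are built but not sent.

```python
import json
import urllib.request


def json_post(url, payload):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed defaults of a local install; ACCESS_KEY is a placeholder.
EVENT_URL = "http://localhost:7070/events.json?accessKey=ACCESS_KEY"
QUERY_URL = "http://localhost:8000/queries.json"

# 1. Send (new) data: one event describing a user rating an item.
new_data = json_post(EVENT_URL, {
    "event": "rate",
    "entityType": "user",
    "entityId": "u1",
    "targetEntityType": "item",
    "targetEntityId": "i42",
    "properties": {"rating": 4},
})

# 2. Send a prediction query; its fields depend on the deployed engine.
query = json_post(QUERY_URL, {"user": "u1", "num": 3})

for req in (new_data, query):
    print(req.get_method(), req.full_url)
# To actually send: urllib.request.urlopen(req) against a running server.
```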

What’s missing / almost there

Open Source AutoML?!

• Spearmint: “Bayesian optimization” for tuning parameters → Whetlab → Twitter

• Auto-sklearn: “automated machine learning toolkit and drop-in replacement for a scikit-learn estimator”

• See automl.org and challenge
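As a rough illustration of automated parameter tuning, the sketch below uses plain random search. This is a deliberately simpler technique swapped in for Bayesian optimization, not what Spearmint or auto-sklearn implement. The `validation_error` function is a hypothetical stand-in for "train with these parameters and measure the error on held-out data".

```python
import random
from math import log10


def validation_error(gamma, C):
    """Toy stand-in for 'train with (gamma, C), return validation error'.

    Hypothetical bowl-shaped surface with its minimum at gamma=0.001, C=100.
    """
    return (log10(gamma) + 3) ** 2 + (log10(C) - 2) ** 2


def random_search(objective, n_trials, seed=0):
    """Sample log-uniform (gamma, C) pairs and keep the best one seen."""
    rng = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(n_trials):
        params = {"gamma": 10 ** rng.uniform(-6, 0),
                  "C": 10 ** rng.uniform(-2, 4)}
        error = objective(**params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


best_params, best_error = random_search(validation_error, n_trials=200)
print(best_params, best_error)
```

Bayesian optimization differs in that each new trial is chosen using the results of the previous ones, rather than sampled blindly.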


# hand-picked algorithm and parameters
from sklearn import svm
model = svm.SVC(gamma=0.001, C=100.)

from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])

# [-1:] keeps the input 2-D
model.predict(digits.data[-1:])

Scikit Python

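# same code, but algorithm and parameters are chosen automatically
import autosklearn.classification
model = autosklearn.classification.AutoSklearnClassifier()

from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])

# [-1:] keeps the input 2-D
model.predict(digits.data[-1:])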

AutoML Scikit

Open Source Auto Scaling?!

• ?


Automated Benchmark?!

• Requirement:

  • train/test splits on local machine

  • compute evaluation on local machine

• Solutions

  • adapt bigmler and use local evaluations?

  • use scikit-learn framework?
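The two requirements (local split, local evaluation) can be sketched in plain Python. The benchmark only assumes that each service can be wrapped in a `create_model` / `create_prediction` pair; the majority-class baseline and the generated rows below are hypothetical stand-ins for a real API and a real dataset.

```python
import random
import time
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle locally, then cut once so every service sees the same split."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def benchmark(create_model, create_prediction, train, test):
    """Time both phases and compute accuracy on the local machine."""
    start = time.perf_counter()
    model = create_model(train)
    training_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [create_prediction(model, x) for x, _ in test]
    test_time = time.perf_counter() - start

    correct = sum(p == y for p, (_, y) in zip(predictions, test))
    return {"accuracy": correct / len(test),
            "training_time": training_time,
            "test_time": test_time}


# Hypothetical baseline "service": always predicts the majority class.
def majority_train(train):
    return Counter(y for _, y in train).most_common(1)[0][0]


def majority_predict(model, x):
    return model


rows = [([i], "yes" if i % 4 else "no") for i in range(40)]
train, test = train_test_split(rows)
result = benchmark(majority_train, majority_predict, train, test)
print(result)
```

Because the split and the scoring never leave the local machine, the accuracy and timing figures of different services are computed the same way.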

API standards?!

• Python de facto standard: scikit-learn

• “Sparkit-learn aims to provide scikit-learn functionality and API on PySpark. The main goal of the library is to create an API that stays close to sklearn’s.”

• REST standard: PSI (Protocols & Structures for Inference)

  • Pretty similar to BigML API!

  • Implementation for scikit available

• Easy benchmarking! Ensembles!
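Why a shared interface makes ensembles easy can be shown with a small sketch. It follows the scikit-learn convention of `fit(X, y)` and `predict(X)` but uses only toy, hypothetical estimators so that it runs without any library; the ensemble never needs to know what is behind each estimator.

```python
from collections import Counter


class MajorityClass:
    """Toy estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class Threshold:
    """Toy estimator: predicts 1 when the first feature exceeds a cut-off."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def fit(self, X, y):
        return self  # nothing to learn in this sketch

    def predict(self, X):
        return [1 if row[0] > self.cutoff else 0 for row in X]


class VotingEnsemble:
    """Works with any objects exposing fit(X, y) and predict(X)."""

    def __init__(self, estimators):
        self.estimators = estimators

    def fit(self, X, y):
        for estimator in self.estimators:
            estimator.fit(X, y)
        return self

    def predict(self, X):
        # One tuple of votes per input row, then a majority vote per row.
        votes = zip(*(e.predict(X) for e in self.estimators))
        return [Counter(v).most_common(1)[0][0] for v in votes]


X = [[0.1], [0.2], [0.4], [0.6], [0.9]]
y = [0, 0, 0, 1, 1]
ensemble = VotingEnsemble([MajorityClass(), Threshold(0.5), Threshold(0.3)])
predictions = ensemble.fit(X, y).predict(X)
print(predictions)
```

The same property helps benchmarking: any estimator, local or a wrapper around a remote API, can be dropped into one evaluation loop.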

Getting started

• VM with Jupyter notebooks (Python & Bash)

• API wrappers preinstalled: BigML & Google Prediction API

• Notebook for easy setup of credentials

• Scikit-learn and Pandas preinstalled

• Open source VM provisioning script & notebooks

• Search public Snaps on terminal.com: “machine learning”

@louisdorard

@papisdotio

louisdorard.com

papis.io
