l7. a developers’ overview of the world of predictive apis

A developers’ overview of the world of predictive APIs (proprietary, open source, hybrid)

Louis Dorard, PAPIs.io

Predictive APIs basics

Types of Predictive APIs

Deploy own model on 3rd party platform

Open source PAPI frameworks

What’s missing / almost there

Predictive APIs basics

The two phases of machine learning:

• TRAIN a model

• PREDICT with a model

The two methods of predictive APIs:

• TRAIN a model

• PREDICT with a model


• model = create_model(dataset)

• predicted_output = create_prediction(model, new_input)
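The two methods above can be sketched in a few lines of Python. This is only an illustration, not any provider's implementation: `create_model` and `create_prediction` take their names from the slide, while the 1-nearest-neighbour "model" and the toy data are assumptions chosen to keep the sketch self-contained.

```python
# Minimal sketch of the two-method interface: TRAIN, then PREDICT.
# The "model" is just the stored training data (1-nearest-neighbour);
# real predictive APIs put far richer models behind the same two calls.

def create_model(dataset):
    """TRAIN: dataset is a list of (input_vector, output) pairs."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    return {"examples": list(dataset)}


def create_prediction(model, new_input):
    """PREDICT: return the output of the closest training example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    closest = min(model["examples"],
                  key=lambda example: squared_distance(example[0], new_input))
    return closest[1]


# Toy data (hypothetical): [surface, rooms] -> price band
dataset = [([30, 1], "low"), ([45, 2], "low"),
           ([90, 4], "high"), ([120, 5], "high")]
model = create_model(dataset)
predicted_output = create_prediction(model, [100, 4])
print(predicted_output)
```

Whatever happens inside the service, the caller only ever sees these two calls, which is what makes different providers comparable.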


• model = create_model(‘training.csv’)


from bigml.api import BigML

# create a model
api = BigML()
source = api.create_source('training_data.csv')
dataset = api.create_dataset(source)
model = api.create_model(dataset)

# make a prediction (new_input: dict of input field values)
prediction = api.create_prediction(model, new_input)
print("Predicted output value:", prediction['object']['output'])

BigML Python

# from http://scikit-learn.org/stable/tutorial/basic/tutorial.html

from sklearn import svm
model = svm.SVC(gamma=0.001, C=100.)

from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])

model.predict(digits.data[-1:])  # slicing keeps the 2-D shape that predict() expects

Scikit Python


$ curl https://bigml.io/model?$BIGML_AUTH \
    -X POST \
    -H "content-type: application/json" \
    -d '{"dataset": "dataset/50ca447b3b56356ae0000029"}'

BigML REST http
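The same HTTP call can be built from Python with the standard library alone. The sketch below mirrors the curl command (URL, header and JSON body are taken from it); the `send` helper and the placeholder used when `BIGML_AUTH` is not set are assumptions, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request

# BIGML_AUTH is expected to hold the authentication query string, as in the
# curl example. The fallback is a placeholder, not a working credential.
auth = os.environ.get("BIGML_AUTH", "YOUR_AUTH_STRING")

payload = {"dataset": "dataset/50ca447b3b56356ae0000029"}
request = urllib.request.Request(
    url="https://bigml.io/model?" + auth,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)


def send(req):
    """Send the request and decode the JSON response (needs valid credentials)."""
    with urllib.request.urlopen(req) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect the request without sending it.
print(request.get_method(), request.full_url.split("?")[0])
```

Calling `send(request)` would perform the actual model creation once real credentials are in place.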




Types of Predictive APIs

[Diagram: types of predictive APIs along an ABSTRACTION axis: Algorithmic, Generic, Generic for text, Problem-specific, Fixed-model]

“Is this email important? — Yes/No”

Generic Predictive APIs:
• Google Prediction API
• BigML
• Datagami
• PredicSis
• WolframCloud
• Amazon

                AMAZON   GOOGLE   PREDICSIS   BIGML
ACCURACY        0.862    0.743    0.858       0.790
TRAINING TIME   135s     76s      17s         5s
TEST TIME       188s     369s     5s          1s

Source: http://louisdorard.com/blog/machine-learning-apis-comparison


“Which project does this document relate to? — A/B/C”

Text classification APIs:
• uClassify.com
• MonkeyLearn.com
• Cortical.io


“Is this customer going to leave next month? — Yes/No”

“Is this transaction fraudulent? — Yes/No”

Problem-specific Predictive APIs:
• Churn: ChurnSpotter.io, Framed.io
• Fraud detection: SiftScience.com
• Lead scoring: Infer.com
• Personal assistant (Siri): Wit.ai


“What is the sentiment of this tweet? — Positive/Neutral/Negative”

“Is this email spam? — Yes/No”

Fixed-model Predictive APIs:
• Text: Datumbox.com, Semantria.com
• Vision: Indico, AlchemyAPI.com
• Siri-like: maluuba.com

[Diagram: the five types (Algorithmic, Generic, Generic for text, Problem-specific, Fixed-model) placed along two axes, ABSTRACTION and CUSTOMIZATION]

Deploy your own model on 3rd party platform

EXPERIMENT (find model) / DEPLOY (use model)

Start with {popular, open source} ML library

Experiment in automated environment

Experiment on “ScienceCluster”
• Distributed jobs
• Collaborative workspace
• Serialize chosen model

Deploy model as API on “ScienceOps”
• Load balancing
• Auto scaling
• Monitoring (API calls, accuracy)
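The hand-off between the two stages, "serialize chosen model" and then serve it, can be illustrated with Python's standard `pickle` module. This is a sketch under stated assumptions: the linear model, its parameter values and the `predict` function are hypothetical, and it says nothing about how the platforms named above store models internally.

```python
import pickle


def predict(model, features):
    """Linear score thresholded at zero; `model` is a plain dict of parameters."""
    score = model["bias"] + sum(w * x for w, x in zip(model["weights"], features))
    return "yes" if score >= 0 else "no"


# Experiment side: parameters of the chosen model (hypothetical values),
# serialized to bytes that could be written to a file or uploaded.
chosen_model = {"weights": [0.8, -0.5], "bias": -0.1}
serialized = pickle.dumps(chosen_model)

# Deployment side: restore the model and use it to answer prediction requests.
served_model = pickle.loads(serialized)
print(predict(served_model, [1.0, 0.2]))
```

Pickled data should only be loaded from a trusted source, since unpickling can execute arbitrary code.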


Your API endpoints
• 1 for serving predictions
• 1 for running ML experiments (i.e. train and evaluate models on given data)?
• 1 for deploying ML models?
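As an illustration of the first endpoint (serving predictions), here is a minimal WSGI application written with the standard library only. The route `/predictions`, the JSON shape `{"input": [...]}` and the toy `predict` function are all assumptions made for this sketch; the `call` helper invokes the app in-process so no server has to be started.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def predict(features):
    """Hypothetical model: answers "yes" when the feature sum exceeds 1."""
    return "yes" if sum(features) > 1 else "no"


def app(environ, start_response):
    """WSGI application exposing a single endpoint: POST /predictions."""
    if environ["PATH_INFO"] != "/predictions" or environ["REQUEST_METHOD"] != "POST":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = json.loads(environ["wsgi.input"].read(length) or b"{}")
    result = {"prediction": predict(body.get("input", []))}
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps(result).encode("utf-8")]


def call(path, method, payload):
    """Invoke the app in-process, without opening a network socket."""
    raw = json.dumps(payload).encode("utf-8")
    environ = {}
    setup_testing_defaults(environ)
    environ.update({"PATH_INFO": path, "REQUEST_METHOD": method,
                    "CONTENT_LENGTH": str(len(raw)), "wsgi.input": io.BytesIO(raw)})
    status = []
    chunks = app(environ, lambda s, headers: status.append(s))
    return status[0], json.loads(b"".join(chunks))


print(call("/predictions", "POST", {"input": [0.9, 0.4]}))
```

To expose it over HTTP for local testing, the app can be passed to `wsgiref.simple_server.make_server`; experiment and deployment endpoints would follow the same pattern with their own routes.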

Open source PAPI frameworks

• “Open source prediction server”
• Based on Spark
• Expose predictive models as (scalable & robust) APIs

= PAPI+

• Train command: “pio train”
• API endpoints:
  • Send (new) data
  • Send prediction queries

Deployment
• Amazon CloudFormation template → cluster (https://aws.amazon.com/marketplace/pp/B00S74CY0A)
• Manual up/down scaling

What’s missing / almost there


Open Source AutoML?!
• Spearmint: “Bayesian optimization” for tuning parameters → Whetlab → Twitter
• Auto-sklearn: “automated machine learning toolkit and drop-in replacement for a scikit-learn estimator”
• See http://automl.org and the challenge at https://www.codalab.org/competitions/2321

from sklearn import svm
model = svm.SVC(gamma=0.001, C=100.)

Scikit Python

import autosklearn.classification
model = autosklearn.classification.AutoSklearnClassifier()

AutoML Scikit
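The contrast above is between picking hyperparameters such as `gamma` and `C` by hand and letting a tool search for them. The sketch below shows the simplest form of that idea: an exhaustive search over one hyperparameter, scored on a validation set. It is plain grid search on a toy k-nearest-neighbour classifier, not the Bayesian optimization used by Spearmint nor auto-sklearn's method, and every name and data point in it is hypothetical.

```python
from collections import Counter


def knn_predict(train, k, x):
    """Majority label among the k training points closest to x (1-D inputs)."""
    neighbours = sorted(train, key=lambda example: abs(example[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def accuracy(train, k, validation):
    """Fraction of validation examples that k-NN labels correctly."""
    hits = sum(knn_predict(train, k, x) == label for x, label in validation)
    return hits / len(validation)


def search_k(train, validation, candidates):
    """Try every candidate k and keep the first one with the best accuracy."""
    scores = {k: accuracy(train, k, validation) for k in candidates}
    best_k = max(candidates, key=lambda k: scores[k])
    return best_k, scores


# Toy data with two deliberately mislabelled points (2.5 and 7.5).
train = [(1.0, "a"), (2.0, "a"), (3.0, "a"), (2.5, "b"),
         (7.0, "b"), (8.0, "b"), (9.0, "b"), (7.5, "a")]
validation = [(2.4, "a"), (7.6, "b"), (1.2, "a"), (8.8, "b")]

best_k, scores = search_k(train, validation, candidates=[1, 3, 5])
print(best_k, scores)
```

With k=1 the two noisy points mislead the classifier, so the search settles on a larger k without anyone having to choose it manually.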


Open Source Auto Scaling?!
• ?



Automated Benchmark?!
• Requirements:
  • train/test splits on local machine
  • compute evaluation on local machine
• Solutions:
  • adapt bigmler (https://bigmler.readthedocs.org/en/latest/) and use local evaluations?
  • use scikit-learn framework?
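The two requirements, splitting and evaluating on the local machine, can be sketched as a small harness that treats each service as a pair of train and predict functions. Everything here is an assumption for illustration: the function names, the result fields (chosen to echo the accuracy, training time and test time rows of the comparison table) and the majority-class baseline standing in for a remote API.

```python
import random
import time


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle locally and split into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def benchmark(name, train_fn, predict_fn, train, test):
    """Time training and prediction, then compute accuracy locally."""
    start = time.perf_counter()
    model = train_fn(train)
    training_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [predict_fn(model, x) for x, _ in test]
    test_time = time.perf_counter() - start

    correct = sum(p == y for p, (_, y) in zip(predictions, test))
    return {"api": name, "accuracy": correct / len(test),
            "training_time": training_time, "test_time": test_time}


# Hypothetical stand-in for a remote service: always predicts the majority class.
def majority_train(train):
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)


def majority_predict(model, x):
    return model


rows = [([i], "yes" if i % 4 else "no") for i in range(40)]
train, test = split(rows)
result = benchmark("majority-baseline", majority_train, majority_predict, train, test)
print(result)
```

Wrapping each provider's client in the same two functions would let one loop produce a comparison table on identical splits.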


API standards?!
• Python de facto standard: scikit-learn
• “Sparkit-learn aims to provide scikit-learn functionality and API on PySpark. The main goal of the library is to create an API that stays close to sklearn’s.” (https://github.com/lensacom/sparkit-learn)
• REST standard: PSI (Protocols & Structures for Inference), http://psi.cecs.anu.edu.au/
• Pretty similar to BigML API!
• Implementation for scikit available
• Easy benchmarking! Ensembles!
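One reason a shared interface makes ensembles easy: if every model exposes the same `predict` call, combining them is a few lines of glue. The sketch below uses majority voting over models with a scikit-learn-like `predict` method; the classes are hypothetical stand-ins and do not implement PSI or any provider's API.

```python
from collections import Counter


class MajorityVoteEnsemble:
    """Combines any models that expose predict(x) by majority vote."""

    def __init__(self, models):
        if not models:
            raise ValueError("need at least one model")
        self.models = list(models)

    def predict(self, x):
        votes = Counter(model.predict(x) for model in self.models)
        return votes.most_common(1)[0][0]


class ThresholdModel:
    """Hypothetical stand-in for a model served by one provider."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return "yes" if x >= self.threshold else "no"


ensemble = MajorityVoteEnsemble(
    [ThresholdModel(0.3), ThresholdModel(0.5), ThresholdModel(0.9)]
)
print(ensemble.predict(0.6))
```

Because the ensemble itself exposes `predict`, it can be benchmarked or nested exactly like any single model.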


Getting started
• VM with Jupyter notebooks (Python & Bash)
• API wrappers preinstalled: BigML & Google Prediction API
• Notebook for easy setup of credentials
• Scikit-learn and Pandas preinstalled
• Open source VM provisioning script & notebooks: http://github.com/louisdorard/bml-base
• Search public Snaps on http://terminal.com: “machine learning”

https://tmp35.tmpnb.org/user/8kOCdVk9iIbg/notebooks/featured/pandas-cookbook/cookbook/A%20quick%20tour%20of%20IPython%20Notebook.ipynb

@louisdorard

@papisdotio

http://louisdorard.com

http://www.papis.io
