A developers’ overview of the world of predictive APIs (proprietary, open source, hybrid)
Louis Dorard, PAPIs.io
Predictive APIs basics
Types of Predictive APIs
Deploy own model on 3rd party platform
Open source PAPI frameworks
What’s missing / almost there
Predictive APIs basics
The two phases of machine learning:
• TRAIN a model
• PREDICT with a model
The two methods of predictive APIs:
• TRAIN a model
• PREDICT with a model
The two methods of predictive APIs:
• model = create_model(dataset)
• predicted_output = create_prediction(model, new_input)
The two methods of predictive APIs:
• model = create_model('training.csv')
• predicted_output = create_prediction(model, new_input)
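The two calls above can be sketched end to end in a few lines. This is a minimal illustration only: the toy 1-nearest-neighbour rule, the column names and the inline CSV text (used instead of a 'training.csv' file so the block is self-contained) are assumptions, not part of any real predictive API.

```python
import csv
import io


def create_model(csv_text):
    """TRAIN: parse CSV text (last column = output) and return a model.

    The 'model' here is just the stored training rows, used by a
    1-nearest-neighbour rule. A real predictive API would fit something
    more capable behind the same call.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    examples = [([float(v) for v in row[:-1]], row[-1]) for row in data]
    return {"inputs": header[:-1], "output": header[-1], "examples": examples}


def create_prediction(model, new_input):
    """PREDICT: return the output of the closest training example."""
    query = [float(new_input[name]) for name in model["inputs"]]

    def squared_distance(example):
        features, _ = example
        return sum((a - b) ** 2 for a, b in zip(features, query))

    _, output = min(model["examples"], key=squared_distance)
    return output


# Hypothetical training data: two numeric inputs and one categorical output.
TRAINING_CSV = """surface,rooms,type
30,1,flat
45,2,flat
120,5,house
150,6,house
"""

model = create_model(TRAINING_CSV)
predicted_output = create_prediction(model, {"surface": 110, "rooms": 4})
print(predicted_output)  # house
```

Whatever learning algorithm sits behind them, the caller only ever sees these two functions.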
BigML Python
from bigml.api import BigML

# create a model
api = BigML()
source = api.create_source('training_data.csv')
dataset = api.create_dataset(source)
model = api.create_model(dataset)

# make a prediction (new_input maps input field names to values)
prediction = api.create_prediction(model, new_input)
print("Predicted output value:", prediction['object']['output'])
Scikit Python
# from http://scikit-learn.org/stable/tutorial/basic/tutorial.html
from sklearn import svm
model = svm.SVC(gamma=0.001, C=100.)

from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])

# predict expects a 2D array, so slice to keep the last sample as a row
model.predict(digits.data[-1:])
BigML REST http
$ curl "https://bigml.io/model?$BIGML_AUTH" \
       -X POST \
       -H "content-type: application/json" \
       -d '{"dataset": "dataset/50ca447b3b56356ae0000029"}'
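The same REST call can be prepared from Python with the standard library alone. This sketch only builds the request and does not send it; the BIGML_AUTH default is a placeholder, and the helper name build_create_model_request is made up for illustration.

```python
import json
import os
import urllib.request

# Assumption: BIGML_AUTH holds the authentication query string, as in the
# curl command. The default below is a placeholder, not a working credential.
BIGML_AUTH = os.environ.get("BIGML_AUTH", "username=USER;api_key=KEY")


def build_create_model_request(dataset_id, auth=BIGML_AUTH):
    """Build (but do not send) the POST request that asks for a new model."""
    body = json.dumps({"dataset": dataset_id}).encode("utf-8")
    return urllib.request.Request(
        url="https://bigml.io/model?" + auth,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_model_request("dataset/50ca447b3b56356ae0000029")

# Sending it requires real credentials and network access:
# with urllib.request.urlopen(request) as response:
#     model = json.loads(response.read())
```

Training is just an HTTP POST with a JSON body, so any language with an HTTP client can use the API.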
Types of Predictive APIs
[Figure: levels of ABSTRACTION for predictive APIs: Algorithmic, Generic, Generic for text, Problem-specific, Fixed-model]
“Is this email important? — Yes/No”
Generic Predictive APIs:
• Google Prediction API
• BigML
• Datagami
• PredicSis
• WolframCloud
• Amazon
                AMAZON   GOOGLE   PREDICSIS   BIGML
ACCURACY        0.862    0.743    0.858       0.790
TRAINING TIME   135s     76s      17s         5s
TEST TIME       188s     369s     5s          1s
louisdorard.com/blog/machine-learning-apis-comparison
“Which project does this document relate to? — A/B/C”
Text classification APIs:
• uClassify.com
• MonkeyLearn.com
• Cortical.io
“Is this customer going to leave next month? — Yes/No”
“Is this transaction fraudulent? — Yes/No”
Problem-specific Predictive APIs:
• Churn: ChurnSpotter.io, Framed.io
• Fraud detection: SiftScience.com
• Lead scoring: Infer.com
• Personal assistant (Siri): Wit.ai
“What is the sentiment of this tweet? — Positive/Neutral/Negative”
“Is this email spam? — Yes/No”
Fixed-model Predictive APIs:
• Text: Datumbox.com, Semantria.com
• Vision: Indico, AlchemyAPI.com
• Siri-like: maluuba.com
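As read from the slides, a fixed-model API differs from a generic one in that there is no training call: the provider ships the model and the client only asks for predictions. The sketch below illustrates that shape with a hypothetical word-list scorer standing in for a real pre-trained sentiment model; none of the names come from the services listed above.

```python
# Hypothetical word lists standing in for a provider's pre-trained model.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "awful", "hate", "terrible", "sad"}


def predict_sentiment(text):
    """Fixed-model API: there is no create_model step, only prediction."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


print(predict_sentiment("I love this talk, great examples!"))  # Positive
```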
[Figure: levels of CUSTOMIZATION for predictive APIs: Algorithmic, Generic, Generic for text, Problem-specific, Fixed-model]
Deploy your own model on 3rd party platform
EXPERIMENT (find model) / DEPLOY (use model)
Start with {popular, open source} ML library
Experiment in automated environment
Experiment on “ScienceCluster”
• Distributed jobs
• Collaborative workspace
• Serialize chosen model
Deploy model as API on “ScienceOps”
• Load balancing
• Auto scaling
• Monitoring (API calls, accuracy)
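The hand-off between the two stages (serialize the chosen model, then load it behind an API) can be sketched as follows. This is not the platform's actual mechanism: the nearest-centroid model, the JSON serialization format and the handler name are all assumptions chosen to keep the example self-contained.

```python
import json


def train_centroids(examples):
    """Experiment step: fit a nearest-centroid model (toy stand-in)."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the input."""
    def squared_distance(label):
        return sum((c - x) ** 2 for c, x in zip(model[label], features))
    return min(model, key=squared_distance)


# Experiment side: train, then serialize the chosen model.
examples = [([1.0, 1.0], "A"), ([2.0, 1.0], "A"),
            ([8.0, 9.0], "B"), ([9.0, 8.0], "B")]
serialized = json.dumps(train_centroids(examples))

# Deployment side: load the serialized model once, then serve predictions.
deployed_model = json.loads(serialized)


def handle_prediction_request(request_body):
    """Hypothetical API handler: JSON in, JSON out."""
    features = json.loads(request_body)["input"]
    return json.dumps({"output": predict(deployed_model, features)})


print(handle_prediction_request('{"input": [7.5, 8.0]}'))  # {"output": "B"}
```

Load balancing, scaling and monitoring would then wrap around the handler without touching the model code.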
Your API endpoints
• 1 for serving predictions
• 1 for running an ML experiment (i.e. train and evaluate models on given data)?
• 1 for deploying ML models?
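One way to picture the three endpoints is a small router with one handler each. Everything here is hypothetical: the paths, the JSON fields and the majority-class "model" (which ignores its input) are placeholders chosen to keep the sketch short, and no HTTP server is started.

```python
import json
from collections import Counter

experiments = []   # trained models and their evaluation, by experiment index
deployed = {}      # deployed models, by name


def run_experiment(body):
    """Train on the 'train' labels and evaluate on the 'test' labels."""
    majority = Counter(body["train"]).most_common(1)[0][0]
    hits = sum(label == majority for label in body["test"])
    accuracy = hits / len(body["test"])
    experiments.append({"model": majority, "accuracy": accuracy})
    return {"experiment": len(experiments) - 1, "accuracy": accuracy}


def deploy_model(body):
    """Make the model from a finished experiment available under a name."""
    deployed[body["name"]] = experiments[body["experiment"]]["model"]
    return {"deployed": body["name"]}


def serve_prediction(body):
    """Answer a prediction query with a deployed model."""
    return {"output": deployed[body["model"]]}


ROUTES = {
    "/experiments": run_experiment,
    "/deployments": deploy_model,
    "/predictions": serve_prediction,
}


def handle(path, request_body):
    """Dispatch a JSON request body to the handler for the given path."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, json.dumps({"error": "unknown endpoint"})
    return 200, json.dumps(handler(json.loads(request_body)))


handle("/experiments", json.dumps({"train": ["no", "no", "yes"], "test": ["no", "yes"]}))
handle("/deployments", '{"experiment": 0, "name": "baseline"}')
status, response = handle("/predictions", '{"model": "baseline", "input": {}}')
print(status, response)
```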
Open source PAPI frameworks
• “Open source prediction server”
• Based on Spark
• Expose predictive models as (scalable & robust) APIs
[Figure: MLlib + (image not recoverable) = PAPI]
• Train command: “pio train”
• API endpoints:
  • Send (new) data
  • Send prediction queries
Deployment
• Amazon CloudFormation template → cluster
• Manual up/down scaling
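The two kinds of endpoint (send data, send prediction queries) can be illustrated by building the corresponding HTTP requests. This sketch assumes the “pio” command refers to PredictionIO with its usual local defaults (an event server on port 7070 and a deployed engine on port 8000); the access key, the event fields and the query fields are placeholders that depend on the app and engine template in use. Nothing is sent over the network.

```python
import json
import urllib.request


def json_post(url, payload):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local defaults and a placeholder access key; adjust to your setup.
EVENT_SERVER = "http://localhost:7070"
ENGINE_SERVER = "http://localhost:8000"
ACCESS_KEY = "YOUR_ACCESS_KEY"

# 1. Send (new) data: one event describing a user rating an item.
send_data = json_post(
    EVENT_SERVER + "/events.json?accessKey=" + ACCESS_KEY,
    {"event": "rate", "entityType": "user", "entityId": "u1",
     "targetEntityType": "item", "targetEntityId": "i1",
     "properties": {"rating": 4}},
)

# 2. Send a prediction query; the fields depend on the engine template.
send_query = json_post(ENGINE_SERVER + "/queries.json", {"user": "u1", "num": 3})

# Sending either request needs a running server:
# urllib.request.urlopen(send_query)
```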
What’s missing / almost there
Open Source AutoML?!
• Spearmint: “Bayesian optimization” for tuning parameters → Whetlab → Twitter
• Auto-sklearn: “automated machine learning toolkit and drop-in replacement for a scikit-learn estimator”
• See automl.org and the AutoML challenge
Scikit Python
from sklearn import svm
model = svm.SVC(gamma=0.001, C=100.)

from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])

# predict expects a 2D array, so slice to keep the last sample as a row
model.predict(digits.data[-1:])
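AutoML Scikit
# current auto-sklearn releases expose the classifier under autosklearn.classification
import autosklearn.classification
model = autosklearn.classification.AutoSklearnClassifier()

from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])

model.predict(digits.data[-1:])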
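What "automated" means here can be shown with the simplest possible search: try each candidate setting and keep the one that scores best on held-out data. This is plain exhaustive search over a single parameter, not Bayesian optimization as in Spearmint and not what auto-sklearn does internally; the k-nearest-neighbour model and the data are toy assumptions.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda example: abs(example[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, validation, k):
    """Fraction of validation examples predicted correctly with this k."""
    hits = sum(knn_predict(train, x, k) == label for x, label in validation)
    return hits / len(validation)


def tune_k(train, validation, candidates):
    """Try every candidate k and keep the best one on the validation set."""
    return max(candidates, key=lambda k: accuracy(train, validation, k))


# Toy one-dimensional data: (input value, label).
train = [(0.5, "low"), (1.0, "low"), (1.5, "low"), (2.2, "high"),
         (3.0, "high"), (3.5, "high"), (4.0, "high")]
validation = [(0.8, "low"), (1.9, "low"), (2.5, "high"), (3.8, "high")]
candidates = [1, 3, 5]

best_k = tune_k(train, validation, candidates)
print("chosen k:", best_k)
```

Smarter tuners replace the exhaustive loop with a strategy for picking the next candidate, but the contract stays the same: data in, tuned model out.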
Open Source Auto Scaling?!
• ?
                AMAZON   GOOGLE   PREDICSIS   BIGML
ACCURACY        0.862    0.743    0.858       0.790
TRAINING TIME   135s     76s      17s         5s
TEST TIME       188s     369s     5s          1s
louisdorard.com/blog/machine-learning-apis-comparison
Automated Benchmark?!
• Requirement:
  • train/test splits on local machine
  • compute evaluation on local machine
• Solutions:
  • adapt bigmler and use local evaluations?
  • use scikit-learn framework?
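The two requirements (split locally, evaluate locally) suggest a harness along these lines. It is a sketch under assumptions: each API is wrapped in a hypothetical pair of create_model / create_prediction functions, and a majority-class baseline stands in for a real service so the block runs without credentials.

```python
import random
import time


def split(examples, test_fraction=0.25, seed=0):
    """Shuffle and split locally so every API sees the same train/test data."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def benchmark(name, create_model, create_prediction, train, test):
    """Time training and testing, and compute accuracy on the local machine."""
    start = time.perf_counter()
    model = create_model(train)
    training_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [create_prediction(model, x) for x, _ in test]
    test_time = time.perf_counter() - start

    correct = sum(p == y for p, (_, y) in zip(predictions, test))
    return {"api": name, "accuracy": correct / len(test),
            "training_time": training_time, "test_time": test_time}


# Stand-in adapter: always predicts the most frequent training label.
def majority_create_model(train):
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)


def majority_create_prediction(model, new_input):
    return model


examples = [(i, "yes" if i % 4 else "no") for i in range(40)]
train, test = split(examples)
result = benchmark("majority baseline", majority_create_model,
                   majority_create_prediction, train, test)
print(result)
```

Adding a service to the comparison then means writing one adapter pair for it and calling benchmark again with the same train and test sets.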
API standards?!
• Python de facto standard: scikit-learn
• “Sparkit-learn aims to provide scikit-learn functionality and API on PySpark. The main goal of the library is to create an API that stays close to sklearn’s.”
• REST standard: PSI (Protocols & Structures for Inference)
• Pretty similar to BigML API!
• Implementation for scikit-learn available
• Easy benchmarking! Ensembles!
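Why a shared standard makes ensembles easy: if every model answers the same predict call, combining them is a few lines. The sketch below uses majority voting over toy threshold models; the class and function names are illustrative and do not come from PSI or any listed library.

```python
from collections import Counter


def ensemble_prediction(models, new_input):
    """Majority vote over models that all expose the same predict(input) call."""
    votes = Counter(model.predict(new_input) for model in models)
    return votes.most_common(1)[0][0]


class ThresholdModel:
    """Toy stand-in for a model served through any standard interface."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, value):
        return "yes" if value >= self.threshold else "no"


models = [ThresholdModel(3), ThresholdModel(5), ThresholdModel(9)]
print(ensemble_prediction(models, 6))  # yes (two votes against one)
```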
Getting started
• VM with Jupyter notebooks (Python & Bash)
• API wrappers preinstalled: BigML & Google Prediction API
• Notebook for easy setup of credentials
• scikit-learn and Pandas preinstalled
• Open source VM provisioning script & notebooks
• Search public Snaps on terminal.com: “machine learning”