H2O Platform Workshop with the ShareThis data science team
H2O Workshop
Hassan Namarvar, Principal Data Scientist
Oct 8, 2014
WHY USE A NEW MACHINE LEARNING TOOL?
Available large-scale ML tools, such as Apache Mahout, Vowpal Wabbit, Hadoop RMR and native Spark MLlib, each have their own issues.
Critical features for a state-of-the-art ML package:
Ease of use
System reliability
In-memory (fast)
Distributed
Extensible (API/SDK)
Accurate algorithms
Visualization (data and results)
…
INTRODUCTION TO H2O PLATFORM
H2O is the world’s fastest in-memory, open-source machine learning library.
Important Features:
Open source, licensed under the Apache License 2.0
Scalable in-memory processing for big data (written in Java)
Runs on a single node or on a multi-node cluster
High-quality implementations of state-of-the-art ML algorithms
H2O package for R
Spark+H2O = Sparkling Water
WORKSHOP AGENDA
Download the bleeding-edge version of the platform!
Tutorial on the Web API
Upload a real dataset into the platform
Build a CPA model using the GLM algorithm
Validate the CPA model on a test set
Build more advanced models:
  GBMs (Gradient Boosting Machines)
  BigData Random Forest
  Deep Learning (neural networks)
Model selection
LET’S DO SOME HACKING!
Download the bleeding-edge version of the platform from:
http://0xdata.com/download/
Run locally:
cd ~/Downloads
unzip h2o-2.7.0.1533.zip
cd h2o-2.7.0.1533
java -Xmx4g -jar h2o.jar
Point your browser to:
http://localhost:54321
BUILDING A CPA MODEL: RETARGETED VISITS AS A PROXY FOR CONVERSIONS
USER-CENTRIC: Focus on retargeted (RT) users
OPTIMAL TIME: Deliver ads at the optimal times
BETTER PERFORMANCE: Leverage optimization opportunities
DON’T WASTE IMPRESSIONS: Target users who are likely to convert
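The key modeling choice on this slide is to treat a retargeted visit as the positive label standing in for a conversion. Below is a minimal Python sketch of that labeling step plus the hold-out split used later for validation. The `UserRecord` type, its fields, the helper names and the 80/20 split are all hypothetical, since the slides do not show the actual ShareThis schema or how the data was split.

```python
import random
from dataclasses import dataclass


@dataclass
class UserRecord:
    # Hypothetical record layout; the slides do not describe the real schema.
    user_id: str
    features: list[float]
    retargeted_visit: bool  # True if the user made a retargeted (RT) visit


def to_labeled_rows(records):
    """Map each user to (features, label), using an RT visit as the conversion proxy."""
    return [(r.features, 1 if r.retargeted_visit else 0) for r in records]


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows with a fixed seed and hold out a test fraction."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)
```

For example, 10 users split with the default `test_fraction=0.2` yield 8 training rows and 2 test rows.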
GLM MODEL
Screenshot of the CPA model built with the GLM algorithm.
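With a binary label (converted or not), the GLM here is a binomial GLM, i.e. logistic regression. The sketch below fits one with plain batch gradient descent and an optional L2 penalty. It only illustrates the model family; it is not H2O's GLM solver, and the function names and hyperparameters are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic (inverse logit) link.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_glm(X, y, lr=0.1, epochs=2000, l2=0.0):
    """Hypothetical sketch: binomial GLM fitted by batch gradient descent.

    X is a list of feature vectors, y a list of 0/1 labels.
    Returns (weights, intercept).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear predictor is p - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # Average gradient plus an optional L2 penalty (intercept not penalized).
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of conversion for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```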
GBM MODEL
Screenshot of the CPA model built with the GBM algorithm.
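Gradient boosting builds an additive model in log-odds space: each new tree fits the negative gradient of the log-loss (the residual y - p) left by the current ensemble. To stay short, this sketch uses depth-1 trees (stumps) with mean-residual leaf values. H2O's GBM grows deeper trees and makes its own leaf-value and parameter choices, so treat the names and defaults here as hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, r):
    """Least-squares regression stump on residuals r: (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every row has identical features: fall back to a constant leaf
        m = sum(r) / len(r)
        return (0, float("inf"), m, m)
    return best[1:]


def stump_predict(stump, x):
    j, t, left_value, right_value = stump
    return left_value if x[j] <= t else right_value


def fit_gbm(X, y, n_trees=50, learning_rate=0.1):
    """Hypothetical sketch of gradient boosting for binary log-loss with stumps."""
    p0 = sum(y) / len(y)
    if not 0 < p0 < 1:
        raise ValueError("need both positive and negative labels")
    f0 = math.log(p0 / (1 - p0))  # start from the prior log-odds
    F = [f0] * len(X)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of log-loss w.r.t. the log-odds is y - p.
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        F = [fi + learning_rate * stump_predict(stump, xi) for fi, xi in zip(F, X)]
    return f0, stumps, learning_rate


def gbm_predict_proba(model, x):
    f0, stumps, learning_rate = model
    return sigmoid(f0 + learning_rate * sum(stump_predict(s, x) for s in stumps))
```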
BIGDATA RANDOM FOREST MODEL
Screenshot of the CPA model built with the BigData Random Forest (RF) algorithm.
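A random forest averages many trees, each trained on a bootstrap sample of rows and a random subset of features. To keep the sketch short, each "tree" below is a single Gini split (a stump), which keeps both randomization ingredients but none of the depth. H2O's BigData Random Forest grows full trees, and all names and defaults here are hypothetical.

```python
import random


def gini(labels):
    # Gini impurity of a 0/1 label list.
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_gini_stump(X, y, features):
    """Best single split over the given feature indices by weighted Gini impurity.

    Leaves store the fraction of positive labels on each side.
    """
    best = None
    n = len(y)
    for j in features:
        for t in sorted({x[j] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[j] <= t]
            right = [yi for x, yi in zip(X, y) if x[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no valid split in this bootstrap sample: constant leaf
        frac = sum(y) / n
        return (features[0], float("inf"), frac, frac)
    return best[1:]


def fit_random_forest(X, y, n_trees=25, max_features=None, seed=0):
    """Hypothetical sketch: bagged stumps with a random feature subset per tree."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max_features or max(1, int(d ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        feats = rng.sample(range(d), m)  # random subset of feature indices
        forest.append(fit_gini_stump(Xb, yb, feats))
    return forest


def rf_predict_proba(forest, x):
    # Average the positive-class fraction predicted by every stump.
    votes = [lv if x[j] <= t else rv for j, t, lv, rv in forest]
    return sum(votes) / len(votes)
```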
MODEL COMPARISON
Comparing the ROC curves and AUC of the GLM, GBM and RF models on the test data:
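AUC, the area under the ROC curve, equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as half. The sketch computes it directly from that definition and uses it to pick the best of several models' test-set scores, mirroring the model-selection step of the agenda. `best_model` and its input format are hypothetical, and the pairwise loop is quadratic, so it suits small test sets only.

```python
def roc_auc(labels, scores):
    """ROC AUC via the pairwise (Mann-Whitney) definition; ties count as 0.5."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_model(labels, model_scores):
    """Hypothetical helper: return (name, auc) of the model with the highest test AUC."""
    aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
    name = max(aucs, key=aucs.get)
    return name, aucs[name]
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75: three of the four positive/negative pairs are ranked correctly.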
LIVE TEST ON A CAR INSURANCE CAMPAIGN
TESTED FOR TWO MONTHS; PERFORMANCE MEASURED IN DFA.
The CPA test for a car insurance campaign showed a 58% improvement in eCPA and a 57% improvement in conversion rate (CVR).
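The reported figures are relative changes against a baseline. Assuming eCPA is spend divided by conversions, CVR is conversions divided by impressions, and improvement is measured against a control line, the arithmetic looks like the sketch below. The slides do not state these definitions or the control setup, so every function here is a hypothetical illustration, and the test inputs are made-up numbers, not campaign data.

```python
def ecpa(spend, conversions):
    # Effective cost per acquisition: total spend divided by conversions (assumed definition).
    return spend / conversions


def cvr(conversions, impressions):
    # Conversion rate; using impressions as the denominator is an assumption.
    return conversions / impressions


def ecpa_improvement(control_ecpa, test_ecpa):
    # Lower eCPA is better, so improvement is the relative reduction versus control.
    return (control_ecpa - test_ecpa) / control_ecpa


def cvr_improvement(control_cvr, test_cvr):
    # Higher CVR is better, so improvement is the relative increase versus control.
    return (test_cvr - control_cvr) / control_cvr
```

Under these definitions, halving eCPA counts as a 50% improvement, while doubling CVR counts as a 100% improvement, so the two percentages are not directly comparable.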
THANK YOU!