df1 - r - roark - h2o overview
TRANSCRIPT
H2O.aiMachine Intelligence
ML is the new SQLPrediction is the new Search
H2O.aiMachine Intelligence
Company Overview
Company
Product
• Founded: 2011 venture-backed• Team: 40• Distributed Systems Engineers doing Machine
Learning• HQ: Mountain View, CA• Fast, scalable machine and deep learning• Predictive analytics• Open Source Applications
H2O.aiMachine Intelligence
Executive Team
Sri Satish AmbatiCEO & Co-founder
Cliff ClickCTO & Co-founder
Tom KraljevicVP of Engineering
Board of DirectorsJishnu Bhattacharjee // Nexus VenturesAsh Bhardwaj // Flextronics
Scientific Advisory CouncilTrevor HastieStephen BoydRob Tibshirani
DataStax Sun, Java Hotspot Abrizio, Intel Lexalytics
VP of MarketingOleg Rognyskyy
H2O.aiMachine Intelligence
H2O Product Overview
H2O is: Open Source , Distributed, In Memory, Predictive Analytics Platform!
Speed Matters!
No Sampling
Interactive UI
Cutting-Edge Algos
• Time is valuable• In-memory is faster• Intelligence as a service• High speed AND accuracy
• Scale to big data• Access data links• Use all data without sampling
• Online modeling with H2O Flow• Model comparison• R Python Web REST API
• Suite of cutting-edge algorithms• Deep Learning• NanoFast Scoring Engine• Move from model>production extremely fast
H2O.aiMachine Intelligence
Ensembles
Deep Neural Networks
Algorithms on H2O
• Generalized Linear Models : Binomial, Gaussian, Gamma, Poisson and Tweedie
• Cox Proportional Hazards Models• Naïve Bayes • Distributed Random Forest : Classification
or regression models• Gradient Boosting Machine : Produces an
ensemble of decision trees with increasing refined approximations
• Deep learning : Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations
Supervised Learning
Statistical Analysis
H2O.aiMachine Intelligence
Dimensionality Reduction
Anomaly Detection
Algorithms on H2O
• K-means : Partitions observations into k clusters/groups of the same spatial size
• Principal Component Analysis : Linearly transforms correlated variables to independent components
• Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning
Unsupervised Learning
Clustering
H2O.aiMachine Intelligence
Accuracy with Speed and Scale
H2O.aiMachine Intelligence
Accuracy with Speed and Scale
HDFS
S3
SQL
NoSQL
ClassificationRegression
Feature Engineering
In-Memory
Map Reduce/Fork Join
Columnar Compression
Deep Learning
PCA, GLM, Cox
Random Forest / GBM Ensembles
Fast Modeling Engine
Streaming
Nano Fast Java Scoring Engines
Matrix Factorization Clustering
Munging
H2O.aiMachine Intelligence
HDFS=DATA
MLlib H2O SQL
H2ORDD
H2O – The Killer-App for Spark
In-Memory Big Data, Columnar
ML 100x faster Algos
R CRAN, API, fast engine
API Spark API, Java MM
Community Devs, Data Science
H2O.aiMachine Intelligence
H2O Flow Interface
H2O.aiMachine Intelligence
125 Meetups15,000 Attendees13,200+ Installations 2,300+ Corporations1st annual H2O World Conference
Adoption and Growth
weeks
H2O.aiMachine Intelligence
H2O’s Install BaseML is the new SQL
103 634 2329
463 2,887 13,237
Companies
Users
Mar 2014 July 2014 Mar 2015
Open Source
135+ MeetupsWord-of-Mouth
Active Users
H2O.aiMachine Intelligence
Hadoop + HDFS
YARN node manager
worker
+Mllib worker
YARN container
Spark executor
Scala main program
Sparkling Water cluster of size 3 on YARN
YARN node manager
worker
+Mllib worker
YARN container
Spark executor
YARN node manager
worker
+Mllib worker
YARN container
Spark executor
client
+Mllib client
Driver
H2O.aiMachine Intelligence
JavaScript R Python Excel/Tableau
Network
Rapids Expression Evaluation Engine Scala Customer
Algorithm
Customer AlgorithmParse
GLMGBMRF
Deep LearningK-Means
PCA
In-H2O Prediction Engine
H2O Software Stack
Fluid Vector Frame
Distributed K/V Store
Non-blocking Hash Map
Job
MRTask
Fork/Join
Flow
Customer Algorithm
Spark Hadoop Standalone H2O
H2O.aiMachine Intelligence
Reading Data from HDFS into H2O with R
STEP 1
R user
h2o_df = h2o.importFile(“hdfs://path/to/data.csv”)
H2O.aiMachine Intelligence
Reading Data from HDFS into H2O with R
H2OH2O
H2O
data.csv
HTTP REST API request to H2Ohas HDFS path
H2O ClusterInitiate distributed
ingest
HDFSRequest
data from HDFS
STEP 22.2
2.3
2.4
R
h2o.importFile()
2.1R function
call
H2O.aiMachine Intelligence
Reading Data from HDFS into H2O with R
H2OH2O
H2O
R
HDFS
STEP 3
Cluster IPCluster Port
Pointer to Data
Return pointer to data in REST
API JSON Response
HDFS provides data
3.3
3.43.1h2o_df object
created in R
data.csv
h2o_df H2OFrame
3.2Distributed
H2OFrame in DKV
H2O Cluster
H2O.aiMachine Intelligence
R Script Starting H2O GLM
HTTP
REST/JSON
.h2o.startModelJob()POST /3/ModelBuilders/glm
h2o.glm()
R script
Standard R process
TCP/IP
HTTP
REST/JSON
/3/ModelBuilders/glm endpoint
Job
GLM algorithm
GLM tasks
Fork/Join framework
K/V store framework
H2O process
Network layer
REST layer
H2O - algos
H2O - core
User process
H2O process
Legend
H2O.aiMachine Intelligence
R Script Retrieving H2O GLM Result
HTTP
REST/JSON
h2o.getModel()GET /3/Models/glm_model_id
h2o.glm()
R script
Standard R process
TCP/IP
HTTP
REST/JSON
/3/Models endpoint
Fork/Join framework
K/V store framework
H2O process
Network layer
REST layer
H2O - algos
H2O - core
User process
H2O process
Legend