Intro to Machine Learning with H2O and Python - Denver

H2O.ai Machine Intelligence: Robust, Tested and Supported Platform for Predictive Analytics

Upload: srisatish-ambati

Post on 06-Jan-2017


TRANSCRIPT

Page 1: Intro to Machine Learning with H2O and Python - Denver

H2O.ai Machine Intelligence

Robust, Tested and Supported Platform for Predictive Analytics

Page 2: Intro to Machine Learning with H2O and Python - Denver

•  Founded: 2011; Version 3 released in 2015

•  Product: H2O open source in-memory prediction engine

•  Team: 50+ core developers and data scientists

•  HQ: Mountain View, CA. Sales: U.S., U.K. & Canada

H2O.ai Overview


Page 3: Intro to Machine Learning with H2O and Python - Denver

Page 4: Intro to Machine Learning with H2O and Python - Denver

What is H2O?

Platform: Open source in-memory prediction engine

•  Parallelized and distributed algorithms making the most use out of multithreaded systems and grids

•  GLM, Random Forest, GBM, Deep Learning (ANN), GLRM, PCA, K-means etc.

Access: Data, Platform and Client Agnostic

•  Runs anywhere

•  REST API: drives H2O from R, Python, Web UI, Excel, Tableau

•  Score code for all models

Scale: Same code. Different Environments

•  Use all of your data without subsetting

•  No code changes to go from development to production

Single source of truth for R and Python users

Page 5: Intro to Machine Learning with H2O and Python - Denver

Algorithms on H2O

Supervised Learning

Statistical Analysis

•  Generalized Linear Models with L1 and L2 Penalties: Binomial, Gaussian, Gamma, Poisson and Tweedie

•  Naïve Bayes

Ensembles

•  Distributed Random Forest: Classification or regression models

•  Gradient Boosting Machine: Produces an ensemble of decision trees with increasingly refined approximations

Deep Neural Networks

•  Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations
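The L1 and L2 penalties named in the GLM bullet can be combined in an elastic net penalty, lam * (alpha * sum|b| + (1 - alpha) / 2 * sum b^2). The sketch below is plain Python for illustration only, not H2O code; the function name `elastic_net_penalty` and its argument names are assumptions made for this example.

```python
def elastic_net_penalty(coefs, alpha, lam):
    """Elastic net penalty for a list of coefficients.

    alpha = 1.0 gives a pure L1 (lasso) penalty,
    alpha = 0.0 gives a pure L2 (ridge) penalty.
    """
    l1 = sum(abs(b) for b in coefs)          # L1 norm of the coefficients
    l2_squared = sum(b * b for b in coefs)   # squared L2 norm
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


if __name__ == "__main__":
    betas = [1.0, -2.0]
    print(elastic_net_penalty(betas, alpha=1.0, lam=0.5))  # pure L1
    print(elastic_net_penalty(betas, alpha=0.0, lam=0.5))  # pure L2
```

Values of alpha between 0 and 1 blend the two penalties, trading sparsity (L1) against shrinkage of correlated coefficients (L2).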

Page 6: Intro to Machine Learning with H2O and Python - Denver

Algorithms on H2O

Unsupervised Learning

Clustering

•  K-means: Partitions observations into k clusters/groups of the same spatial size

Dimensionality Reduction

•  Principal Component Analysis: Linearly transforms correlated variables to independent components

•  Generalized Low Rank Models: Extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data

Anomaly Detection

•  Autoencoders: Find outliers using nonlinear dimensionality reduction based on deep learning
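As an illustration of the K-means bullet, here is a minimal single-machine sketch of Lloyd's algorithm in plain Python. It is not H2O's distributed implementation; the function name `kmeans` and its parameters are assumptions chosen for this example.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Return k cluster centers for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for center, cluster in zip(centers, clusters):
            if cluster:
                new_centers.append(tuple(sum(col) / len(cluster) for col in zip(*cluster)))
            else:
                new_centers.append(center)  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(sorted(kmeans(data, k=2)))
```

The result depends on the starting centers in general; on well-separated data such as the usage example, any start leads to the same two centers.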

Page 7: Intro to Machine Learning with H2O and Python - Denver

Data and Client Agnostic

HDFS

S3

SQL

NoSQL

Classification

Regression

Feature Engineering

In-Memory

Map Reduce / Fork Join

Columnar Compression

Deep Learning

PCA, GLM, Cox

Random Forest / GBM Ensembles

H2O Compute Engine

Streaming

Java Score Code

Matrix Factorization

Clustering

Munging
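The map/reduce idea named on this slide can be sketched as follows: split the data into chunks, compute a partial result per chunk, then combine the partials. This is a plain Python illustration of the pattern only, not how the H2O compute engine (which runs on the JVM) is implemented; `chunked` and `parallel_mean` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def parallel_mean(values, chunk_size=1000, workers=4):
    """Mean of a non-empty list, computed as map (per chunk) then reduce."""
    chunks = chunked(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map: each task returns a (sum, count) partial for one chunk.
        partials = list(pool.map(lambda c: (sum(c), len(c)), chunks))
    # Reduce: combine the partials into one total and one count.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    print(parallel_mean(list(range(1, 101)), chunk_size=10))
```

Threads in CPython do not speed up pure-Python arithmetic; the point here is the shape of the computation, where only small partial results cross chunk boundaries.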

Page 8: Intro to Machine Learning with H2O and Python - Denver

H2O and R

Page 9: Intro to Machine Learning with H2O and Python - Denver

Reading Data from Disk into H2O with R

STEP 1

R user

h2o_df = h2o.importFile("Local/path/to/data.csv")
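For Python users, a rough counterpart of the R call above is `h2o.import_file`. The sketch below wraps it in a small helper; the helper name `import_csv` and its optional `client` argument are assumptions added for this example (the argument lets the function be exercised without a running H2O instance). Running it for real requires the `h2o` package and a Java runtime.

```python
def import_csv(path, client=None):
    """Load a CSV file into H2O and return the resulting frame handle.

    client is a hypothetical hook: any object with an import_file(path)
    method. When omitted, the real h2o Python module is used.
    """
    if client is None:
        import h2o as client  # needs the h2o package installed
        client.init()         # start or connect to a local H2O instance
    return client.import_file(path)


# Typical use against a real H2O instance (path is the slide's placeholder):
# h2o_df = import_csv("Local/path/to/data.csv")
```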

Page 10: Intro to Machine Learning with H2O and Python - Denver

Reading Data from Disk into H2O with R

STEP 2

2.1 R function call: h2o.importFile()

2.2 HTTP REST API request to H2O has local file path

2.3 Initiate parallel ingest

2.4 Request data from disk

Diagram labels: Local H2O Instance, H2O, Disk, data.csv
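The HTTP REST API request in Step 2 carries the local file path to the H2O instance. The sketch below only builds such a request URL and does not send it. The endpoint `/3/ImportFiles`, the `path` query parameter, and port 54321 are assumptions based on H2O-3 defaults and should be checked against the REST API reference for the version in use; `import_files_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def import_files_url(path, host="localhost", port=54321):
    """Build the URL of a file-import request for a local H2O instance.

    Assumed, not confirmed by the slides: endpoint name, parameter name,
    and default port.
    """
    query = urlencode({"path": path})  # percent-encodes the file path
    return f"http://{host}:{port}/3/ImportFiles?{query}"


if __name__ == "__main__":
    print(import_files_url("Local/path/to/data.csv"))
```

The client libraries issue this kind of request on the user's behalf, which is why the R call takes only a path.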

Page 11: Intro to Machine Learning with H2O and Python - Denver

Reading Data from Disk into H2O with R

STEP 3

3.1 Disk provides data

3.2 Parallelized H2OFrame in DKV

3.3 Return pointer to data in REST API JSON response

3.4 h2o_df object created in R

Diagram labels: Local Host, Pointer to Data, h2o_df, Local H2O Instance, H2O, H2O Frame, Disk, data.csv
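Step 3 ends with the client holding a pointer to the data rather than the data itself. The toy model below illustrates that idea in plain Python: the rows stay in a server-side store and the client object keeps only a key and some metadata. `ToyStore` and `ToyFrameHandle` are hypothetical stand-ins and do not reflect the actual structure of H2O's DKV or its JSON responses.

```python
class ToyStore:
    """Stand-in for a server-side key-value store (not H2O's DKV)."""

    def __init__(self):
        self._frames = {}

    def put(self, key, rows):
        """Store rows under key and return a small JSON-like reply."""
        self._frames[key] = rows
        return {"key": key, "rows": len(rows)}

    def get(self, key):
        """Look up the stored rows for a key."""
        return self._frames[key]


class ToyFrameHandle:
    """Client-side object: keeps the key and row count, never the rows."""

    def __init__(self, reply):
        self.key = reply["key"]
        self.nrows = reply["rows"]


if __name__ == "__main__":
    store = ToyStore()
    handle = ToyFrameHandle(store.put("data.csv", [[1, 2], [3, 4]]))
    print(handle.key, handle.nrows)
```

Because the handle is small, later operations can be expressed as requests that name the key, and the data never has to travel back to the client unless explicitly requested.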