Intro to R & H2O with Spencer Aiello

Upload: sri-ambati

Post on 08-Aug-2015

H2O.ai Machine Intelligence

Intro to R & H2O

By: Spencer Aiello

Agenda

1. Short, short history of R
2. What is H2O
3. Getting H2O and reading documentation
4. Data exploration
5. Model building

Getting H2O & Docs

1. http://h2o.ai/download/
   a. Bleeding Edge (link)
   b. Install in R (tab)

2. build h2o (https://github.com/h2oai/h2o-3#4-building-h2o-3)

3. http://docs.h2o.ai/ -> H2O 3.0 -> R Users (link) -> R docs (link)

A Brief History of R:

- R first appeared 22 years ago (1993)*

- Implementation of S (which was created by John Chambers at Bell Labs)

* Python first appeared 24 years ago (1991)

H2O is what exactly?

Services:

- Interfaces to mainstream data science languages (R, Python, Scala)

- I/O for common data formats and sources (CSV, zipped files, HDFS, ORC, Parquet!?)

- Interface with modern big data infrastructures: Hadoop, Spark, H2O

- Feature-generation capabilities

- High Performance State-of-the-Art Machine Learning Algorithms

Object Taxonomy in H2O

- H2OFrame: A 2D collection of uniformly typed columns

- H2OModel: An H2O model object

- ID/Key: An identifier for an H2O object

Feature Generation Capabilities

- More than 100 operations to perform on an H2OFrame

- Aggregations:
  - mean, min, max, sum, or any user-defined reduction
  - distributed parallel group-by
  - table, cut

- Simple String manipulation: trim, sub, gsub

- Date Formatting/Extraction: get/set timezones, month, year, dayOfWeek

- Transformations: sqrt, log, *, +, …

- Filtering: R-like slicing
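
To make the "distributed parallel group-by" item concrete, here is a minimal Python sketch of the general idea: each chunk of rows produces partial (sum, count) pairs per group, and the partials are then merged. This only illustrates the pattern and is not H2O's implementation. The function names and the sample rows are made up for the example.

```python
from collections import defaultdict


def partial_sums(chunk):
    """Per-chunk step: accumulate [sum, count] for each group key."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in chunk:
        acc[key][0] += value
        acc[key][1] += 1
    return acc


def merge(left, right):
    """Reduce step: combine two partial results key by key."""
    out = defaultdict(lambda: [0.0, 0])
    for part in (left, right):
        for key, (total, count) in part.items():
            out[key][0] += total
            out[key][1] += count
    return out


def group_mean(chunks):
    """Group-by mean over chunks that could live on different nodes."""
    merged = defaultdict(lambda: [0.0, 0])
    for chunk in chunks:
        merged = merge(merged, partial_sums(chunk))
    return {key: total / count for key, (total, count) in merged.items()}


# Hypothetical data: two chunks of (group, value) rows.
chunks = [
    [("a", 1.0), ("b", 4.0)],
    [("a", 3.0), ("b", 6.0), ("c", 5.0)],
]
means = group_mean(chunks)
```

Because the merge step is associative, the partial results can be combined in any order, which is what lets this kind of aggregation run in parallel.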

H2O Modeling

Infrastructure for:

- K-fold Cross-Validation

- Grid Search

- Model Import/Export
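
As a rough illustration of the first two items, the Python sketch below splits row indices into k folds and enumerates a small hyperparameter grid. It shows the bookkeeping only and does not reflect how H2O implements either feature. The modulo fold assignment and the parameter names `alpha` and `lambda` are assumptions chosen for the example.

```python
from itertools import product


def kfold_indices(n_rows, k):
    """Yield (train_idx, holdout_idx) pairs using modulo fold assignment."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    for fold in range(k):
        holdout = [i for i in range(n_rows) if i % k == fold]
        train = [i for i in range(n_rows) if i % k != fold]
        yield train, holdout


# Each row lands in exactly one holdout set across the k folds.
folds = list(kfold_indices(n_rows=6, k=3))

# Hypothetical grid: every combination becomes one candidate model.
grid = {"alpha": [0.0, 0.5], "lambda": [1e-3, 1e-2]}
combos = [dict(zip(grid, values)) for values in product(*grid.values())]
```

With 3 folds and 4 grid points, a full search in this scheme would train 12 fold models before any final refits.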

Export Models For Real-Time Scoring:

25,000 commits / 3yrs

H2O World Conference 2014

Driving H2O From R

[Diagram: Step 2 of importing data from R]

2.1 R function call: h2o.importFile()
2.2 HTTP REST API request to H2O
2.3 H2O cluster initiates distributed ingest
2.4 H2O cluster requests the data (data.csv) from some data location

Driving H2O From R

[Diagram: Step 3 of importing data from R]

3.1 Data (data.csv) provided by the data location
3.2 Distributed H2OFrame created in the DKV of the H2O cluster
3.3 Pointer to the data returned in the REST API JSON response
3.4 h2o_df object created in R, holding the cluster IP, cluster port, and a pointer to the data
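
The key point of the import flow is that the client-side object is only a handle: it stores where the cluster is and which key identifies the frame, not the data itself. Here is a minimal Python sketch of such a handle. The class name `FrameHandle`, the frame key `data.hex`, the address `127.0.0.1:54321`, and the `/3/Frames/` path are assumptions made for this example and are not taken from the slides.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class FrameHandle:
    """Client-side pointer to a frame that lives in the cluster."""
    ip: str        # cluster IP
    port: int      # cluster port
    frame_id: str  # key identifying the frame in the cluster's K/V store

    def url(self):
        # Assumed REST path for looking a frame up by its key.
        return f"http://{self.ip}:{self.port}/3/Frames/{quote(self.frame_id, safe='')}"


# Hypothetical handle: no rows are stored on the client side.
h2o_df = FrameHandle(ip="127.0.0.1", port=54321, frame_id="data.hex")
frame_url = h2o_df.url()
```

Operations on the handle would translate into further REST requests against the cluster, so the data never has to fit in the client's memory.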

R Script Starting H2O GLM

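[Diagram: layers involved when an R script starts a GLM]

User process (standard R process):
- R script calls h2o.glm()
- h2o.glm() calls .h2o.startModelJob(), which issues POST /3/ModelBuilders/glm
- Transport: HTTP REST/JSON over TCP/IP

H2O process:
- Network layer: TCP/IP, HTTP, REST/JSON
- REST layer: /3/ModelBuilders/glm endpoint
- H2O - algos: Job, GLM algorithm, GLM tasks
- H2O - core: Fork/Join framework, K/V store framework

Legend: User process, H2O process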
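
The request that starts the model build can be sketched outside R as well, since the REST layer is language-neutral. The Python snippet below only constructs the POST to /3/ModelBuilders/glm and does not send it. The base URL, the form-encoded body, and the parameter names (`training_frame`, `response_column`, `family`) are assumptions for illustration. The slides show only the endpoint.

```python
from urllib.parse import urlencode
from urllib.request import Request


def glm_build_request(base_url, params):
    """Build (but do not send) a POST to the GLM model-builder endpoint."""
    body = urlencode(params).encode("utf-8")
    return Request(
        url=f"{base_url}/3/ModelBuilders/glm",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical cluster address and model parameters.
req = glm_build_request(
    "http://127.0.0.1:54321",
    {"training_frame": "data.hex", "response_column": "y", "family": "binomial"},
)
```

Passing `req` to `urllib.request.urlopen` would submit it to a running cluster. In the flow above the server side then creates a Job, so the client would expect to get back a reference to that job and not the finished model.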

R Script Retrieving H2O GLM Result

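[Diagram: layers involved when an R script retrieves a GLM result]

User process (standard R process):
- R script calls h2o.glm()
- h2o.getModel() issues GET /3/Models/glm_model_id
- Transport: HTTP REST/JSON over TCP/IP

H2O process:
- Network layer: TCP/IP, HTTP, REST/JSON
- REST layer: /3/Models endpoint
- H2O - algos
- H2O - core: Fork/Join framework, K/V store framework

Legend: User process, H2O process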
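
Retrieval is a plain GET against the model's key. As a minimal Python sketch, the snippet below builds that request without sending it. The base URL is an assumption, and `glm_model_id` is the placeholder key from the slide, not a real model.

```python
from urllib.parse import quote
from urllib.request import Request


def model_request(base_url, model_id):
    """Build (but do not send) a GET for a model stored under model_id."""
    return Request(
        url=f"{base_url}/3/Models/{quote(model_id, safe='')}",
        method="GET",
    )


# Hypothetical cluster address; the model key is the slide's placeholder.
req = model_request("http://127.0.0.1:54321", "glm_model_id")
```

Against a running cluster, `urllib.request.urlopen(req)` would return the JSON description of the model, which a client such as h2o.getModel() can then turn into a model object.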