Python and H2O with Cliff Click at PyData Dallas 2015

Post on 21-Jul-2015

H2O.ai Machine Intelligence

Fast, Scalable In-Memory Machine and Deep Learning for Smarter Applications

Python with H2O

Cliff Click

Who Am I?

Cliff Click
CTO, Co-Founder, H2O.ai
cliff@h2o.ai

40 yrs coding
35 yrs building compilers
30 yrs distributed computation
20 yrs OS, device drivers, HPC, HotSpot
10 yrs low-latency GC, custom Java hardware, NonBlockingHashMap
20 patents, dozens of papers
100s of public talks

PhD Computer Science, 1995, Rice University
HotSpot JVM Server Compiler
“showed the world JITing is possible”

H2O: Open Source In-Memory Machine Learning for Big Data

Distributed In-Memory Math Platform
GLM, GBM, RF, K-Means, PCA, Deep Learning

Easy to use SDK & API
Java, R/CRAN, Scala, Spark, Python, JSON, Browser GUI

Use ALL your data
Modeling without sampling
HDFS, S3, NFS, NoSQL

Big Data & Better Algorithms
Better Predictions!

TBD, Customer Support

TBD, Head of Sales

Distributed Systems Engineers Making ML Scale!

Practical Machine Learning

Value                     Requirements
Fast & Interactive        In-Memory
Big Data (No Sampling)    Distributed
Ownership                 Open Source
Extensibility             API/SDK
Portability               Java, REST/JSON
Infrastructure            Cloud or On-Premise; Hadoop or Private Cluster

H2O Architecture

Prediction Engine

R & Exec Engine Web Interface

Spark Scala REPL

Nano-Fast Scoring Engine

Distributed In-Memory K/V Store

Column Compress Data
Map/Reduce

Memory Manager

Algorithms! GBM, Random Forest, GLM, PCA, K-Means, Deep Learning

HDFS, S3, NFS

Demo!

Python Demo

● CitiBike of NYC
● Predict bikes-per-hour-per-station
  – From per-trip logs
● 10M rows of data
● Group-By, date/time feature-munging
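
The group-by and date/time munging steps named on this slide can be sketched in plain Python. This is only an illustration of the shape of the computation, not the H2O calls used in the talk: the sample rows, the station names, and the helper names `bikes_per_hour_per_station` and `time_features` are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical per-trip records: (start time, start station name).
trips = [
    ("2015-07-01 08:05:00", "W 52 St & 11 Ave"),
    ("2015-07-01 08:40:00", "W 52 St & 11 Ave"),
    ("2015-07-01 09:10:00", "W 52 St & 11 Ave"),
    ("2015-07-01 08:15:00", "Broadway & E 14 St"),
]

def bikes_per_hour_per_station(records):
    """Group per-trip rows by (station, date, hour) and count the trips."""
    counts = Counter()
    for start_time, station in records:
        ts = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
        counts[(station, ts.date().isoformat(), ts.hour)] += 1
    return counts

def time_features(date_str, hour):
    """Derive simple date/time features for one grouped row."""
    day = datetime.strptime(date_str, "%Y-%m-%d")
    # weekday() returns 0 for Monday through 6 for Sunday.
    return {"hour": hour, "day_of_week": day.weekday(), "month": day.month}

counts = bikes_per_hour_per_station(trips)

# One training row per (station, date, hour), with the count as the target.
rows = [
    {"station": station, "bikes": n, **time_features(date_str, hour)}
    for (station, date_str, hour), n in sorted(counts.items())
]
```

Each resulting row pairs a station and an hour with its trip count and the derived features, which is the kind of table a model predicting bikes-per-hour-per-station would train on.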

H2O: A Platform for Big Math

● Most Any Java on Big 2-D Tables
  – Write like it's single-thread POJO code
  – Runs distributed & parallel by default
● Fast: billion row logistic regression takes 4 sec
● World's first parallel & distributed GBM
  – Plus GBM, Deep Learn / Neural Nets, RF, PCA, GLM...
● R integration: use terabyte datasets from R
● Sparkling Water: Direct Spark integration
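
The idea of writing single-threaded code that then runs in parallel can be sketched as a map step over chunks of a column followed by an associative reduce. The sketch below is plain Python, not H2O's Java API; the names `chunked`, `map_chunk`, `reduce_pair`, and `parallel_mean` are hypothetical, and the thread pool only stands in for a cluster to show the shape of the pattern.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def chunked(values, size):
    """Split one column into fixed-size chunks, standing in for distributed pieces."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def map_chunk(chunk):
    """Per-chunk work, written as ordinary single-threaded code."""
    return (sum(chunk), len(chunk))

def reduce_pair(a, b):
    """Combine two partial results; associative, so the combine order does not matter."""
    return (a[0] + b[0], a[1] + b[1])

def parallel_mean(values, chunk_size=4, workers=4):
    """Mean of a non-empty column via map over chunks, then reduce of the partials."""
    chunks = chunked(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(reduce_pair, partials)
    return total / count

column = list(range(1, 11))
mean = parallel_mean(column)
```

The user-facing pieces are only `map_chunk` and `reduce_pair`; how the chunks are scheduled is left to the runtime, which is the division of labor the slide describes.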

H2O: A Platform for Big Math

● Easy launch: “java -jar h2o.jar”
  – No GC tuning: -Xmx as big as you like
● Production ready:
  – Private on-premise cluster OR
  – In the Cloud
  – Hadoop, Yarn, EC2, or standalone cluster
  – HDFS, S3, NFS, URI & other datasources
  – Open Source, Apache v2
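
As a small sketch of how the launch line and the -Xmx heap setting fit together, the helper below builds the argument list from Python without running it. The function name `h2o_launch_command` and the `4g` heap size are assumptions for illustration; only `java -jar h2o.jar` and `-Xmx` come from the slide.

```python
def h2o_launch_command(jar_path="h2o.jar", max_heap=None):
    """Build the argv list for launching H2O; max_heap is a JVM size such as '4g'."""
    command = ["java"]
    if max_heap is not None:
        # JVM options must come before -jar; anything after the jar path
        # is passed to the application instead of the JVM.
        command.append(f"-Xmx{max_heap}")
    command += ["-jar", jar_path]
    return command

default_cmd = h2o_launch_command()
big_heap_cmd = h2o_launch_command(max_heap="4g")
```

The returned list can be handed to `subprocess.Popen` on a machine where Java and the jar are available.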
