Python and H2O with Cliff Click at PyData Dallas 2015

Upload: sri-ambati

Post on 21-Jul-2015

Page 1: Python and H2O with Cliff Click at PyData Dallas 2015

H2O.ai Machine Intelligence

Fast, Scalable In-Memory Machine and Deep Learning
For Smarter Applications

Python with H2O

Cliff Click

Page 2: Python and H2O with Cliff Click at PyData Dallas 2015

Who Am I?

Cliff Click
CTO, Co-Founder

40 yrs coding
35 yrs building compilers
30 yrs distributed computation
20 yrs OS, device drivers, HPC, HotSpot
10 yrs low-latency GC, custom Java hardware, NonBlockingHashMap
20 patents, dozens of papers
100s of public talks

PhD Computer Science, 1995, Rice University
HotSpot JVM Server Compiler
“showed the world JITing is possible”

Page 3: Python and H2O with Cliff Click at PyData Dallas 2015

H2O: Open Source In-Memory Machine Learning for Big Data

Distributed In-Memory Math Platform
GLM, GBM, RF, K-Means, PCA, Deep Learning

Easy to use SDK & API
Java, R/CRAN, Scala, Spark, Python, JSON, Browser GUI

Use ALL your data
Modeling without sampling
HDFS, S3, NFS, NoSQL

Big Data & Better Algorithms
Better Predictions!

Page 4: Python and H2O with Cliff Click at PyData Dallas 2015

TBD, Customer Support

TBD, Head of Sales

Distributed Systems Engineers Making ML Scale!

Page 5: Python and H2O with Cliff Click at PyData Dallas 2015

Practical Machine Learning

Value                    Requirements
Fast & Interactive       In-Memory
Big Data (No Sampling)   Distributed
Ownership                Open Source
Extensibility            API/SDK
Portability              Java, REST/JSON
Infrastructure           Cloud or On-Premise; Hadoop or Private Cluster

Page 6: Python and H2O with Cliff Click at PyData Dallas 2015

H2O Architecture

Prediction Engine

R & Exec Engine, Web Interface, Spark Scala REPL

Nano-Fast Scoring Engine

Distributed In-Memory K/V Store

Column Compress Data, Map/Reduce, Memory Manager

Algorithms! GBM, Random Forest, GLM, PCA, K-Means, Deep Learning

HDFS, S3, NFS

Page 8: Python and H2O with Cliff Click at PyData Dallas 2015

Demo!

Python Demo

● CitiBike of NYC
● Predict bikes-per-hour-per-station
  – From per-trip logs
● 10M rows of data
● Group-By, date/time feature-munging
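
As a rough illustration of the group-by and date/time munging step in the demo, here is a minimal sketch in plain Python. It does not use the H2O API; the sample rows, the column layout (station name, start timestamp), and the function name `bikes_per_hour_per_station` are all assumptions made up for this example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical per-trip log rows: (start station name, trip start timestamp).
# These values are invented for illustration, not taken from the CitiBike data.
trips = [
    ("Station A", "2013-07-01 08:05:00"),
    ("Station A", "2013-07-01 08:40:00"),
    ("Station A", "2013-07-01 09:10:00"),
    ("Station B", "2013-07-01 08:15:00"),
]

def bikes_per_hour_per_station(rows):
    """Count trips per (station, date, hour) group."""
    counts = Counter()
    for station, started_at in rows:
        ts = datetime.strptime(started_at, "%Y-%m-%d %H:%M:%S")
        # Date/time feature munging: split the timestamp into day and hour.
        counts[(station, ts.date().isoformat(), ts.hour)] += 1
    return counts

counts = bikes_per_hour_per_station(trips)
for key in sorted(counts):
    print(key, counts[key])
```

The resulting per-group counts are the kind of target a model could then be trained to predict; on the real 10M-row data the same grouping would run inside H2O rather than in a Python loop.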

Page 9: Python and H2O with Cliff Click at PyData Dallas 2015

H2O: A Platform for Big Math

● Most Any Java on Big 2-D Tables
  – Write like it's single-thread POJO code
  – Runs distributed & parallel by default
● Fast: billion row logistic regression takes 4 sec
● World's first parallel & distributed GBM
  – Plus GBM, Deep Learn / Neural Nets, RF, PCA, GLM...
● R integration: use terabyte datasets from R
● Sparkling Water: Direct Spark integration
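
The "write single-thread code, run it in parallel" idea from the first bullet can be sketched as a small map/reduce in Python. This is not H2O's Java API; the helper names (`chunked`, `map_chunk`, `reduce_pair`, `parallel_mean`) are hypothetical, and a local thread pool stands in for a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def chunked(values, chunk_size):
    """Split a column into fixed-size chunks (stand-ins for distributed chunks)."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]

def map_chunk(chunk):
    """Per-chunk work, written as plain single-threaded code."""
    return (sum(chunk), len(chunk))

def reduce_pair(a, b):
    """Combine two partial results; the operation is associative."""
    return (a[0] + b[0], a[1] + b[1])

def parallel_mean(values, chunk_size=1000, workers=4):
    """Map over chunks in parallel, then reduce the partial (sum, count) pairs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunked(values, chunk_size)))
    total, count = reduce(reduce_pair, partials, (0, 0))
    return total / count

column = list(range(1, 10001))
print(parallel_mean(column))
```

The map and reduce functions never mention threads or chunks of other workers, which is the property the slide points at: the per-chunk logic stays simple while the framework decides where it runs.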

Page 10: Python and H2O with Cliff Click at PyData Dallas 2015

H2O: A Platform for Big Math

● Easy launch: “java -jar h2o.jar”
  – No GC tuning: -Xmx as big as you like
● Production ready:
  – Private on-premise cluster OR
  – In the Cloud
  – Hadoop, Yarn, EC2, or standalone cluster
  – HDFS, S3, NFS, URI & other datasources
  – Open Source, Apache v2
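
For scripting the launch line from Python, a minimal sketch might assemble the command as below. The function name `h2o_launch_command` and the `4g` heap size are assumptions for illustration; only `java -jar h2o.jar` and the `-Xmx` JVM option come from the slide.

```python
import shlex

def h2o_launch_command(jar_path="h2o.jar", max_heap=None):
    """Build the argument list for launching H2O from its jar.

    max_heap is an optional JVM heap size string such as "4g" (an
    illustrative value); when given, it is passed as -Xmx<size>.
    """
    cmd = ["java"]
    if max_heap:
        cmd.append("-Xmx" + max_heap)
    cmd += ["-jar", jar_path]
    return cmd

cmd = h2o_launch_command(max_heap="4g")
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.Popen` to start a node in the background; that step is left out here so the sketch runs without Java or the jar present.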