![Page 1: Python and H2O with Cliff Click at PyData Dallas 2015](https://reader035.vdocuments.us/reader035/viewer/2022080214/55ad55661a28ab726b8b47f5/html5/thumbnails/1.jpg)
H2O.ai Machine Intelligence
Fast, Scalable In-Memory Machine and Deep Learning
For Smarter Applications
Python with H2O
Cliff Click
![Page 2: Python and H2O with Cliff Click at PyData Dallas 2015](https://reader035.vdocuments.us/reader035/viewer/2022080214/55ad55661a28ab726b8b47f5/html5/thumbnails/2.jpg)
Who Am I?
Cliff Click
CTO, Co-Founder
[email protected]
40 yrs coding
35 yrs building compilers
30 yrs distributed computation
20 yrs OS, device drivers, HPC, HotSpot
10 yrs low-latency GC, custom Java hardware, NonBlockingHashMap
20 patents, dozens of papers
100s of public talks
PhD Computer Science, 1995, Rice University
HotSpot JVM Server Compiler
“showed the world JITing is possible”
![Page 3: Python and H2O with Cliff Click at PyData Dallas 2015](https://reader035.vdocuments.us/reader035/viewer/2022080214/55ad55661a28ab726b8b47f5/html5/thumbnails/3.jpg)
H2O Open Source In-Memory Machine Learning for Big Data
Distributed In-Memory Math Platform
GLM, GBM, RF, K-Means, PCA, Deep Learning
Easy to use SDK & API
Java, R/CRAN, Scala, Spark, Python, JSON, Browser GUI
Use ALL your data
Modeling without sampling
HDFS, S3, NFS, NoSQL
Big Data & Better Algorithms
Better Predictions!
![Page 4: Python and H2O with Cliff Click at PyData Dallas 2015](https://reader035.vdocuments.us/reader035/viewer/2022080214/55ad55661a28ab726b8b47f5/html5/thumbnails/4.jpg)
TBD, Customer Support
TBD, Head of Sales
Distributed Systems Engineers Making ML Scale!
![Page 5: Python and H2O with Cliff Click at PyData Dallas 2015](https://reader035.vdocuments.us/reader035/viewer/2022080214/55ad55661a28ab726b8b47f5/html5/thumbnails/5.jpg)
Practical Machine Learning
| Value | Requirements |
| --- | --- |
| Fast & Interactive | In-Memory |
| Big Data (No Sampling) | Distributed |
| Ownership | Open Source |
| Extensibility | API/SDK |
| Portability | Java, REST/JSON |
| Infrastructure | Cloud or On-Premise, Hadoop or Private Cluster |
![Page 6: Python and H2O with Cliff Click at PyData Dallas 2015](https://reader035.vdocuments.us/reader035/viewer/2022080214/55ad55661a28ab726b8b47f5/html5/thumbnails/6.jpg)
H2O Architecture
Prediction Engine
R & Exec Engine, Web Interface, Spark Scala REPL
Nano-Fast Scoring Engine
Distributed In-Memory K/V Store
Column-Compressed Data, Map/Reduce
Memory Manager
Algorithms! GBM, Random Forest, GLM, PCA, K-Means, Deep Learning
HDFS, S3, NFS
![Page 7: Python and H2O with Cliff Click at PyData Dallas 2015](https://reader035.vdocuments.us/reader035/viewer/2022080214/55ad55661a28ab726b8b47f5/html5/thumbnails/7.jpg)
![Page 8: Python and H2O with Cliff Click at PyData Dallas 2015](https://reader035.vdocuments.us/reader035/viewer/2022080214/55ad55661a28ab726b8b47f5/html5/thumbnails/8.jpg)
Demo!
Python Demo
● CitiBike of NYC
● Predict bikes-per-hour-per-station
– From per-trip logs
● 10M rows of data
● Group-By, date/time feature-munging
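The group-by and date/time munging step above can be sketched in plain Python. This is only an illustration of the idea, not the demo's code and not the H2O API: the function names, the record layout (start time, station), the timestamp format, and the sample trips are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

def bikes_per_hour_per_station(trips):
    """Count trips per (station, date, hour) from per-trip records.

    `trips` is an iterable of (start_time, station) pairs; start_time is
    assumed to look like "2013-07-01 08:15:00" (hypothetical format).
    """
    counts = Counter()
    for start_time, station in trips:
        ts = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
        # Group-by key: one bucket per station, per day, per hour.
        counts[(station, ts.date().isoformat(), ts.hour)] += 1
    return counts

def to_feature_rows(counts):
    """Expand each group key into date/time features plus the target count."""
    rows = []
    for (station, day, hour), bikes in sorted(counts.items()):
        weekday = datetime.strptime(day, "%Y-%m-%d").weekday()  # 0 = Monday
        rows.append({
            "station": station,
            "day": day,
            "hour": hour,
            "weekday": weekday,
            "is_weekend": weekday >= 5,
            "bikes": bikes,  # value a model would be trained to predict
        })
    return rows

# Made-up sample trips, for illustration only.
sample_trips = [
    ("2013-07-01 08:15:00", "A"),
    ("2013-07-01 08:40:00", "A"),
    ("2013-07-01 09:05:00", "A"),
    ("2013-07-06 08:20:00", "B"),
]
rows = to_feature_rows(bikes_per_hour_per_station(sample_trips))
```

Each resulting row is one station-hour with its derived calendar features. At the 10M-row scale the slide mentions, the same aggregation would run inside the cluster rather than in a local loop.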
![Page 9: Python and H2O with Cliff Click at PyData Dallas 2015](https://reader035.vdocuments.us/reader035/viewer/2022080214/55ad55661a28ab726b8b47f5/html5/thumbnails/9.jpg)
H2O: A Platform for Big Math
● Most Any Java on Big 2-D Tables
– Write like it's single-thread POJO code
– Runs distributed & parallel by default
● Fast: billion row logistic regression takes 4 sec
● World's first parallel & distributed GBM
– Plus GBM, Deep Learn / Neural Nets, RF, PCA, GLM...
● R integration: use terabyte datasets from R
● Sparkling Water: Direct Spark integration
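The "write single-threaded code, run it in parallel" idea can be sketched as a map/reduce over chunks of one column. This is a rough Python illustration of the pattern only: it is not H2O's Java API, all names here are hypothetical, and a local thread pool stands in for a cluster, so it shows the structure rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def split_into_chunks(column, chunk_size):
    """Cut one column into fixed-size chunks (stand-in for distributed storage)."""
    return [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]

def map_chunk(chunk):
    """Plain single-threaded code that sees only one chunk."""
    return (sum(chunk), len(chunk))

def reduce_partials(a, b):
    """Combine two partial results; associative, so order does not matter."""
    return (a[0] + b[0], a[1] + b[1])

def parallel_mean(column, chunk_size=4, workers=4):
    """Mean of a non-empty column, computed as map-per-chunk then reduce."""
    chunks = split_into_chunks(column, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(reduce_partials, partials, (0, 0))
    return total / count

column = list(range(1, 11))
mean = parallel_mean(column)
```

The user-written parts are `map_chunk` and `reduce_partials`. Neither knows how many chunks exist or where they run, which is what lets a framework schedule them across cores or machines.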
![Page 10: Python and H2O with Cliff Click at PyData Dallas 2015](https://reader035.vdocuments.us/reader035/viewer/2022080214/55ad55661a28ab726b8b47f5/html5/thumbnails/10.jpg)
H2O: A Platform for Big Math
● Easy launch: “java -jar h2o.jar”
– No GC tuning: -Xmx as big as you like
● Production ready:
– Private on-premise cluster OR
– In the Cloud
– Hadoop, YARN, EC2, or standalone cluster
– HDFS, S3, NFS, URI & other datasources
– Open Source, Apache v2