Hands On : PySparkling Water - By Nidhi Mehta

Post on 20-May-2020


What is PySparkling Water

PySparkling Water = Python + Spark + H2O

PySparkling Water = Python + Sparkling Water

PySparkling Architecture

[Figure: architecture diagram. Labels: Driver (Python, Py4J, Spark Context, H2O Context, H2O Python with h2o.init ( ip, port )), Cluster Manager, two Executors each containing H2O, H2O REST API, Master, Workers.]
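
The architecture slide labels H2O Python with h2o.init ( ip, port ) and an H2O REST API. A minimal sketch of that connection step follows, assuming an H2O cluster is already running and reachable. The helper names h2o_endpoint and connect are hypothetical, and 54321 in the usage note is H2O's default port, not a value taken from the slides.

```python
def h2o_endpoint(ip, port):
    """Return the base URL of the H2O REST API for a given node."""
    return "http://{}:{}".format(ip, port)


def connect(ip, port):
    """Attach the H2O Python client to a running H2O cluster.

    h2o is imported lazily so that h2o_endpoint() stays usable on a
    machine where the h2o package is not installed.
    """
    import h2o

    # Same call as on the slide: h2o.init ( ip, port )
    h2o.init(ip=ip, port=port)
    return h2o
```

For example, connect("127.0.0.1", 54321) would attach to a node listening locally on the default port, and h2o_endpoint("127.0.0.1", 54321) gives the address the client sends its REST calls to.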

Demo Workflow

Aim: Build a model to predict Arrest for the Chicago crime dataset

● Import the Chicago crime dataset
● Combine the crime data with census and weather data
● Build a model to predict whether an arrest was made
● Predict on a test dataset
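
The four workflow steps can be sketched with the plain H2O Python API. This is not the demo's actual code: the file paths are placeholders, the choice of a GBM model and the 80/20 split are assumptions, and the sketch assumes the three tables already share key columns that merge() can join on. It also targets a recent h2o release, so signatures may differ from H2O-3.6.0.3. The helper names predictor_columns and run_workflow are hypothetical.

```python
def predictor_columns(columns, response):
    """Every column except the response is used as a predictor."""
    return [c for c in columns if c != response]


def run_workflow(crime_path, census_path, weather_path, response="Arrest"):
    """Import, combine, train and predict, following the workflow steps."""
    import h2o
    from h2o.estimators.gbm import H2OGradientBoostingEstimator

    h2o.init()  # attach to (or start) an H2O cluster

    # 1) Import the datasets (paths are supplied by the caller).
    crimes = h2o.import_file(crime_path)
    census = h2o.import_file(census_path)
    weather = h2o.import_file(weather_path)

    # 2) Combine: merge() joins on the column names the frames share.
    combined = crimes.merge(census).merge(weather)

    # Treat the response as categorical so a classifier is trained.
    combined[response] = combined[response].asfactor()

    # Hold out part of the data as the test dataset (assumed 80/20 split).
    train, test = combined.split_frame(ratios=[0.8], seed=42)

    # 3) Build a model that predicts whether an arrest was made.
    model = H2OGradientBoostingEstimator(ntrees=50, seed=42)
    model.train(
        x=predictor_columns(combined.columns, response),
        y=response,
        training_frame=train,
    )

    # 4) Predict on the test dataset.
    return model.predict(test)
```

In a PySparkling session the combine step could instead be done on the Spark side, with the joined table handed back to H2O for training; the sketch keeps everything in H2O only to stay short.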

Prerequisites to run the demo

- Install Spark-1.5.1

- Install and build Sparkling Water-1.5.6 ( ./gradlew build -x check )

- Install H2O-3.6.0.3

- Install H2O-python ( sudo pip install h2o-3.6.0.3-py2.py3-none-any.whl )

Command to Start/Access PySparkling Water Cluster

1) Set the Spark environment by specifying SPARK_HOME and MASTER

export SPARK_HOME=Path_to_Spark_dir

export MASTER='local-cluster[2,8,6040]'

2) Launch PySparkling

- To run from a Python notebook:

IPYTHON_OPTS="notebook" Path_to_Sparkling_dir/bin/pysparkling

- To run from the regular Python shell:

Path_to_Sparkling_dir/bin/pysparkling
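
The same two steps can be scripted from Python. The sketch below only assembles the environment variables and the command named on this slide; the function name pysparkling_launch and its parameters are hypothetical, and the directory paths are left to the caller.

```python
import os


def pysparkling_launch(spark_home, sparkling_dir,
                       master="local-cluster[2,8,6040]", notebook=False):
    """Build the environment and command line for bin/pysparkling."""
    env = dict(os.environ)          # start from the current environment
    env["SPARK_HOME"] = spark_home  # step 1: point at the Spark install
    env["MASTER"] = master          # step 1: choose the Spark master
    if notebook:
        # step 2, notebook variant: IPYTHON_OPTS="notebook"
        env["IPYTHON_OPTS"] = "notebook"
    argv = [os.path.join(sparkling_dir, "bin", "pysparkling")]
    return env, argv
```

The returned pair can be passed to subprocess.call(argv, env=env) to start the shell or the notebook.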

Let's Run the Demo!

Why use PySparkling

● Automatic parallelization and fewer lines of code

● Much faster on big data - uses H2O's REST API calls to connect to the H2O cluster

Thank You

What do these stickers mean?

I have Sparkling Water Installed

I have Python installed

I have H2O installed

I have the H2O World data sets

Pick up stickers or get install help at the information booth