Jaws - Data Warehouse with Spark SQL by Ema Orhian

Ema Orhian @emaorhian

Jaws - Data Warehouse with Spark SQL

About me

• Big Data analytics / Machine Learning
• 4+ years exp with Hadoop ecosystem
• 2 years exp with Spark

http://bigdataresearch.io/

• Co-founder of Big Data Research Group
• Provides open source solutions around Big Data analytics

http://atigeo.com/

Agenda

• jaws-spark-sql-rest (Jaws) intro
• Main features
• Architecture
• Scaling
• Resource manager
• Working with Tachyon
• Working with Parquet files
• Configure Spark SQL context
• Demo

Shared Spark SQL Context

Concurrent queries run

Query history

Page results

Query editor

Jaws

• Highly scalable and resilient data warehouse explorer

• RESTful alternative to Spark SQL JDBC, and more …

• Supports Spark 0.9.1/Shark through Spark 1.5

• Supports Hive/MapReduce

https://github.com/atigeo/jaws-spark-sql-rest

Main features

• Submit queries concurrently and asynchronously

• Provides persisted logs, query history, results with paging

• Pluggable persistent layer (Cassandra/HDFS)

• Supports load balancing with query cancelation

• Provides a metadata browser

• In-memory Parquet warehouse with Tachyon

• Configuration file to fine-tune the Spark context

• Pluggable UI
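The asynchronous submit, cancel, and paged-results features listed above can be illustrated with a small in-memory model. This is only a sketch of the pattern, not the Jaws REST API: `QueryService`, `fake_engine`, the state names, and every method name are hypothetical, and a thread pool stands in for the shared Spark SQL context.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class QueryService:
    """Toy in-memory model of an asynchronous query service (hypothetical names)."""

    def __init__(self, max_workers=4):
        # Each worker thread runs one query at a time; extra queries wait in a queue.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def submit(self, engine, sql):
        """Schedule the query and return its ID without waiting for it to finish."""
        query_id = uuid.uuid4().hex
        self._futures[query_id] = self._pool.submit(engine, sql)
        return query_id

    def state(self, query_id):
        """Report the query state; queued and running queries both count as in progress."""
        future = self._futures[query_id]
        if future.cancelled():
            return "CANCELLED"
        if not future.done():
            return "IN_PROGRESS"
        return "FAILED" if future.exception() is not None else "DONE"

    def cancel(self, query_id):
        """Cancel a query that is still queued; returns False once it is running or finished."""
        return self._futures[query_id].cancel()

    def results(self, query_id, offset=0, limit=100):
        """Block until the query finishes, then return one page of rows."""
        rows = self._futures[query_id].result()
        return rows[offset:offset + limit]

    def shutdown(self):
        self._pool.shutdown(wait=True)


def fake_engine(sql):
    """Stand-in for a real SQL engine: sleeps briefly and returns ten one-column rows."""
    time.sleep(0.05)
    return [(i,) for i in range(10)]
```

A caller would keep the ID returned by `submit`, poll `state` until it stops reporting `IN_PROGRESS`, and then fetch pages with `results(query_id, offset, limit)`. In this toy model only queued queries can be cancelled; a real service would also need a way to stop work that is already running on the cluster.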

Jaws architecture

Scaling

• Standalone mode

• Mesos
‣ Fine-grained mode
‣ Coarse-grained mode

• YARN

Canceling a query

Results persistence

• Queries with a limited number of results:
‣ Cassandra
‣ HDFS

• Queries with an unlimited number of results:
‣ HDFS
‣ Tachyon
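The pairing of result size and storage backend above can be written down as a small validation helper. This is a sketch under assumptions: the function name `choose_result_store` and the lowercase backend labels are made up for illustration and are not taken from the Jaws code base.

```python
# Backends allowed for each kind of query, as listed on the slide.
LIMITED_STORES = ("cassandra", "hdfs")
UNLIMITED_STORES = ("hdfs", "tachyon")


def choose_result_store(limited, preferred):
    """Return the normalized backend name, or raise if it cannot hold this kind of result."""
    allowed = LIMITED_STORES if limited else UNLIMITED_STORES
    store = preferred.lower()
    if store not in allowed:
        kind = "limited" if limited else "unlimited"
        raise ValueError(
            f"{preferred!r} cannot hold {kind} results; choose one of {allowed}"
        )
    return store
```

For example, `choose_result_store(True, "Cassandra")` returns `"cassandra"`, while asking for Cassandra with `limited=False` raises a `ValueError`.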

Working with Tachyon

• Persists unlimited results in Tachyon
• Registers tables over Parquet files from Tachyon

Tachyon benefits:
★ in-memory storage system
★ share data between applications at a memory …

Working with Parquet files

• Register tables on top of Parquet files

Parquet
★ columnar format
★ nested data structures
★ supports schema evolution
★ efficient compression

• Files stored on HDFS or Tachyon
• MetaInfo about table stored in Cassandra (feature before Spark …
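One way to register a table on top of a Parquet file in Spark SQL 1.x is the data source DDL, which the helper below assembles as a string. The slides do not say how Jaws registers tables internally, so this is an assumption for illustration; the helper name `parquet_table_ddl`, the table name, and the example paths and ports are hypothetical, and the provider name after `USING` can differ between Spark versions.

```python
def parquet_table_ddl(table_name, path):
    """Build a Spark SQL 1.x statement that registers a temporary table over a Parquet path."""
    # Simple guards so the generated statement stays well-formed.
    if not table_name.isidentifier():
        raise ValueError(f"invalid table name: {table_name!r}")
    if '"' in path:
        raise ValueError("path must not contain double quotes")
    return (
        f"CREATE TEMPORARY TABLE {table_name}\n"
        "USING org.apache.spark.sql.parquet\n"
        f'OPTIONS (path "{path}")'
    )


# The same helper works for either storage layer; only the URI scheme changes.
hdfs_ddl = parquet_table_ddl("events", "hdfs://devbox.local:8020/warehouse/events.parquet")
tachyon_ddl = parquet_table_ddl("events", "tachyon://devbox.local:19998/warehouse/events.parquet")
```

The resulting string would be submitted like any other query; once the table is registered, it can be queried by name.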

Configuring Jaws

• Cassandra

• HDFS

• Spray

• Application

• Spark

sparkConfiguration {
  spark-master="spark://devbox.local:7077" / "mesos://devbox.local:5050" / yarn-client
  spark-mesos-coarse=false / true
  spark-cores-max=100
  spark-executor-instances=10
}
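The keys in this block look like standard Spark properties with dashes in place of dots (`spark.master`, `spark.mesos.coarse`, `spark.cores.max`, `spark.executor.instances`). The sketch below assumes that simple renaming; whether Jaws translates keys exactly this way is not stated in the slides, and the function name `to_spark_properties` is hypothetical.

```python
def to_spark_properties(jaws_conf):
    """Map dash-separated keys to dotted Spark property names with string values."""
    props = {}
    for key, value in jaws_conf.items():
        # Spark expects lowercase "true"/"false" for boolean properties.
        if isinstance(value, bool):
            value = "true" if value else "false"
        props[key.replace("-", ".")] = str(value)
    return props


# One concrete choice from the alternatives shown above: Mesos in coarse-grained mode.
example_conf = {
    "spark-master": "mesos://devbox.local:5050",
    "spark-mesos-coarse": True,
    "spark-cores-max": 100,
    "spark-executor-instances": 10,
}
spark_props = to_spark_properties(example_conf)
```

In Spark itself, `spark.cores.max` applies to standalone and coarse-grained Mesos deployments, and `spark.executor.instances` applies on YARN, so only the settings relevant to the chosen master take effect.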
