Jaws - Data Warehouse with Spark SQL by Ema Orhian
TRANSCRIPT
![Page 1: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/1.jpg)
Ema Orhian @emaorhian
Jaws - Data Warehouse with Spark SQL
![Page 2: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/2.jpg)
About me
• Big Data analytics / Machine Learning
• 4+ years of experience with the Hadoop ecosystem
• 2 years of experience with Spark
http://bigdataresearch.io/
• Co-founder of Big Data Research Group
• Provides open source solutions around Big Data analytics
http://atigeo.com/
![Page 3: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/3.jpg)
Agenda
• jaws-spark-sql-rest (Jaws) intro
• Main features
• Architecture
• Scaling
• Resource manager
• Working with Tachyon
• Working with Parquet files
• Configure Spark SQL context
• Demo
![Page 4: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/4.jpg)
Shared Spark Sql Context
Concurrent queries run
Query history
Page results
Query editor
![Page 5: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/5.jpg)
Jaws
• Highly scalable and resilient data warehouse explorer
• RESTful alternative to Spark SQL JDBC, and more …
• Support for Spark 0.9.1/Shark through Spark 1.5
• Support for Hive/MR
https://github.com/atigeo/jaws-spark-sql-rest
![Page 6: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/6.jpg)
Main features
• Submit queries concurrently and asynchronously
• Provides persisted logs, query history, results with paging
• Pluggable persistence layer (Cassandra/HDFS)
• Supports load balancing with query cancelation
• Provides a metadata browser
• In-memory Parquet warehouse with Tachyon
• Configuration file to fine-tune the Spark context
• Pluggable UI
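
The first two features (asynchronous, concurrent submission plus query history and paged results) can be pictured with a small in-memory sketch. This is not the Jaws REST API: the class `QueryService`, its method names, the status strings, and the `fake_engine` stand-in are all hypothetical, chosen only to show the submit-then-poll pattern.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class QueryService:
    """Toy model of asynchronous query submission (hypothetical, not the Jaws API)."""

    def __init__(self, run_query, max_workers=4):
        self._run_query = run_query      # callable: query text -> list of rows
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}               # query id -> Future
        self.history = []                # (query id, query text) in submission order

    def submit(self, text):
        """Start the query in the background and return its id immediately."""
        query_id = uuid.uuid4().hex
        self._futures[query_id] = self._pool.submit(self._run_query, text)
        self.history.append((query_id, text))
        return query_id

    def status(self, query_id):
        """Report progress without blocking the caller."""
        future = self._futures[query_id]
        if not future.done():
            return "IN_PROGRESS"
        return "FAILED" if future.exception() else "DONE"

    def results(self, query_id, offset=0, limit=100):
        """Return one page of rows; blocks until the query has finished."""
        rows = self._futures[query_id].result()
        return rows[offset:offset + limit]


def fake_engine(text):
    # Stand-in for a real SQL engine: always returns ten single-column rows.
    return [(i,) for i in range(10)]


service = QueryService(fake_engine)
first = service.submit("SELECT * FROM t")    # returns at once with an id
second = service.submit("SELECT * FROM u")   # runs concurrently with the first
page = service.results(first, offset=4, limit=3)
```

A client keeps only the returned id, polls `status`, and then fetches results page by page, which is why history and results need to be persisted on the server side.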
![Page 7: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/7.jpg)
Jaws architecture
![Page 8: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/8.jpg)
Scaling
• Standalone mode
• Mesos
‣ Fine-grained mode
‣ Coarse-grained mode
• YARN
![Page 9: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/9.jpg)
Canceling a query
![Page 10: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/10.jpg)
Canceling a query
![Page 11: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/11.jpg)
Results persistence
• Queries with limited number of results:
‣ Cassandra
‣ HDFS
• Queries with unlimited number of results:
‣ HDFS
‣ Tachyon
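
The rule on this slide can be written down as a small validation step. The function name `choose_results_store` and the lowercase backend identifiers are assumptions made for illustration; the slides do not show how Jaws encodes this choice.

```python
# Backends allowed for each kind of query, as listed on the slide.
LIMITED_BACKENDS = {"cassandra", "hdfs"}
UNLIMITED_BACKENDS = {"hdfs", "tachyon"}


def choose_results_store(has_limit, backend):
    """Return `backend` if it may hold results for this kind of query.

    has_limit: True when the query returns a limited number of results.
    backend:   lowercase backend name (hypothetical identifiers).
    """
    allowed = LIMITED_BACKENDS if has_limit else UNLIMITED_BACKENDS
    if backend not in allowed:
        raise ValueError(
            "backend %r not supported here; choose one of %s"
            % (backend, sorted(allowed))
        )
    return backend


limited_store = choose_results_store(True, "cassandra")
unlimited_store = choose_results_store(False, "tachyon")
```

HDFS is the only backend valid in both cases, so it is the natural default when the size of the result set is not known in advance.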
![Page 12: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/12.jpg)
Working with Tachyon
• Persists unlimited results in Tachyon
• Registers tables over Parquet files from Tachyon
Tachyon benefits:
★ in-memory storage system
★ shares data between applications at memory speed
![Page 13: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/13.jpg)
Working with Parquet files
• Register tables on top of Parquet files
Parquet
★ columnar format
★ nested data structures
★ supports schema evolution
★ efficient compression
• Files stored on HDFS or Tachyon
• Metadata about the table stored in Cassandra (feature before Spark 1.3)
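
Registering a table on top of Parquet files can be sketched with the plain PySpark API of that era. This shows the underlying Spark calls only, not what Jaws does internally; the helper name `register_parquet_table`, the paths, and the table name are hypothetical.

```python
def register_parquet_table(sql_context, path, table_name):
    """Expose the Parquet files under `path` as a temporary SQL table.

    sql_context: a pyspark.sql.SQLContext (Spark 1.4+).
    path:        any URI Spark can read, e.g. an HDFS or Tachyon location.
    """
    # The schema is read from the Parquet files themselves.
    data_frame = sql_context.read.parquet(path)
    # Spark 1.x name; on Spark 2.0+ the equivalent is createOrReplaceTempView.
    data_frame.registerTempTable(table_name)
    return data_frame


# Usage with a real context (requires a Spark installation; paths are made up):
#
#   from pyspark import SparkContext
#   from pyspark.sql import SQLContext
#   sql_context = SQLContext(SparkContext(appName="parquet-demo"))
#   register_parquet_table(sql_context, "hdfs:///warehouse/events", "events")
#   sql_context.sql("SELECT COUNT(*) FROM events").show()
```

Because the table is only a name bound to files, the same helper works whether the path points at HDFS or at Tachyon, provided the cluster has the matching file system client configured.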
![Page 14: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/14.jpg)
Configuring Jaws
• Cassandra
• HDFS
• Spray
• Application
• Spark
sparkConfiguration {
  spark-master="spark://devbox.local:7077" / "mesos://devbox.local:5050" / yarn-client
  spark-mesos-coarse=false / true
  spark-cores-max=100
  spark-executor-instances=10
}
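
The keys in `sparkConfiguration` resemble standard Spark properties with dots replaced by hyphens. The sketch below assumes that correspondence; how Jaws actually translates its configuration is not shown on the slides, and `to_spark_properties` is a hypothetical helper. The Spark property names on the right-hand side are real: `spark.cores.max` caps cores in standalone and coarse-grained Mesos mode, while `spark.executor.instances` applies on YARN.

```python
# Assumed mapping from the slide's keys to standard Spark property names.
KEY_MAP = {
    "spark-master": "spark.master",
    "spark-mesos-coarse": "spark.mesos.coarse",
    "spark-cores-max": "spark.cores.max",
    "spark-executor-instances": "spark.executor.instances",
}


def to_spark_properties(jaws_conf):
    """Translate slide-style keys into Spark properties with string values."""
    properties = {}
    for key, value in jaws_conf.items():
        if key not in KEY_MAP:
            continue  # ignore settings that are not Spark-related
        if isinstance(value, bool):
            value = "true" if value else "false"  # Spark expects lowercase booleans
        properties[KEY_MAP[key]] = str(value)
    return properties


mesos_conf = {
    "spark-master": "mesos://devbox.local:5050",
    "spark-mesos-coarse": True,
    "spark-cores-max": 100,
}
mesos_properties = to_spark_properties(mesos_conf)
```

Switching between standalone, Mesos, and YARN then comes down to changing the `spark-master` value and the one or two settings that only that cluster manager reads.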
![Page 15: Jaws - Data Warehouse with Spark SQL by Ema Orhian](https://reader036.vdocuments.us/reader036/viewer/2022070516/586f74f01a28ab10258b5ddf/html5/thumbnails/15.jpg)
Demo