Hive on Spark is Blazing Fast… Or Is It?

Carter Shanklin and Mostafa Mokhtar

Why SQL on Hadoop? Solving for Scale.

Hadoop is great for cost, but MapReduce is too difficult.

SQL on Hadoop makes Hadoop real and gives me scale that traditional SQL can’t offer.

I’m deleting important data because it’s too expensive to store it.

SQL at Facebook: Emergence of Apache Hive

Developed Hive to address traditional RDBMS limitations.

300+ PB of data under management. (1)

600+ TB of data loaded daily.

60,000+ Hive queries per day. (2)

More than 1,000 users per day.

Initial Apache release in April 2009.

Hive Classic: Strengths and Challenges

+ Familiar SQL Interface

+ Economical Processing of Petabytes

- Hive Classic tied to MapReduce, leading to latency

Traditional SQL Workloads Needed Higher Performance!

Need for Speed: The Stinger Initiative

Stinger: An Open Roadmap to improve Apache Hive’s performance 100x.

Launched: February 2013; Delivered: April 2014.

Delivered in 100% Apache Open Source.

SQL Engine (Vectorized SQL Engine) + Columnar Storage (ORCFile) + Distributed Execution (Apache Tez) = 100X+

Stinger Phase 3: TPC-DS Benchmark at 30 Terabyte Scale

Sample of 50 queries from TPC-DS at 30 terabyte scale.

Average 52x Query Speedup, Maximum 160x Query Speedup.

Total benchmark time decreased from 7.8 days to 9.3 hours.(3)

The Cost-Based Optimizer added in Hive 0.14 gave an additional 2.5x speedup.
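As a quick check on the headline numbers, the drop from 7.8 days to 9.3 hours can be converted into an overall speedup. This is only an illustrative sketch: it assumes 24-hour days and that both totals cover the same set of queries. The result is a ratio of total times, which long-running queries dominate, so it is not expected to match the 52x average of per-query speedups.

```python
# Overall speedup implied by the total benchmark times on the slide.
# Assumption: "7.8 days" means 7.8 * 24 hours of wall-clock time.
HOURS_PER_DAY = 24

before_hours = 7.8 * HOURS_PER_DAY   # total time before Stinger
after_hours = 9.3                    # total time after Stinger phase 3

# Ratio of totals, not an average of per-query ratios.
overall_speedup = before_hours / after_hours

print(f"Total time: {before_hours:.1f} h -> {after_hours:.1f} h "
      f"({overall_speedup:.1f}x overall)")
```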

Hive + Stinger at Yahoo

Scale: Around 1 million Hive jobs run every month.

Performance: Total benchmark time reduced from 8.1 hours to 1.3 hours at 10TB scale.

Performance: Up to 82x faster. (4)

Stinger at Spotify

Speed: Query 25 TB of compressed data in 10 minutes across 690 nodes (MapReduce too slow to complete).

Efficiency: 16x less HDFS read when using ORCFile versus Avro. (5)
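To put the Spotify figure in perspective, the scan rate it implies can be worked out directly. The sketch below is a rough estimate under stated assumptions: decimal units (1 TB = 10**12 bytes) and data read evenly across all 690 nodes for the full 10 minutes. Neither assumption comes from the slide.

```python
# Aggregate and per-node scan rate implied by "25 TB in 10 minutes on 690 nodes".
# Assumptions (not from the slide): decimal units and an even spread of work.
data_bytes = 25 * 10**12
elapsed_seconds = 10 * 60
nodes = 690

aggregate_gb_per_s = data_bytes / elapsed_seconds / 10**9
per_node_mb_per_s = data_bytes / elapsed_seconds / nodes / 10**6

print(f"Aggregate: {aggregate_gb_per_s:.1f} GB/s")
print(f"Per node:  {per_node_mb_per_s:.1f} MB/s")
```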

ORCFile at Facebook

Compression: Saved more than 1,400 servers’ worth of storage.

Compression: Compression ratio increased from 5x to 8x globally.

Hive on Tez: Conclusion

Hive on Tez delivers fast batch and interactive SQL today.

Scale: Proven at petabyte scale.

SQL: The most comprehensive open-source SQL on Hadoop.

Speed: More than 90 Hortonworks customers use Hive-on-Tez today for fast SQL. (Hortonworks Customer Support metrics as of Feb/2015)

But users need more speed!

Next Stop: Stinger.next and Sub-Second SQL

The emergence of LLAP and Hive-on-Spark brings sub-second within reach.

What does it take to get Hive to sub-second?

Does Hive-on-Spark get us there?

Performance Today and the Sub-Second Future

Hive on Tez, Hive on Spark, Hive on Mapreduce & Spark-SQL

Query processing in Hadoop

[Diagram: query processing stack. Execution: MapReduce, Spark. SQL Engine: Hive Engine, Spark-SQL. SQL support: HiveQL. Columnar Storage: ORC File, Parquet File. Block Cache: Linux Cache.]

What is covered today in terms of performance

Performance comparison : Test bed

Software:

Component | Version
Hive | 1.2.0
Tez | 0.5.2
Spark | 1.2.0
Hadoop | 2.6.0

Hardware

20 physical nodes, each with:

● 2x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz for total of 16 CPU cores/machine

● Hyper-threading enabled

● 256GB RAM per node

● 6x 4TB WDC WD4000FYYZ-0 drives per node

● 10 Gigabit interconnect between the nodes

Note: Based on the YARN Node Manager’s Memory Resource setting used below, only 128 GB of RAM per node was dedicated to query processing.
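For reference, the per-node figures above can be rolled up into cluster totals. This is a back-of-the-envelope sketch; the split of 16 cores into 2 sockets of 8 and the factor of 2 threads per core follow from the hardware list, while the class and field names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    sockets: int = 2
    cores_per_socket: int = 8   # 2 sockets x 8 cores = 16 physical cores
    threads_per_core: int = 2   # hyper-threading enabled
    ram_gb: int = 256
    query_ram_gb: int = 128     # share given to YARN, per the note
    disks: int = 6
    disk_tb: int = 4


NODES = 20
node = Node()

physical_cores = NODES * node.sockets * node.cores_per_socket
logical_cores = physical_cores * node.threads_per_core
total_query_ram_gb = NODES * node.query_ram_gb
raw_disk_tb = NODES * node.disks * node.disk_tb

print(f"{physical_cores} physical cores, {logical_cores} logical cores")
print(f"{total_query_ram_gb} GB RAM for queries, {raw_disk_tb} TB raw disk")
```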

Performance benchmarks:

Execution engine | Primitives on 30TB scale factor | TPC-DS queries on 30TB scale factor | TPC-DS queries on 200GB scale factor
Spark | X | X | X
Tez | X | X | X
Map Reduce | X | |
Spark-SQL | X | X | X

Performance comparison : Configurations

Hive on Tez

● 128GB of memory allocated

● 16 out of 32 logical processors allocated

● hive.execution.engine = tez

● hive.auto.convert.join.noconditionaltask.size = 600MB

● Vectorization enabled

● CBO enabled

● Fetch column stats enabled

Other settings

● hive.prewarm.numcontainers = 317

● hive.tez.auto.reducer.parallelism = true
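The Hive on Tez settings above could be applied per session as `set` statements. The sketch below renders them from a dictionary. Several details are assumptions rather than facts from the slide: the property names chosen for "Vectorization", "CBO" and "Fetch column stats" (`hive.vectorized.execution.enabled`, `hive.cbo.enable`, `hive.stats.fetch.column.stats`), and the reading of 600MB as 600 MiB expressed in bytes.

```python
MIB = 1024 * 1024

# Settings from the slide; the three "enabled" entries use property names
# that are an assumed mapping, since the slide only says "enabled".
hive_on_tez_settings = {
    "hive.execution.engine": "tez",
    "hive.auto.convert.join.noconditionaltask.size": 600 * MIB,  # assumed MiB
    "hive.vectorized.execution.enabled": "true",   # "Vectorization enabled"
    "hive.cbo.enable": "true",                     # "CBO enabled"
    "hive.stats.fetch.column.stats": "true",       # "Fetch column stats enabled"
    "hive.prewarm.numcontainers": 317,
    "hive.tez.auto.reducer.parallelism": "true",
}


def to_set_statements(settings):
    """Render each key/value pair as a Hive session-level `set` statement."""
    return [f"set {key}={value};" for key, value in settings.items()]


for statement in to_set_statements(hive_on_tez_settings):
    print(statement)
```

Keeping the settings in one mapping makes it easy to diff them against the configuration used for another engine.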

Hive on Spark

● hive.execution.engine=spark

● Configuration parameters followed the recommendations from the Hive on Spark wiki (http://tinyurl.com/pk2ju8e), which also had CBO, vectorization, fetch column stats, etc. enabled

● spark.master=yarn-master

Spark settings

● spark.shuffle.memoryFraction = 0.5

● spark.storage.memoryFraction = 0.1

● spark.shuffle.consolidateFiles = true

● spark.serializer = org.apache.spark.serializer.KryoSerializer

Spark-SQL

● spark.shuffle.memoryFraction = 0.5

● spark.storage.memoryFraction = 0.1

● spark.shuffle.consolidateFiles = true

● spark.serializer = org.apache.spark.serializer.KryoSerializer

● spark.sql.shuffle.partitions = 1009

● spark-sql --master yarn-client

● driver-memory 8g

● Default GC configuration

spark.sql.codegen was not enabled, as it caused most queries to fail.
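One way to reproduce the Spark-SQL setup is to pass the listed properties on the command line. The sketch below builds such an invocation; it is an assumption that the settings were supplied through `--conf` flags rather than a properties file, and the helper name `build_spark_sql_command` is hypothetical.

```python
import shlex

# Spark properties listed on the slide for the Spark-SQL runs.
spark_conf = {
    "spark.shuffle.memoryFraction": "0.5",
    "spark.storage.memoryFraction": "0.1",
    "spark.shuffle.consolidateFiles": "true",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.shuffle.partitions": "1009",
}


def build_spark_sql_command(conf, master="yarn-client", driver_memory="8g"):
    """Return the argv list for a spark-sql launch with the given properties."""
    argv = ["spark-sql", "--master", master, "--driver-memory", driver_memory]
    for key, value in conf.items():
        argv += ["--conf", f"{key}={value}"]
    return argv


command = build_spark_sql_command(spark_conf)
print(shlex.join(command))
```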

Performance comparison : TPC-DS 200GB

● Warm timings reported; cold queries on Spark are significantly slower

● Hive on Tez using ORC format

● Hive on Spark using Parquet format

● Spark-SQL using Parquet format

Performance comparison : TPC-DS 200GB continued..

Hive on Tez is 77% faster than Hive on Spark and 10% faster than Spark-SQL.

Spark-SQL is 60% faster than Hive on Spark.

Performance comparison : TPC-DS 200GB summary

Even simple queries don’t run in sub-second time.

Performance comparison : TPC-DS 200GB

● 200GB Scale factor, un-partitioned schema

● 45 unmodified queries from TPC-DS

● ORC format compression ratio 3.4x

● Parquet format compression ratio of 2.8x
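The two compression ratios translate into different on-disk footprints for the same data. The sketch below is illustrative arithmetic only; it assumes the ratio is raw size divided by stored size and that the 200GB scale factor corresponds to roughly 200 GB of raw data.

```python
# Approximate on-disk size per format at the 200GB scale factor.
# Assumption: compression ratio = raw size / stored size.
raw_gb = 200
compression_ratio = {"ORC": 3.4, "Parquet": 2.8}

on_disk_gb = {fmt: raw_gb / ratio for fmt, ratio in compression_ratio.items()}
parquet_vs_orc = on_disk_gb["Parquet"] / on_disk_gb["ORC"]

for fmt, size in on_disk_gb.items():
    print(f"{fmt}: about {size:.1f} GB on disk")
print(f"Parquet stores about {(parquet_vs_orc - 1) * 100:.0f}% more bytes than ORC")
```

Since the engines in this test read different formats, part of any difference in scan time may come from the amount of data read rather than from the engine itself.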

Performance comparison : TPC-DS 30TB

Workload config

● 30 TB scale factor

● ORC table format

● Fact tables partitioned on *_date_sk

● Explicit partition filters were used for Hive on Spark and Spark-SQL (but not for Hive-on-Tez)

● 20 of the previously used queries were used; warm query timings reported

Highlights from 30TB TPC-DS test

● Hive on Tez outperforms Hive on Spark and Spark-SQL by up to 18x

● Hive on Spark completed 15 out of the 20; the remaining 5 queries errored out or were stuck in GC and got cancelled

● Spark-SQL completed 7 out of the 20; the remaining 13 queries either failed within a couple of minutes or errored out after running for hours

● Spark-SQL performance is negatively affected by inefficient query plans, as it lacks a sophisticated query optimizer
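The completion counts from the 30TB run are easier to compare as rates. A minimal sketch follows, using only the counts stated for Hive on Spark and Spark-SQL. No count is given on the slide for Hive on Tez, so it is left out rather than assumed.

```python
# Completion rates for the 20-query 30TB TPC-DS run, from the stated counts.
TOTAL_QUERIES = 20
completed = {"Hive on Spark": 15, "Spark-SQL": 7}

completion_rate = {
    engine: done / TOTAL_QUERIES for engine, done in completed.items()
}

for engine, rate in completion_rate.items():
    failed = TOTAL_QUERIES - completed[engine]
    print(f"{engine}: {rate:.0%} completed, {failed} queries did not finish")
```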

Performance comparison : TPC-DS 30TB continued

[Chart: TPC-DS 30TB query timings, with the failed Spark-SQL queries marked; 10,098 appears as a value label.]

For large data sets, Hive on Tez is ~5x faster than Hive on…

Performance comparison : TPC-DS 30TB Q17

[Chart annotation: Hive on Tez query ends]

Why didn’t Spark take Hive to sub-second?

● Hive is CPU bound for most operations, especially after the introduction of columnar file formats (do more with less)

● Spark consumes more CPU, Disk & Network IO than Tez for relatively large datasets

● Hive on Spark spends a lot of time translating from RDDs to Hive’s “Row Containers”

[Chart callouts: “2x less”, “4x less”, “6x less”; labels: Disk IO, Network IO]

I don’t believe what you just said!!!

Show me some queries I can understand...

Simple queries to understand complex systems

Execution engine Primitives

Performance comparison : What are those primitives?

Group | Test case | Comment
CTAS | Create table as select * | Insert 8 Billion rows, 570 GB of data
CTAS | Create table as select with Group by | Group by and insert 8 Billion rows, 570 GB of data
CTAS | Create table as with Group by on all columns followed by cluster by | Group by, cluster by and insert 8 Billion rows, 570 GB of data
Group by | Group by on primary key | Group by 25 billion distinct keys
Group by | Group by on column with low NDV* | Group by 82 billion rows with 8K distinct keys
Map join | store_sales x item | Map join 28 Billion x 462K
Map join | store_sales x item x store | Map join 28 Billion x 462K x 1.7K
Map join | store_sales x item x store x customer_demographics | Map join 28 Billion x 462K x 1.7K x 1.9 Million
Shuffle join | Shuffle join | Shuffle join 8.6 Billion x 706 Million rows
Shuffle join | Shuffle join + Group by on primary key | Shuffle join 8.6 Billion x 706 Million rows followed by group by on 675 Million rows

* NDV: Number of distinct values

Performance comparison : CTAS

Create table test_table as select * from store_returns;

Execution engine | Elapsed time (seconds) | Tez Gain %
Hive on Tez | 316 |
Hive on Spark | 351 | 11%
Hive on Mapreduce | 494 | 56%
Spark-SQL | 418 | 32%

[Plan diagram: Table Scan (store_returns, 8 Billion rows); Table Insert (8 Billion rows)]
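The "Tez Gain %" column can be reproduced from the elapsed times. The sketch below assumes the column is the extra time an engine needs relative to Hive on Tez, a reading that matches all three reported values for this test; the function name is illustrative.

```python
def tez_gain_pct(engine_seconds, tez_seconds):
    """Extra elapsed time of an engine relative to Hive on Tez, in percent."""
    return (engine_seconds - tez_seconds) / tez_seconds * 100


TEZ_SECONDS = 316
elapsed = {"Hive on Spark": 351, "Hive on Mapreduce": 494, "Spark-SQL": 418}

gains = {
    engine: round(tez_gain_pct(seconds, TEZ_SECONDS))
    for engine, seconds in elapsed.items()
}

for engine, gain in gains.items():
    print(f"{engine}: {gain}% longer than Hive on Tez")
```

Read this way, the figure says how much longer the other engine runs, not what fraction of its time Tez saves. For example, going from 351 to 316 seconds is roughly a 10% reduction, while the gain is reported as 11%.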

Tez is 11% faster than Spark, 56% faster than Mapreduce and 32% faster than Spark-SQL.

Performance comparison : CTAS with group by

Create table test_table as select * from store_returns group by *;

Execution engine | Elapsed time (seconds) | Tez Gain %
Hive on Tez | 630 |
Hive on Spark | 1,608 | 155%
Spark-SQL | 1,202 | 91%

[Plan diagram: Table Scan (store_returns, 8 Billion rows); Group by (on all columns, 7 billion rows); Shuffle (on all columns, 8 Billion rows); Table Insert (4 Billion rows)]

This time, the execution engine must prepare, shuffle and aggregate data.

Performance comparison : Select + group by on PK

select count(*) rowcount from store_sales group by ss_item_sk , ss_ticket_number having rowcount > 100000000

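Execution engine | Elapsed time (seconds) | Tez Gain %
Hive on Tez | 457 |
Spark-SQL | 862 | 89%

[Plan diagram: Table Scan (25 Billion rows); Group by (25 billion rows); Shuffle (25 Billion rows); Filter operator (25 billion rows); Select (0 rows qualify). Chart value labels: 893, 862.]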

Group-By performed on all 25 billion distinct keys.

Performance comparison : Select + group by on low NDV

select sum(ss_list_price) from store_sales group by ss_sold_date_sk having sum(ss_list_price) = 1

Execution engine | Elapsed time (seconds) | Tez Gain %
Hive on Tez | 51 |
Spark-SQL | 164 | 221%

[Plan diagram: Table Scan (85 Billion rows); Group by (85 billion rows); Filter operator (8K rows); Select (0 rows qualify)]

Hive on Tez and Hive on Spark outperform Spark-SQL.

Performance comparison : Map join with 1,2 & 3 tables

select count(*) from store_sales, item, store, customer_demographics where i_item_sk = ss_item_sk and s_store_sk = ss_store_sk and ss_cdemo_sk = cd_demo_sk

[Plan diagram: Table Scan (store_sales, 28 Billion rows); Table Scan (472K rows); Table Scan (1.7K rows); Table Scan (customer_demographic, 1.9 Million rows); three Map joins (27 Billion rows each)]

Execution engine | Map join #1 | Map join #2 | Map join #3 | Tez Join #1 Gain % | Tez Join #2 Gain % | Tez Join #3 Gain %
Hive on Tez | 108 | 145 | 232 | | |
Hive on Spark | 106 | 142 | 289 | 98% | 98% | 125%
Hive on Mapreduce | 247 | 280 | 800 | 228% | 193% | 345%
Spark-SQL | 86 | 117 | 166 | -20% | -20% | -28%
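When reading the gain columns for the map joins, it helps to recompute them from the elapsed times. The sketch below compares two possible conventions against the reported percentages: relative gain, (engine - Tez) / Tez, and plain ratio, engine / Tez. This is an independent check rather than something stated in the slides, and the function names are illustrative.

```python
TEZ = (108, 145, 232)  # Hive on Tez elapsed seconds for map joins #1..#3

# engine -> (elapsed seconds, reported "Tez Gain %") as printed on the slide
rows = {
    "Hive on Spark": ((106, 142, 289), (98, 98, 125)),
    "Hive on Mapreduce": ((247, 280, 800), (228, 193, 345)),
    "Spark-SQL": ((86, 117, 166), (-20, -20, -28)),
}


def relative_gain_pct(engine_s, tez_s):
    return (engine_s - tez_s) / tez_s * 100


def ratio_pct(engine_s, tez_s):
    return engine_s / tez_s * 100


def closest_convention(times, reported):
    """Pick the convention whose worst-case error against the slide is smaller."""
    gain_err = max(abs(relative_gain_pct(t, z) - r)
                   for t, z, r in zip(times, TEZ, reported))
    ratio_err = max(abs(ratio_pct(t, z) - r)
                    for t, z, r in zip(times, TEZ, reported))
    return "relative gain" if gain_err <= ratio_err else "ratio to Tez"


conventions = {
    engine: closest_convention(times, reported)
    for engine, (times, reported) in rows.items()
}
for engine, convention in conventions.items():
    print(f"{engine}: reported values match '{convention}'")
```

Under this check, the Spark-SQL row agrees with the relative-gain formula to within a point, while the Hive on Spark and Hive on Mapreduce rows agree with the plain ratio instead. For example, 106 seconds against 108 seconds is about 98% of the Tez time, not a 98% slowdown.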

Spark-SQL is faster than Hive on Tez and Hive on Spark for Map-joins.

Performance comparison : Shuffle join + group by

● select count(*) from store_sales a, store_returns b where a.ss_item_sk = b.sr_item_sk and a.ss_ticket_number = b.sr_ticket_number

● select count(*) rowcount from store_sales a, store_returns b where a.ss_item_sk = b.sr_item_sk and a.ss_ticket_number = b.sr_ticket_number group by ss_item_sk, ss_ticket_number having rowcount > 1

Execution engine | Shuffle join | Shuffle join + group by | Tez Shuffle Gain % | Tez Gain %
Hive on Tez | 400 | 453 | |
Hive on Spark | 1,078 | 1,120 | 170% | 147%
Hive on Mapreduce | 756 | 826 | 89% | 82%
Spark-SQL | 1,835 | 1,884 | 359% | 316%

[Plan diagram: Shuffle Join (9 Billion rows); Group by (675 Million); Filter (675 Million); Select (0 rows). Chart value labels: 1,078; 1,120; 1,835; 1,884.]
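The two shuffle-join timings for each engine also show how much of the combined query is spent in the join itself. The sketch below is a simple derived calculation from the reported elapsed seconds; it assumes the second query's extra time is attributable to the added group by.

```python
# engine -> (shuffle join seconds, shuffle join + group by seconds)
results = {
    "Hive on Tez": (400, 453),
    "Hive on Spark": (1078, 1120),
    "Hive on Mapreduce": (756, 826),
    "Spark-SQL": (1835, 1884),
}

added_seconds = {}
join_share = {}
for engine, (join_s, join_group_s) in results.items():
    added_seconds[engine] = join_group_s - join_s   # cost of the extra group by
    join_share[engine] = join_s / join_group_s      # fraction spent in the join

for engine in results:
    print(f"{engine}: group by adds {added_seconds[engine]} s; "
          f"join is {join_share[engine]:.0%} of the combined query")
```

For every engine the join accounts for most of the combined time, which suggests that the gap between engines in this test comes mainly from the shuffle join rather than from the aggregation that follows it.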

Why are shuffles so slow for Hive on Spark and Spark-SQL?

Performance comparison : Shuffle join cluster CPU utilization

[Chart annotations: Hive on Tez query ends; Hive on Spark query ends here]

Performance comparison : Primitive results summary

Performance comparison : Performance summary

+ Short running query

+ Large joins and aggregates

- Slower than Spark-SQL in Map joins

- High GC

- Instability

- SQL support limited compared to Hive

- Lack of sophisticated query optimizer

+ Efficient resource utilization

+ Map join performance

- Large Joins

+ Outperforms Spark-SQL in large join

- Slower than Tez for large joins and aggregates

- High GC

+ Promising initial release

Engines compared: Hive on Tez, Spark-SQL, Hive on Spark, MapReduce

Solving Hive’s Top Performance Challenges

Apache Hive: Modern Architecture

[Diagram: Hive architecture stack. Columnar Storage: ORCFile, Parquet. Unstructured Data: JSON, CSV, Text, Avro, Custom, Weblog. SQL Engines: Row Engine, Vector Engine. SQL Support: SQL:2011, Optimizer, HCatalog, HiveServer2. Block Cache: Linux Cache. Hadoop 1: MapReduce. Hadoop 2: Tez, Spark. Vector Cache. Persistent Server. Legend: Historical, Current, In Development.]

Apache Hive: Getting to Sub-Second Improvement

LLAP: Persistent servers cache vectors and start queries instantly. Pluggable integrations with Tez or Spark.

Vectorized Hash Join solves CPU boundedness for Hive on Tez or on Spark.

Improved metadata catalog allows instant query planning and optimization for any engine.

Apache Hive’s Sub-Second Future

Metadata (Fast, Scalable Metadata Catalog) + Persistent Server + SQL Engine (Vectorized Hash Join) + Choice of Execution Engines (Tez or Spark) = Sub-Second

Questions?

Interested? Stop by the Hortonworks booth to learn more.

Endnotes

(1) https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/

(2) https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920

(3) http://hortonworks.com/blog/benchmarking-apache-hive-13-enterprise-hadoop/

(4) http://yahoodevelopers.tumblr.com/post/85930551108/yahoo-betting-on-apache-hive-tez-and-yarn

(5) http://www.slideshare.net/AdamKawa/a-perfect-hive-query-for-a-perfect-meeting-hadoop-summit-2014
