overview of stinger interactive query for hive
DESCRIPTION
Presentation given to the OC Big Data Meetup Group. http://www.meetup.com/OCBigDataTRANSCRIPT
Overview of S+nger: Interac+ve Query for Hive
@ddkaiser linkedin.com/in/dkaiser slideshare.net/ddkaiser [email protected] [email protected]
OC Big Data Meetup #1 May 21, 2014
David Kaiser
Who Am I?
David Kaiser
20+ years experience with Linux 3 years experience with Hadoop Career experiences: • Data Warehousing • Geospa+al Analy+cs • Open-‐source Solu+ons and Architecture
Employed at Hortonworks as a Senior Solu+ons Engineer
@ddkaiser linkedin.com/in/dkaiser slideshare.net/ddkaiser [email protected] [email protected]
Overview of Stinger: Interactive Query for Hive
• Abstract: – Hadoop is about so much more than batch processing. With the recent release of Hadoop 2, there have been many new approaches for increased applica+on performance.
– Hive is the most used SQL implementa+on on Hadoop. – Hive provides the most amount of SQL compa+bility on Hadoop.
– But… Hive is Slow. – Hive WAS Slow.
– This talk will discuss the S+nger ini+a+ve, which improved Hive performance over 100x.
S"nger Project (announced February 2013)
Batch AND Interactive SQL-in-Hadoop Stinger Initiative A broad, community-based effort to drive the next generation of HIVE
Hive 0.13, April 2014
• Hive on Apache Tez • Query Service • Buffer Cache • Cost Based Op+mizer (Op+q) • Vectorized Processing
Hive 0.11, May 2013: • Base Op+miza+ons • SQL Analy+c Func+ons • ORCFile, Modern File Format
Hive 0.12, October 2013:
• VARCHAR, DATE Types • ORCFile predicate pushdown • Advanced Op+miza+ons • Performance Boosts via YARN
Speed Improve Hive query performance by 100X to allow for interac+ve query +mes (seconds)
Scale The only SQL processing in Hadoop designed for queries that scale from TB to PB
SQL Support broadest range of SQL seman+cs for analy+c applica+ons running against Hadoop
Goals:
An Open Community at its finest: Apache Hive Contribution
1,672 Jira Tickets Closed
145 Developers
44 Companies
~400,000 Lines Of Code Added…
13 Months
Outcomes from the Stinger Project
Page 5
Feature Descrip"on Benefit
Tez Integra+on Tez is significantly beeer engine than MapReduce Latency
Vectorized Query Take advantage of modern hardware by processing thousand-‐row blocks rather than row-‐at-‐a-‐+me. Throughput
Query Planner
Using extensive sta+s+cs now available in Metastore to beeer plan and op+mize query, including predicate pushdown during compila+on to eliminate por+ons of input (beyond par++on pruning)
Latency
ORC File Columnar, type aware format with indices Latency
Cost Based Op+mizer (Op+q)
Join re-‐ordering and other op+miza+ons based on column sta+s+cs including histograms etc. Latency
Hive as a Service Leaves engine running between sessions Latency
Buffer Cache Leaves most used HDFS file blocks in memory Latency
Hadoop 2: Moving Past MapReduce
Page 6
HADOOP 1.0
HDFS (redundant, reliable storage)
MapReduce (cluster resource management
& data processing)
HDFS2 (redundant, highly-‐available & reliable storage)
YARN (cluster resource management)
MapReduce (data processing)
Others
HADOOP 2.0
Single Use System Batch Apps
Mul/ Purpose Pla5orm Batch, Interac/ve, Online, Streaming, …
Apache Tez as the new Primitive
HDFS2 (redundant, reliable storage)
Tez (execu+on engine)
YARN (cluster resource management)
HADOOP 2.0
MapReduce as Base Apache Tez as Base
HDFS (redundant, reliable storage)
MapReduce (cluster resource management
& data processing)
Pig (data flow)
Hive (sql)
Others (Cascading)
HADOOP 1.0 Data Flow
Pig SQL Hive
Others (Cascading)
Batch MapReduce
Slider (con+nuous execu+on)
Online Data
Processing HBase, Accumulo
Real Time Stream
Processing Storm
Complete Open Source Stack • YARN is the logical extension of Apache Hadoop
– Complements HDFS, the data reservoir
• Resource Management for the Enterprise Data Lake – Shared, secure, mul+-‐tenant Hadoop
Allows for all processing in Open-Source Hadoop
Page 8
HDFS2 (Redundant, Reliable Storage)
YARN (Cluster Resource Management)
BATCH (MapReduce)
INTERACTIVE (Tez)
STREAMING (Storm, S4,…)
GRAPH (Giraph)
IN-‐MEMORY (Spark)
HPC MPI (OpenMPI)
ONLINE (HBase)
OTHER (Search) (Weave…)
Feature Descrip"on Benefit
Tez Session Overcomes Map-‐Reduce job-‐launch latency by pre-‐launching Tez AppMaster Latency
Tez Container Pre-‐Launch
Overcomes Map-‐Reduce latency by pre-‐launching hot containers ready to serve queries. Latency
Tez Container Re-‐Use
Finished maps and reduces pick up more work rather than exi+ng. Reduces latency and eliminates difficult split-‐size tuning. Out of box performance!
Latency
Run+me re-‐configura+on of DAG
Run+me query tuning by picking aggrega+on parallelism using online query sta+s+cs Throughput
Tez In-‐Memory Cache Hot data kept in RAM for fast access. Latency
Complex DAGs Tez Broadcast Edge and Map-‐Reduce-‐Reduce paeern improve query scale and throughput. Throughput
Hive On Tez - Execution
ORC File Advantages Sustained Query Times Apache Hive 0.12 provides sustained acceptable query times even at petabyte scale
131 GB (78% Smaller)
File Size Comparison Across Encoding Methods Dataset: TPC-‐DS Scale 500 Dataset
221 GB (62% Smaller)
Encoded with Text
Encoded with RCFile
Encoded with ORCFile
Encoded with Parquet
505 GB (14% Smaller)
585 GB (Original Size) • Larger Block Sizes
• Columnar format arranges columns adjacent within the file for compression & fast access
Impala
Hive 12
Smaller Footprint Better encoding with ORC in Apache Hive 0.12 reduces resource requirements for your cluster.
ORCFile File Format
Page 11
Query-‐Op"mized: Split-‐able, columnar storage file Efficient Reads: Break into large “stripes” of data for efficient read Fast Filtering: Built in index, min/max, metadata for fast filtering blocks -‐ bloom filters if desired Efficient Compression: Decompose complex row types into primi+ves: massive compression and efficient comparisons for filtering Precomputa"on: Built in aggregates per block (min, max, count, sum, etc.)
A Journey to SQL Compliance Evolu"on of SQL Compliance in Hive SQL Datatypes SQL Seman"cs INT/TINYINT/SMALLINT/BIGINT SELECT, INSERT
FLOAT/DOUBLE GROUP BY, ORDER BY, HAVING
BOOLEAN JOIN on explicit join key
ARRAY, MAP, STRUCT, UNION Inner, outer, cross and semi joins
STRING Sub-‐queries in the FROM clause
BINARY ROLLUP and CUBE
TIMESTAMP UNION
DECIMAL Standard aggrega+ons (sum, avg, etc.)
DATE Custom Java UDFs
VARCHAR Windowing func+ons (OVER, RANK, etc.)
CHAR Advanced UDFs (ngram, XPath, URL)
Interval Types Sub-‐queries for IN/NOT IN, HAVING
JOINs in WHERE Clause
INSERT/UPDATE/DELETE
Legend Hive 10 or earlier
Roadmap
Hive 11
Hive 12
Hive 13
Tez – Execution Performance • Performance gains over Map Reduce
– Eliminate replicated write barrier between successive computa+ons. – Eliminate job launch overhead of workflow jobs. – Eliminate extra stage of map reads in every workflow job. – Eliminate queue and resource conten+on suffered by workflow jobs that are started aper a predecessor job completes.
Page 13
Pig/Hive -‐ MR Pig/Hive -‐ Tez
Hive – MR Hive – Tez
Hive-on-MR vs. Hive-on-Tez SELECT a.state, COUNT(*), AVERAGE(c.price) FROM a JOIN b on (a.id = b.id) JOIN c on (a.itemId = c.itemId) GROUP by a.state
SELECT a.state
JOIN (a, c) SELECT c.price
SELECT b.id
JOIN(a, b) GROUP BY a.state
COUNT(*) AVERAGE(c.price)
M M M
R R
M M
R
M M
R
M M
R
HDFS
HDFS
HDFS
M M M
R R
R
M M
R
R
SELECT a.state, c.itemId
JOIN (a, c)
JOIN(a, b) GROUP BY a.state
COUNT(*) AVERAGE(c.price)
SELECT b.id
Tez avoids unneeded writes to HDFS
Vectorization • Rewrite all operations to operate on blocks of 1K+ records, rather than one record at a time
• Block is array of Java scalars, not Objects (eliminate Objects – compounding GC gains over time)
• Avoids many function calls, CPU pipeline stalls
• Size to fit in L1 cache, avoid cache misses
Page 15
Stinger Phase 3: Unlocking Interactive Query
S"nger Phase 3: Features and Benefits
Container Pre-‐Launch Overcomes Java VM startup latency by pre-‐launching hot containers ready to serve queries
Container Re-‐Use Finished Maps and Reduces pick up more work rather than exi+ng. Reduces latency and eliminates difficult split size tuning
Tez Integra+on Tez Broadcast Edge and Map-‐Reduce-‐Reduce paeern improve query scale and throughput
In-‐Memory Cache Hot data kept in RAM for fast access
Quantifying Stinger
Page 17
Hive 10 Hive 0.13 (Phase 3) Hive 0.11 (Phase 1)
190x Improvement
1400s
39s
7.2s
TPC-‐DS Query 27
3200s
65s
14.9s
TPC-‐DS Query 82
200x Improvement
Query 27: Pricing Analy"cs using Star Schema Join Query 82: Inventory Analy"cs Joining 2 Large Fact Tables
All Results at Scale Factor 200 (Approximately 200GB Data)
41.1s
4.2s
39.8s
4.1s TPC-‐DS Query 52 TPC-‐DS Query 55
Query Time in Seconds
Speed: Delivering Interactive Query
Test Cluster: • 200 GB Data (ORCFile) • 20 Nodes, 24GB RAM each, 6x disk each
Hive 0.12
Hive 0.13 (Phase 3)
Query 52: Star Schema Join with group-‐by, order-‐by on different keys Query 55: Star Schema Join with group-‐by, order-‐by on different keys
22s
9.8s
31s
6.7s TPC-‐DS Query 28 TPC-‐DS Query 12
Query Time in Seconds
Speed: Delivering Interactive Query
Test Cluster: • 200 GB Data (ORCFile) • 20 Nodes, 24GB RAM each, 6x disk each
Hive 0.12
Hive 0.13 (Phase 3)
Query 28: Four sub-‐query join (Vectoriza"on) Query 12: Star Join over range of dates (M-‐R-‐R palern)
Hortonworks Confidential © 2014
Speed@Scale: Large Scale Implementation
Page 20
http://blogs.cisco.com/datacenter/hdp
Cisco Engineering Blog Post Independent assessment by Cisco UCS Team Benchmark @ 30TB
Hortonworks Confidential © 2014
Speed@Scale: Large Scale Implementation
Page 21
https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/
Facebook Engineering Blog Post Hortonworks engineering team worked on ORCFile Facebook provided improvements to ORCFile, working with Hortonworks Hive is used for efficient analytics on the largest Hadoop Data Warehouse site Ultimate Scale Data Analysis
Your Fastest On-ramp to Enterprise Hadoop™!
Page 22
hep://hortonworks.com/products/hortonworks-‐sandbox/
The Sandbox lets you experience Apache Hadoop from the convenience of your own laptop – no data center, no cloud and no internet connec+on needed! The Hortonworks Sandbox is: • A free download: hep://hortonworks.com/products/hortonworks-‐sandbox/ • A complete, self contained virtual machine with Apache Hadoop pre-‐configured • A personal, portable and standalone Hadoop environment • A set of hands-‐on, step-‐by-‐step tutorials that allow you to learn and explore Hadoop
Ques+ons?
@ddkaiser linkedin.com/in/dkaiser slideshare.net/ddkaiser [email protected] [email protected]
David Kaiser