transactional sql in apache hive
Post on 21-Jan-2018
987 Views
Preview:
TRANSCRIPT
Transactional Operations in HiveEugene Koifman
June 2017
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
Motivations/Goals
End user point of view
Design
Performance Improvements/Results
Roadmap
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Motivations
Modifying existing data– INSERT OVERWRITE TABLE Target SELECT * FROM Target WHERE …
• Delete – OK, Update - ?• Concurrency
– Hope for the best (multiple updates)
– ZooKeeper lock manager S/X locks – restrictive
• Expensive to do repeatedly (write side)
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Motivations
Continuously adding new data to Hive in the past– ALTER TABLE Target ADD PARTITION (dt=‘2016-06-30’)
• Lots of files – bad for performance
• Fewer files –users wait longer to see latest data– INSERT INTO Target as SELECT FROM Staging
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Merge Statement – SQL Standard 2011 (Hive 2.2)
ID State County Value
1 CA LA 19.0
2 MA Norfolk 15.0
7 MA Suffolk 50.15
16 CA Orange 9.1
ID State Value
1 20.0
7 80.0
100 NH 6.0
MERGE INTO TARGET T
USING SOURCE S ON T.ID=S.ID
WHEN MATCHED THEN
UPDATE SET T.Value=S.Value
WHEN NOT MATCHED
INSERT (ID,State,Value)
VALUES(S.ID, S.State, S.Value)
ID State County Value
1 CA LA 20.0
2 MA Norfolk 15.0
7 MA Suffolk 80.0
16 CA Orange 9.1
100 NH null 6.0
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Goals
Make above use cases easy and efficient
Key Requirement– Long running analytics queries should run concurrently with update commands
NOT OLTP!!!– Support slowly changing tables
– Not for 100s of concurrent queries trying to update the same partition
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
System at High Level
A new type of table that supports Insert/Update/Delete/Merge SQL operations
Concept of ACID transaction– Atomic, Consistent, Isolated, Durable
Streaming Ingest API– Write a continuous stream of events to Hive in micro batches with transactional semantics
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
User Point of View
CREATE TABLE T(a int, b int) CLUSTERED BY (b) INTO 8 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');
Not all tables support transactional semantics
Table must be bucketed
Table cannot be sorted
Currently requires ORC File but anything implementing format – AcidInputFormat/AcidOutputFormat
autoCommit=true
Transactions run at Snapshot Isolation– Lock in the state of the DB as of the start of the query for the duration of the query
– Between Serializable and Repeatable Read
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design
Transaction Manager– Begin transaction and obtain a transaction ID
Storage layer enhanced to support MVCC architecture– Each row is tagged with unique ROW_ID (internal)
– Multiple versions of each row to allow concurrent readers and writers
– Result of each write is stored in a new Delta file
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
How ACID is implemented in Hive?
CREATE TABLE acidtbl (a INT, b STRING) CLUSTERED BY (a) INTO 1 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');
ACID Metadata Columns original_transaction_idbucket_idrow_idcurrent_transaction_id
User Columns col_1: a : INT
col_2:b : STRING
ACID_PK
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
How ACID is implemented in Hive?
INSERT INTO acidtbl (a,b) VALUES (100, “foo”), (200, “xyz”), (300, “bee”);
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
delta_00001_00001/bucket_0000
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
How ACID is implemented in Hive?
UPDATE acidTbl SET b = “bar” where a = 300;
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
ACID_PK a b
{ 1, 0, 2 } 300 “bar”
delta_00001_00001/bucket_0000
delta_00002_00002/bucket_0000
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
How ACID is implemented in Hive?
DELETE FROM acidTbl where a = 200;
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
ACID_PK a b
{ 1, 0, 2 } 300 “bar”
ACID_PK a b
{ 1, 0, 1 } null null
delta_00001_00001/bucket_0000
delta_00002_00002/bucket_0000 delta_00003_00003/bucket_0000
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
How ACID is implemented in Hive?
SELECT * FROM acidtbl;
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
ACID_PK a b
{ 1, 0, 2 } 300 “bar”
ACID_PK a b
{ 1, 0, 1 } null null
delta_00001_00001/bucket_0000 delta_00002_00002/bucket_0000 delta_00003_00003/bucket_0000
{ 1, 0, 0 } 100 “foo” 100 “foo”{ 1, 0, 1 } 200 “xyz”{ 1, 0, 1 } null null{ 1, 0, 2 } 300 “bee”{ 1, 0, 2 } 300 “bar”
300 “bar”
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design - Compactor
More operations = more delta files – make reads more expensive
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design - Compactor
ALTER TABLE acidTbl COMPACT ‘MAJOR’;
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
ACID_PK a b
{ 1, 0, 2 } 300 “bar”
ACID_PK a b
{ 1, 0, 1 } null nulldelta_00001_00001/bucket_0000
delta_00002_00002/bucket_0000
delta_00003_00003/bucket_0000
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 2 } 200 “bar”
base_00003/bucket_0000
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design - Compactor
Compactor rewrites the table in the background– Minor compaction - merges delta files into fewer deltas
– Major compactor merges deltas with base - more expensive
– This amortizes the cost of updates and self tunes the tables
• Makes ORC more efficient - larger stripes, better compression
Compaction can be triggered automatically or on demand– There are various configuration options to control when the process kicks in.
– Compaction itself is a Map-Reduce job
Key design principle is that compactor does not affect readers/writers
Cleaner process – removes obsolete files
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design - Concurrency
Transaction Manager– manages transaction ID assignment
– keeps track of transaction state: open, committed, aborted
Lock Manager– DDL operations acquire eXclusive locks
– Read operations acquire Shared locks
– Also locks non transactional tables – different logic
• hive.txn.strict.locking.mode
State of both persisted in Hive Metastore
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design - Concurrency
Write Set tracking to prevent Write-Write conflicts in concurrent transactions
Note that 2 Inserts are never in conflict since Hive does not enforce unique constraints.
You are allowed to read acid and non-acid tables in same query.
You cannot write to acid and non-acid tables at the same time (multi-insert statement)
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Design - Streaming Ingest
Allows you to continuously write events to a hive table– Can commit periodically to make writes durable/visible
– Can also call abort to make writes since last commit/abort invisible.
– Optimized so that it can handle writing micro batches of events - every second.
• Multiple transactions are written to one file
– Only supports adding new data
Streaming tools like NiFi, Storm and Flume rely on this API to ingest data into hive
This API is public so it can be used directly
Data written via Streaming API has the same transactional semantics as SQL side
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Merge Statement – SQL Standard 2011 (Hive 2.2)
ID State County Value
1 CA LA 19.0
2 MA Norfolk 15.0
7 MA Suffolk 50.15
16 CA Orange 9.1
ID State Value
1 20.0
7 80.0
100 NH 6.0
MERGE INTO TARGET T
USING SOURCE S ON T.ID=S.ID
WHEN MATCHED THEN
UPDATE SET T.Value=S.Value
WHEN NOT MATCHED
INSERT (ID,State,Value)
VALUES(S.ID, S.State, S.Value)
ID State County Value
1 CA LA 20.0
2 MA Norfolk 15.0
7 MA Suffolk 80.0
16 CA Orange 9.1
100 NH null 6.0
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
SQL Merge
Target
Source
ACID_PK ID State
County
Value
{ 1, 0, 1 } 1 CA LA 20.0
{ 1, 0, 3 } 7 MA Suffolk 80.0
ACID_PK ID State County
Value
{ 2, 0, 1 } 100 NH 6.0
delta_00002_00002/bucket_0000
delta_00002_00002_001/bucket_0000
Right Outer Join
ON T.ID=S.ID
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
W/o MERGE – much less efficient
UPDATE Target set Value= 20.0 where ID = 1;
UPDATE Target set Value = 80.0 where ID = 7;
INSERT INTO Target (ID, State, Value) VALUES(100, ‘NH’, 6.0);
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Performance
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Work-In-Progress
Split an update into combination of delete and insert
UPDATE acidTbl SET b = “bar” where a = 300;
ACID_PK a b
{ 1, 0, 0 } 100 “foo”
{ 1, 0, 1 } 200 “xyz”
{ 1, 0, 2 } 300 “bee”
ACID_PK a b
{ 1, 0, 2 } 300 “bar”
delta_00001_00001/bucket_0000
delta_00002_00002/bucket_0000
ACID_PK a b
{ 2, 0, 0 } 300 “bar”
ACID_PK a b
{ 1, 0, 2 } null null
delta_00002_00002/bucket_0000 delete_delta_00002_00002/bucket_0000
Enabled PPD
Splits for Delta files
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Benefits
Improved PPD
Better Network Utilization
Better Memory Utilization
Full Vectorization of Reads
Updating bucket/partition columns
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Performance
TPC-H Benchmark– 10 node cluster at Scale Factor 1000 (1 TB of data)
– 11 delta files with 90 GB data each
29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Future Work
Multi statement transactions, i.e. BEGIN TRANSACTION/COMMIT/ROLLBACK
Performance– Smarter Compaction
Finer grained concurrency management/conflict detection
Read Committed w/Lock Based scheduling
Better Monitoring/Alerting
LOAD DATA … support
Optional bucketing
SMB support – user defined sort order
30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Further Reading
31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Etc
Documentation– https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
– https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
Follow/Contribute– https://issues.apache.org/jira/browse/HIVE-
14004?jql=project%20%3D%20HIVE%20AND%20component%20%3D%20Transactions
user@hive.apache.org
dev@hive.apache.org
32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Thank You
top related