Apache Phoenix: OLTP in Hadoop, by James Taylor - Salesforce.com

Upload: cask-data-inc

Post on 21-Jan-2017

TRANSCRIPT

Page 1: Apache Phoenix: OLTP in Hadoop, by James Taylor - Salesforce.com

V5

OLTP for Hadoop

@ApachePhoenix

http://phoenix.apache.org/

James Taylor (@JamesPlusPlus)

Page 2

About James

• Architect at Salesforce.com
  – Part of the Big Data group
  – Lead of the Apache Phoenix project
  – PMC member of Apache Calcite

Page 3

What is Apache Phoenix?

• A relational database layer for Apache HBase

Page 4

What is Apache HBase?

• High performance, horizontally scalable byte store
• Suitable as store of record for mission critical data

Page 5

What is Apache Phoenix?

Page 6

What is Apache Phoenix?

• A relational database layer for Apache HBase
  – OLTP/Query engine
    • Transforms SQL into native HBase API calls
    • Pushes work to cluster for parallel execution
    • Supports ACID transactions
  – Metadata repository
    • Typed access to data stored in HBase tables
    • Supports multi-tenancy modeled as SQL views
  – JDBC driver
• A top level Apache Software Foundation project
  – Originally developed at Salesforce
  – A growing community with momentum

Page 7

Where Does Phoenix Fit In?

[Diagram: the Hadoop stack, with these labeled components]

• Sqoop: RDB Data Collector
• Flume: Log Data Collector
• ZooKeeper: Coordination
• YARN: Cluster Resource Manager / MapReduce
• HDFS 2.0: Hadoop Distributed File System
• The Java Virtual Machine
• Hadoop Common JNI
• HBase: Distributed Database
• Phoenix: Query execution engine
• Phoenix: JDBC client
• Spark: Iterative In-Memory Computation
• GraphX: Graph analysis framework
• MLLib: Data mining
• Pig: Data Manipulation
• Hive: Structured Query

Page 8

Why is Phoenix fast?

• Pushes down computation to region servers
  – Start/stop key range(s)
  – Time range min/max
  – Predicates
  – Aggregation
  – Sort
  – Limit
  – TopN
• Parallelizes query from client
  – Intra-region through statistics collection
• Supports secondary indexes
  – Global & co-located

Page 9

Why is Phoenix fast?

[Diagram: a client runs parallel scans against a partitioned table whose regions R1, R2, …, Rn are hosted on separate servers. Intra-region guideposts divide a scan range into smaller scan ranges. Each server applies:]

• Filter
• Aggregate
• Sort
• Limit
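The guidepost idea can be sketched in a few lines of Python. This is only an in-memory illustration, not Phoenix code: `split_by_guideposts`, `scan_chunk`, and `parallel_aggregate` are hypothetical names, a sorted list of `(key, value)` pairs stands in for a region, and a thread pool stands in for the region servers. Each chunk is filtered and partially aggregated where the data lives, and the client only merges the small partial results.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def split_by_guideposts(start, stop, guideposts):
    """Split the key range [start, stop) at every guidepost strictly inside it."""
    inner = [g for g in sorted(guideposts) if start < g < stop]
    bounds = [start] + inner + [stop]
    return list(zip(bounds[:-1], bounds[1:]))


def scan_chunk(rows, lo, hi, predicate):
    """Stand-in for server-side work: filter one key range and return (count, sum)."""
    keys = [k for k, _ in rows]
    i, j = bisect_left(keys, lo), bisect_left(keys, hi)
    matched = [v for _, v in rows[i:j] if predicate(v)]
    return len(matched), sum(matched)


def parallel_aggregate(rows, start, stop, guideposts, predicate):
    """Stand-in for the client: run one scan per chunk in parallel, then merge."""
    chunks = split_by_guideposts(start, stop, guideposts)
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(
            pool.map(lambda c: scan_chunk(rows, c[0], c[1], predicate), chunks)
        )
    return sum(c for c, _ in partials), sum(s for _, s in partials)


# Toy table: keys 0..99, value = key * 10 (made-up data for illustration).
rows = [(k, k * 10) for k in range(100)]
chunks = split_by_guideposts(0, 100, [25, 50, 75])
count, total = parallel_aggregate(rows, 0, 100, [25, 50, 75], lambda v: v >= 500)
```

The point of the design is that more guideposts mean more, smaller chunks, so a single large region can still be scanned by several threads at once.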

Page 10

Skip Scans

• Push key ranges through filter
• Use SEEK_NEXT_USING_HINT to skip data

[Diagram: region R1 with key ranges marked Include or Skip]
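A rough sketch of the skip-scan idea, assuming the row keys are held in a sorted Python list and the filter carries a sorted list of non-overlapping half-open key ranges. The function name `skip_scan` and the data are hypothetical. A binary search plays the role of the seek hint: rather than testing every row, the scan jumps straight to the start of the next wanted range.

```python
from bisect import bisect_left


def skip_scan(sorted_keys, ranges):
    """Return (matching keys, number of seeks).

    ranges must be sorted, non-overlapping, half-open [lo, hi) intervals.
    """
    matched = []
    seeks = 0
    pos = 0
    for lo, hi in ranges:
        # Seek: jump to the first key >= lo, skipping everything in between.
        pos = bisect_left(sorted_keys, lo, pos)
        seeks += 1
        # Include: read forward until the key leaves the current range.
        while pos < len(sorted_keys) and sorted_keys[pos] < hi:
            matched.append(sorted_keys[pos])
            pos += 1
    return matched, seeks


keys = list(range(1000))
matched, seeks = skip_scan(keys, [(10, 13), (500, 502), (990, 991)])
```

Here six keys are returned after three seeks, while a full scan with a per-row predicate would have touched all 1000 keys.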

Page 11

Filters

• Pushes down WHERE clause to server
• Filters entire HFile based on time-range

[Diagram: region R1 holds four HFiles covering t1 - 10, t11 - 20, t21 - 30, and t31 - 40; the scan reads only the HFile covering t31 - 40]
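The time-range pruning in the diagram can be illustrated as an interval-overlap test. The sketch below is not HBase code: the `HFile` dataclass and `files_to_scan` are hypothetical, and both bounds are treated as inclusive for simplicity. A file is opened only if its timestamp range overlaps the range the query asks for.

```python
from dataclasses import dataclass


@dataclass
class HFile:
    name: str
    min_ts: int  # smallest cell timestamp stored in the file
    max_ts: int  # largest cell timestamp stored in the file


def files_to_scan(hfiles, query_min, query_max):
    """Keep only files whose [min_ts, max_ts] overlaps [query_min, query_max]."""
    return [f for f in hfiles if f.max_ts >= query_min and f.min_ts <= query_max]


# The four files from the slide, with a query restricted to t35..t40.
hfiles = [
    HFile("t1-10", 1, 10),
    HFile("t11-20", 11, 20),
    HFile("t21-30", 21, 30),
    HFile("t31-40", 31, 40),
]
selected = [f.name for f in files_to_scan(hfiles, 35, 40)]
```

Three of the four files are ruled out from their metadata alone, without reading any of their rows.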

Page 12

Transactions

• Snapshot isolation model
  – Using Tephra (http://tephra.io/)
  – Supports SERIALIZABLE isolation level
  – Allows reading your own uncommitted data
• Optional
  – Enabled on a table by table basis
  – No performance penalty when not used
• Available in 4.7.0 release (being voted on now)

Page 13

Optimistic Concurrency Control

• Avoids cost of locking rows and tables
• No deadlocks or lock escalations
• Cost of conflict detection and possible rollback is higher
• Good if conflicts are rare: short transactions, disjoint partitioning of work
• Conflict detection not always necessary: write-once/append-only data
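As a minimal sketch of optimistic conflict detection, assume each transaction records the set of keys it wrote and a manager compares that set against transactions that committed after it started. The `TxManager` class below is hypothetical and much simpler than Tephra: one counter orders both starts and commits, and nothing is ever pruned from the commit log.

```python
class TxManager:
    """Toy optimistic concurrency manager: no locks, conflicts checked at commit."""

    def __init__(self):
        self.next_id = 1
        self.committed = []  # list of (commit_id, frozenset of written keys)

    def start(self):
        """Hand out a start id; no rows or tables are locked."""
        tx_id = self.next_id
        self.next_id += 1
        return tx_id

    def try_commit(self, start_id, change_set):
        """Commit unless a transaction that committed after start_id wrote the same key."""
        change_set = frozenset(change_set)
        for commit_id, keys in self.committed:
            if commit_id > start_id and keys & change_set:
                return False  # conflict: the caller must roll back and retry
        commit_id = self.next_id
        self.next_id += 1
        self.committed.append((commit_id, change_set))
        return True


mgr = TxManager()
t1, t2, t3 = mgr.start(), mgr.start(), mgr.start()
first = mgr.try_commit(t1, {"acct:1"})   # nothing committed yet
second = mgr.try_commit(t2, {"acct:1"})  # overlaps with the commit above
third = mgr.try_commit(t3, {"acct:2"})   # disjoint keys
```

The second commit is rejected because it touches a key written by a transaction that committed after it started; the third touches a disjoint key and goes through, which is why disjoint partitioning of work keeps rollbacks rare.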

Page 14

Tephra Architecture

[Diagram with these labeled components]

• Clients: Client 1, Client 2, …, Client N
• Tx Manager (active) and Tx Manager (standby)
• ZooKeeper
• HBase: Master 1, Master 2, RS 1, RS 2, RS 3, RS 4

Page 15

Transaction Lifecycle

[Diagram: transaction state machine spanning the Client and the Tx Manager]

• States: none, in progress, complete, invalid
• Labeled steps: start tx, do work, write to HBase, try commit, check conflicts, succeeded, failed, roll back in HBase, try abort, abort, invalidate, time out

Page 16

Tephra Architecture

• TransactionAware client
  – Coordinates transaction lifecycle with manager
  – Communicates directly with HBase for reads and writes
• Transaction Manager
  – Assigns transaction IDs
  – Maintains state on in-progress, committed and invalid transactions
• Transaction Processor coprocessor
  – Applies server-side filtering for reads
  – Cleans up data from failed transactions and versions that are no longer visible
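The read-side filtering can be sketched as a visibility check over cell versions, assuming each version is stamped with the ID of the transaction that wrote it. The `Snapshot` class, its field names, and `read_latest` are hypothetical; the rule shown (hide versions from in-progress, invalid, or later transactions, but show the reader's own writes) is an approximation of snapshot isolation, not Tephra's exact implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    write_pointer: int          # ID of the reading transaction itself
    read_pointer: int           # newest transaction ID this reader may see
    in_progress: frozenset = frozenset()
    invalid: frozenset = frozenset()

    def is_visible(self, version):
        """Decide whether a cell version written by transaction `version` is readable."""
        if version == self.write_pointer:
            return True  # a transaction can read its own uncommitted data
        if version > self.read_pointer:
            return False  # written after this snapshot was taken
        return version not in self.in_progress and version not in self.invalid


def read_latest(cell_versions, snap):
    """Return the newest visible value from {version: value}, or None."""
    for version in sorted(cell_versions, reverse=True):
        if snap.is_visible(version):
            return cell_versions[version]
    return None


cell = {10: "a", 12: "b", 15: "c", 20: "d", 21: "e"}
writer = Snapshot(21, 18, in_progress=frozenset({15}), invalid=frozenset({12}))
other = Snapshot(22, 18, in_progress=frozenset({15}), invalid=frozenset({12}))
own_view = read_latest(cell, writer)
other_view = read_latest(cell, other)
```

Transaction 21 sees its own write, while a different reader with the same snapshot falls back to version 10, because 20 and 21 are too new, 15 is still in progress, and 12 was invalidated.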

Page 17

Demo

• Debit/credit example
• Multiple clients updating account balance through stream of debits/credits
• Without transactions, balance will not be correct
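The demo itself is not part of the transcript, but the failure it shows can be reproduced with a deterministic toy interleaving. In this sketch (hypothetical function names, made-up amounts) every client reads the balance before any client writes, which is the classic lost update; the second function adds a version check so that a conflicting client re-reads and retries.

```python
def run_without_transactions(balance, deltas):
    """All clients read the same starting balance, then each writes its own result."""
    store = {"balance": balance}
    reads = [store["balance"] for _ in deltas]  # every client reads first
    for seen, delta in zip(reads, deltas):      # then each overwrites the balance
        store["balance"] = seen + delta
    return store["balance"]


def run_with_conflict_retry(balance, deltas):
    """Same interleaving, but a write is rejected if the row changed since it was read."""
    store = {"balance": balance, "version": 0}
    pending = list(deltas)
    while pending:
        seen_balance, seen_version = store["balance"], store["version"]
        retry = []
        for delta in pending:
            if store["version"] == seen_version:
                store["balance"] = seen_balance + delta
                store["version"] += 1
            else:
                retry.append(delta)  # conflict detected: re-read and try again
        pending = retry
    return store["balance"]


deltas = [+50, -30, +20]
expected = 100 + sum(deltas)
lost_update = run_without_transactions(100, deltas)
with_retry = run_with_conflict_retry(100, deltas)
```

Without any conflict check only the last write survives, so the balance ends at 120 instead of 140; with the check and retry, all three updates are applied.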

Page 18

What’s Next?

You are here

Page 19

Introducing Apache Calcite

• Query parser, compiler, and planner framework
  – SQL-92 compliant
• Pluggable cost-based optimizer framework
  – Sane way to model push down through rules
• Interop with other Calcite adapters
  – Already used by Drill, Hive, Kylin, Samza
  – Supports any JDBC source (RDBMS, remember them?)
  – One cost-model to rule them all

Page 20

How does Phoenix plug in?

[Diagram: query pipeline with these labeled stages]

• JDBC Client
• Calcite Parser & Validator: SQL + Phoenix specific grammar
• Calcite Query Optimizer: built-in rules + Phoenix specific rules
• Phoenix Query Plan Generator
• Phoenix Runtime

Page 21

What’s Next?

[Diagram: a query is served by several Drillbits, each embedding Calcite, over these data sources]

• Phoenix/HBase
• HDFS
• RDBMS
• Samza/Streams

Page 22

Thank you!

Questions?

* who uses Phoenix