Apache Phoenix + Apache HBase

Posted on 24-Jan-2018


1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache Phoenix + Apache HBase: An Enterprise Grade Data Warehouse

Ankit Singhal, Rajeshbabu, Josh Elser

June 30, 2016


About us!!

Ankit Singhal
– Committer and member of Apache Phoenix PMC
– MTS at Hortonworks

RajeshBabu
– Committer and member of Apache Phoenix PMC
– Committer in Apache HBase
– MTS at Hortonworks

Josh Elser
– Committer in Apache Phoenix
– Committer and member of Apache Calcite PMC
– MTS at Hortonworks


Agenda

Phoenix & HBase as an Enterprise Data Warehouse

Use Cases

Optimizations

Phoenix Query Server

Q&A


Data Warehouse

An enterprise data warehouse (EDW) helps organize and aggregate analytical data from various functional domains and serves as a critical repository for an organization's operations.

[Figure: data from Files, IoT data, and OLTP systems flows through ETL into a staging area, then into the Data Warehouse and data marts, which feed Visualization or BI]


Phoenix Offerings and Interoperability

[Figure: tools that interoperate with Phoenix, grouped into ETL, Data Warehouse, and Visualization & BI]


HBase & Phoenix

– HBase, a distributed NoSQL store
– Phoenix, which provides OLTP and analytics over HBase

[Figure: the Application uses the Phoenix client on top of the HBase client, which talks to ZooKeeper and to three RegionServers; each RegionServer runs the Phoenix coprocessor ("Phx coproc") and hosts table regions (Table,,123 / Table,a,123 / Table,b,123 / Table,c,123), with HDFS as the storage layer]


Open Source Data Warehouse

[Chart: hardware cost (specialized H/W vs. commodity H/W) against software licensing cost (down to no cost), positioning SMP, MPP, Open Source MPP, and HBase + Phoenix]


Phoenix & HBase as a Data Warehouse

Architecture

– Run on commodity H/W
– True MPP
– O/S and H/W flexibility
– Support OLTP and ROLAP


Phoenix & HBase as a Data Warehouse

Scalability

– Linear scalability for storage
– Linear scalability for memory
– Open to third-party storage


Phoenix & HBase as a Data Warehouse

Reliability

– Highly available
– Replication for disaster recovery
– Fully ACID for data integrity


Phoenix & HBase as a Data Warehouse

Manageability

– Performance tuning
– Data modeling & schema evolution
– Data pruning
– Online expansion or upgrade
– Data backup and recovery


Agenda

Phoenix & HBase as an Enterprise Data Warehouse

Use Cases


Who uses Phoenix!!


Analytics Use Case (Web Advertising Company)

Functional requirements
– Create a single source of truth
– Cross-dimensional queries on 50+ dimensions and 80+ metrics
– Support fast Top-N queries

Non-functional requirements
– Less than 3 seconds response time for slice and dice
– 250+ concurrent users
– 100k+ analytics queries/day
– Highly available
– Linear scalability


Data Warehouse Capacity

Data size (ETL input)
– 24 TB/day of raw data system-wide
– 25 billion impressions

HBase input (cube)
– 6 billion rows of aggregated data (100 GB/day)

HBase cluster size
– 65 HBase nodes
– 520 TB of disk
– 4.1 TB of memory


Use Case Architecture

[Figure: Data ingestion: AdServer and Click Tracking events enter Apache Kafka (Kafka input). Real-time path: ETL (filter, aggregate) into an in-memory store. Batch processing: Kafka CAMUS to HDFS, ETL, HDFS, then a Data Uploader into HBase views. Analytics: a Data API serves the Analytics UI.]


Analytics Data Warehouse Architecture

[Figure: cube generation: ETL output in HDFS is bulk loaded into HBase, where the cubes are stored; the Data API converts slice-and-dice queries from the Analytics UI into SQL queries; backup and recovery covers the stored data]
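As a rough illustration of the cube generation step, here is a minimal Python sketch that pre-aggregates raw events into cube rows keyed by a chosen set of dimensions. The event fields (`day`, `site`, `impressions`, `clicks`) and the `build_cube` helper are hypothetical; the slides do not show the actual ETL code. Each resulting row would then presumably be loaded into a Phoenix table whose primary key is made of the dimension columns.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def build_cube(
    events: Iterable[Mapping[str, object]],
    dimensions: Sequence[str],
    metrics: Sequence[str],
) -> Dict[Tuple, Tuple]:
    """Sum each metric over all events that share the same dimension values."""
    cube = defaultdict(lambda: [0] * len(metrics))
    for event in events:
        key = tuple(event[d] for d in dimensions)  # one cube row per dimension combination
        totals = cube[key]
        for i, metric in enumerate(metrics):
            totals[i] += event[metric]
    return {key: tuple(totals) for key, totals in cube.items()}


# Hypothetical raw impression/click events coming out of the ETL stage.
raw_events = [
    {"day": "2016-06-30", "site": "s1", "impressions": 1, "clicks": 0},
    {"day": "2016-06-30", "site": "s1", "impressions": 1, "clicks": 1},
    {"day": "2016-06-30", "site": "s2", "impressions": 1, "clicks": 0},
]
cube_rows = build_cube(raw_events, ("day", "site"), ("impressions", "clicks"))
```

Aggregating before loading is what shrinks 24 TB/day of raw input to a much smaller cube that HBase can serve with slice-and-dice queries.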


Time Series Use Case (Apache Ambari)

Ambari Metrics System (AMS)

Functional requirements
– Store all cluster metrics collected every second (10k to 100k metrics/second)
– Optimize storage/access for time series data

Non-functional requirements
– Near-real-time response time
– Scalable
– Real-time ingestion


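AMS architecture

[Figure: Metric Monitors on the hosts and Hadoop Sinks send metrics to the Metric Collector, which stores them in HBase through Phoenix; Ambari Server reads metrics from the Metric Collector]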


Agenda

Phoenix & HBase as an Enterprise Data Warehouse

Use Cases

Optimizations


Schema Design

Primary key design
– The most important factor driving overall query performance on the table
– Compose the primary key from the columns most often used as predicates in queries
– In most cases, the leading part of the primary key should let queries be converted into point lookups or range scans in HBase
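To see why the leading primary key columns matter, here is a minimal Python sketch that models HBase's sorted row keys as a sorted list of composite keys. The table layout `(HOSTNAME, METRIC_NAME, TS)` and both helper functions are hypothetical. A predicate on the leading column becomes a binary-searched range (a range scan), while a predicate on a non-leading column has to examine every row (a full scan).

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Row = Tuple[str, str, int]  # hypothetical PK: (HOSTNAME, METRIC_NAME, TS)

# HBase keeps rows sorted by row key, i.e. by the composite primary key.
ROWS: List[Row] = sorted([
    ("host1", "cpu", 1), ("host1", "cpu", 2), ("host1", "mem", 1),
    ("host2", "cpu", 1), ("host2", "disk", 3), ("host3", "cpu", 2),
])


def scan_by_leading_column(rows: List[Row], hostname: str) -> Tuple[List[Row], int]:
    """Predicate on the leading PK column: binary-search start/stop keys (range scan)."""
    leading = [row[0] for row in rows]
    start, stop = bisect_left(leading, hostname), bisect_right(leading, hostname)
    return rows[start:stop], stop - start  # (matches, rows examined)


def scan_by_non_leading_column(rows: List[Row], metric: str) -> Tuple[List[Row], int]:
    """Predicate only on a non-leading PK column: every row is examined (full scan)."""
    return [row for row in rows if row[1] == metric], len(rows)
```

On a real table the difference is between reading a few blocks and reading every region, which is why the most-used predicate columns should lead the key.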


Schema Design

Salting vs pre-split

Use salting to alleviate write hot-spotting

CREATE TABLE …(
) SALT_BUCKETS = N

– The number of buckets should be equal to the number of RegionServers

Alternatively, pre-split the table if you know the row key data set in advance

CREATE TABLE …(
) SPLIT ON (…)
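The effect of salting can be sketched in a few lines of Python. Phoenix prepends a one-byte salt, derived from a hash of the row key modulo SALT_BUCKETS, so consecutive keys (for example timestamps) land in different buckets instead of all hitting one region. This is a minimal sketch under stated assumptions: `zlib.crc32` is a stand-in for Phoenix's own hash, so real bucket numbers differ, and `pre_split_points` is a hypothetical helper showing how split keys could be picked for a known key set.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def salted_row_key(row_key: bytes, salt_buckets: int) -> bytes:
    """Prepend a one-byte salt: hash of the key modulo the bucket count.

    crc32 is only a stand-in for Phoenix's hash; the principle is the same.
    """
    salt = zlib.crc32(row_key) % salt_buckets
    return bytes([salt]) + row_key


def rows_per_bucket(row_keys: Iterable[bytes], salt_buckets: int) -> Dict[int, int]:
    """Count how many keys land in each salt bucket (i.e. each leading-byte range)."""
    return dict(Counter(salted_row_key(k, salt_buckets)[0] for k in row_keys))


def pre_split_points(sorted_keys: Sequence, regions: int) -> List:
    """For a known, sorted key set, pick regions-1 split keys at evenly spaced positions."""
    n = len(sorted_keys)
    return [sorted_keys[i * n // regions] for i in range(1, regions)]


# Monotonically increasing keys would all go to the last region without salting.
keys = [f"2016-06-30T00:00:{s:02d}".encode() for s in range(60)]
bucket_counts = rows_per_bucket(keys, salt_buckets=4)
```

The trade-off is on the read side: a range scan over a salted table has to be run once per bucket, since matching rows can sit behind any salt byte.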


Schema Design

Table properties

Use block encoding and/or compression for better performance

CREATE TABLE …(
) DATA_BLOCK_ENCODING='FAST_DIFF', COMPRESSION='SNAPPY'

Use region replication for high availability of reads

CREATE TABLE …(
) REGION_REPLICATION = 2
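Data block encodings save space because sorted keys in a block share long prefixes with the key before them. The Python sketch below shows only the simplest form of that idea, prefix encoding (closer to HBase's PREFIX encoding than to FAST_DIFF, which additionally encodes differences in other key components such as the timestamp). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def prefix_encode(sorted_keys: Sequence[bytes]) -> List[Tuple[int, bytes]]:
    """Store each key as (bytes shared with the previous key, remaining suffix)."""
    encoded, previous = [], b""
    for key in sorted_keys:
        shared, limit = 0, min(len(previous), len(key))
        while shared < limit and previous[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def prefix_decode(encoded: Sequence[Tuple[int, bytes]]) -> List[bytes]:
    """Rebuild the keys by re-attaching each shared prefix from the previous key."""
    keys, previous = [], b""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


keys = [b"host1:cpu", b"host1:mem", b"host2:cpu"]
encoded = prefix_encode(keys)
stored = sum(len(suffix) for _, suffix in encoded)  # 17 bytes of key data instead of 27
```

Each entry still needs a small length field, but with long composite row keys the savings add up, and compression such as SNAPPY is applied on top of the encoded blocks.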


Schema Design

Table properties

Set UPDATE_CACHE_FREQUENCY to a larger value (in milliseconds) to avoid contacting the server for metadata updates too frequently

CREATE TABLE …(
) UPDATE_CACHE_FREQUENCY = 300000
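The trade-off behind UPDATE_CACHE_FREQUENCY can be sketched as a time-to-live cache on the client: metadata younger than the configured interval (300000 ms, i.e. 5 minutes, in the example above) is reused without a server round trip. `TableMetadataCache` and its `fetch` callback are hypothetical, not Phoenix classes; the injectable clock only exists to make the behaviour easy to test.

```python
import time
from typing import Callable, Dict, Tuple


class TableMetadataCache:
    """Hypothetical client-side cache illustrating UPDATE_CACHE_FREQUENCY.

    Metadata younger than `update_cache_frequency_ms` is served locally;
    older entries trigger a round trip to the server via `fetch`.
    """

    def __init__(self, fetch: Callable[[str], dict], update_cache_frequency_ms: int,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl_seconds = update_cache_frequency_ms / 1000.0
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}
        self.server_calls = 0

    def get(self, table: str) -> dict:
        now = self._clock()
        cached = self._entries.get(table)
        if cached is not None and now - cached[0] < self._ttl_seconds:
            return cached[1]  # fresh enough: no server round trip
        self.server_calls += 1
        metadata = self._fetch(table)
        self._entries[table] = (now, metadata)
        return metadata
```

With a TTL of 0 every query goes back to the server; a larger value saves those round trips at the cost of seeing schema changes later.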


Schema Design

Divide columns into multiple column families if some columns are rarely accessed
– HBase reads only the files of the column families referenced in the query, which reduces I/O

[Figure: a row with primary key (pk1, pk2) and columns Col1 to Col7 split across two column families: CF1 for frequently accessed columns, CF2 for rarely accessed columns]
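A back-of-the-envelope model of that I/O saving, as a Python sketch: each column family has its own store files, so a query pays only for the families its columns belong to. The mapping of Col1 to Col7 onto CF1/CF2 and the per-family file sizes are hypothetical numbers chosen for illustration.

```python
from typing import Dict, Iterable, Set

# Hypothetical split of Col1..Col7: CF1 = frequently accessed, CF2 = rarely accessed.
COLUMN_FAMILY: Dict[str, str] = {
    "COL1": "CF1", "COL2": "CF1", "COL3": "CF1",
    "COL4": "CF2", "COL5": "CF2", "COL6": "CF2", "COL7": "CF2",
}

# Hypothetical on-disk size (MB) of each family's store files in one region.
STORE_FILE_MB: Dict[str, int] = {"CF1": 120, "CF2": 880}


def families_to_read(columns: Iterable[str]) -> Set[str]:
    """Only families that hold a referenced column need to be opened."""
    return {COLUMN_FAMILY[c] for c in columns}


def megabytes_read(columns: Iterable[str]) -> int:
    """Approximate scan I/O: the store files of every touched family are read."""
    return sum(STORE_FILE_MB[f] for f in families_to_read(columns))
```

A query on the frequently used columns reads 120 MB here; touching a single rarely used column pulls in the other family as well.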


Secondary Indexes

Global indexes
– Optimized for read-heavy use cases

CREATE INDEX idx ON table(…)

Local indexes
– Optimized for write-heavy and space-constrained use cases

CREATE LOCAL INDEX idx ON table(…)

Functional indexes
– Allow you to create indexes on arbitrary expressions

CREATE INDEX UPPER_NAME_INDEX ON EMP(UPPER(FIRSTNAME || ' ' || LASTNAME))


Secondary Indexes

Use covered indexes to scan the index table efficiently instead of the primary table.

CREATE INDEX idx ON table(…) INCLUDE (…)

Pass an index hint to guide the query optimizer to select the right index for the query:

SELECT /*+ INDEX(<table> <index>) */ …
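Why covering helps can be modelled with a sorted list standing in for a global index table, as in this Python sketch. Index rows are ordered by (indexed value, data-table primary key) and carry copies of the INCLUDE columns. If every requested column is covered, the query is answered from the index alone; otherwise each hit costs an extra read of the data table. `DATA`, `build_index`, and `lookup` are hypothetical names for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Sequence, Tuple

# Hypothetical data table: primary key -> columns.
DATA: Dict[int, Dict[str, object]] = {
    1: {"NAME": "ann", "CITY": "SF", "SALARY": 10},
    2: {"NAME": "bob", "CITY": "NY", "SALARY": 20},
    3: {"NAME": "cat", "CITY": "SF", "SALARY": 30},
}

IndexEntry = Tuple[object, int, Tuple]  # (indexed value, data-table PK, INCLUDE'd values)


def build_index(data, indexed_col: str, include: Sequence[str] = ()) -> List[IndexEntry]:
    """Index rows are sorted by (indexed value, PK); INCLUDE columns are copied in."""
    return sorted((row[indexed_col], pk, tuple(row[c] for c in include))
                  for pk, row in data.items())


def lookup(index: List[IndexEntry], include: Sequence[str], data, value, wanted: Sequence[str]):
    """Return `wanted` columns for rows whose indexed column equals `value`,
    plus the number of extra reads against the data table."""
    keys = [entry[0] for entry in index]
    start, stop = bisect_left(keys, value), bisect_right(keys, value)
    covered = set(wanted) <= set(include)
    results, data_table_reads = [], 0
    for _, pk, included in index[start:stop]:
        if covered:
            row = dict(zip(include, included))  # answered from the index alone
        else:
            row = data[pk]  # not covered: go back to the data table
            data_table_reads += 1
        results.append(tuple(row[c] for c in wanted))
    return results, data_table_reads
```

For example, an index on `CITY` that includes `SALARY` answers `SELECT SALARY … WHERE CITY = 'SF'` with zero data-table reads; the same index without `INCLUDE` needs one read per matching row.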


Row Timestamp Column

Maps the HBase native row timestamp to a Phoenix column.

Leverages optimizations provided by HBase, such as setting the minimum and maximum time range for scans, to entirely skip store files that don't fall in that time range.

Perfect for time series use cases.

Syntax

CREATE TABLE …(CREATED_DATE DATE NOT NULL, …
CONSTRAINT PK PRIMARY KEY (CREATED_DATE ROW_TIMESTAMP …)
)
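The store-file skipping described above can be sketched in Python. HBase records the minimum and maximum cell timestamp of each store file, so a scan with a time range can rule a file out from that metadata alone; with ROW_TIMESTAMP, a predicate on CREATED_DATE can turn into such a time range. The `StoreFile` class and `scan_time_range` function are a hypothetical model, not HBase APIs; the range is half-open, as in HBase scan time ranges.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StoreFile:
    name: str
    min_ts: int  # smallest cell timestamp in the file
    max_ts: int  # largest cell timestamp in the file
    cells: List[Tuple[int, str]]  # (timestamp, value)


def scan_time_range(files: List[StoreFile], start_ts: int, stop_ts: int):
    """Return values with start_ts <= ts < stop_ts and the names of files actually read.
    A file whose [min_ts, max_ts] cannot overlap the range is skipped entirely."""
    values, files_read = [], []
    for f in files:
        if f.max_ts < start_ts or f.min_ts >= stop_ts:
            continue  # pruned using file-level metadata only
        files_read.append(f.name)
        values.extend(v for ts, v in f.cells if start_ts <= ts < stop_ts)
    return values, files_read
```

Time series data tends to be written in time order, so each store file covers a narrow time window and most files can be pruned for recent-data queries.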


Use of Statistics

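[Figure: without statistics, the client runs one parallel scan per region (regions starting at A, F, L, R); with statistics, the same key space is divided into smaller chunks (A, C, F, I, L, O, R, U), so the client runs more, finer-grained parallel scans]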
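How statistics increase parallelism can be sketched as range splitting in Python: each region's key range is cut at the collected statistics keys (Phoenix calls them guideposts) that fall inside it, and each piece becomes its own parallel scan. `parallel_scan_ranges` is a hypothetical helper, and single-letter keys stand in for real row keys.

```python
from typing import List, Optional, Tuple

ScanRange = Tuple[str, Optional[str]]  # [start, stop); None = end of table


def parallel_scan_ranges(region_starts: List[str], guideposts: List[str]) -> List[ScanRange]:
    """Cut each region [start, next region start) at the guideposts inside it.
    With no guideposts this degenerates to one scan per region."""
    ranges: List[ScanRange] = []
    cuts = sorted(guideposts)
    for i, start in enumerate(region_starts):
        stop = region_starts[i + 1] if i + 1 < len(region_starts) else None
        inner = [g for g in cuts if g > start and (stop is None or g < stop)]
        bounds: List[Optional[str]] = [start] + inner + [stop]
        ranges.extend(zip(bounds[:-1], bounds[1:]))
    return ranges


regions = ["A", "F", "L", "R"]
without_stats = parallel_scan_ranges(regions, [])
with_stats = parallel_scan_ranges(regions, ["C", "I", "O", "U"])
```

Smaller, evenly sized chunks also limit the damage of a skewed region: one large region no longer dictates the latency of the whole query.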


Skip Scan

Phoenix supports skip scan to jump directly to matching keys when the query has key sets in its predicate.

SELECT * FROM METRIC_RECORD
WHERE METRIC_NAME LIKE 'abc%'
AND HOSTNAME IN ('host1', 'host2');

CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 RANGES OVER METRIC_RECORD ['abc','host1'] - ['abd','host2']

[Figure: skip scan: the client scans Region1 to Region4 hosted on RegionServers RS 1 to RS 3, jumping from one matching key range to the next]
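The skip-scan idea for the METRIC_RECORD query above can be sketched in Python over sorted `(METRIC_NAME, HOSTNAME)` keys. After each key is read, the scan computes the smallest key that could still match `METRIC_NAME LIKE 'abc%' AND HOSTNAME IN (...)` and seeks straight to it, instead of stepping over non-matching rows one by one. Phoenix does this on the RegionServers as an HBase filter issuing seek hints; the functions below are a hypothetical, simplified model of the same logic.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple

Key = Tuple[str, str]  # (METRIC_NAME, HOSTNAME), ordered like the row key


def next_candidate(key: Key, prefix: str, hosts: Sequence[str]) -> Optional[Key]:
    """Smallest key >= `key` that could still satisfy
    METRIC_NAME LIKE 'prefix%' AND HOSTNAME IN hosts (hosts must be sorted)."""
    metric, host = key
    if metric < prefix:
        return (prefix, hosts[0])
    if not metric.startswith(prefix):
        return None  # already past every metric name with this prefix
    for candidate_host in hosts:
        if candidate_host >= host:
            return (metric, candidate_host)
    return (metric + "\x00", hosts[0])  # no host left for this metric: jump to the next metric


def skip_scan(rows: List[Key], prefix: str, hosts: Sequence[str]) -> Tuple[List[Key], int]:
    """Return matching keys and the number of seeks, jumping over runs of non-matching keys."""
    hosts = sorted(hosts)
    matches: List[Key] = []
    seeks, pos = 0, 0
    target: Optional[Key] = (prefix, hosts[0])
    while target is not None:
        pos = bisect_left(rows, target, pos)  # seek to the first key >= target
        seeks += 1
        if pos == len(rows):
            break
        key = rows[pos]
        if key[0].startswith(prefix) and key[1] in hosts:
            matches.append(key)
            pos += 1
            if pos == len(rows):
                break
            key = rows[pos]
        target = next_candidate(key, prefix, hosts)
    return matches, seeks
```

Keys such as `('abc', 'host3')` and `('abcd', 'host9')` are never matched and everything after the `abc` prefix is skipped with a single decision, which is where the savings over a plain range scan come from.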


Join optimizations

Hash join
– Hash join outperforms other join algorithms when one of the relations is small, or when the records matching the predicate fit into memory

Sort-merge join
– Use the sort-merge join algorithm when the relations are very large (USE_SORT_MERGE_JOIN hint)

NO_STAR_JOIN hint
– For queries with multiple inner joins, Phoenix applies a star-join optimization by default. Use this hint if the overall size of all right-hand-side tables would exceed the memory size limit.

NO_CHILD_PARENT_JOIN_OPTIMIZATION hint
– Prevents the usage of child-parent-join optimization.
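The memory trade-off between the two join algorithms is easiest to see side by side. In this generic Python sketch, the hash join keeps the whole right-hand side in a hash table (roughly what Phoenix does when it ships the right-hand side to the RegionServers as a hash cache), while the sort-merge join only needs both inputs sorted and then walks them in step. The row format and table names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Row = Dict[str, Hashable]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Build an in-memory hash table on the (small) right side, then stream the left side."""
    table = defaultdict(list)
    for rrow in right:
        table[rrow[key]].append(rrow)
    return [(lrow, rrow) for lrow in left for rrow in table.get(lrow[key], [])]


def sort_merge_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Sort both sides on the join key and merge them; no side has to fit in a hash table."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    out: List[Tuple[Row, Row]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1  # run of right rows sharing this key
            while i < len(left) and left[i][key] == lk:
                out.extend((left[i], rrow) for rrow in right[j:j_end])
                i += 1
            j = j_end
    return out
```

Both produce the same pairs; the hash join wins when the right side is small, and the sort-merge join stays viable when neither side fits in memory.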


Optimize Writes

UPSERT VALUES
– Call it multiple times before commit to batch mutations
– Use a prepared statement when you run the same query multiple times

UPSERT SELECT
– Configure phoenix.mutate.batchSize based on row size
– Set auto-commit to true to write scan results directly to HBase
– Set auto-commit to true when running UPSERT SELECT on the same table so that the writes happen on the server
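Batching pays off because each round trip to the cluster has a fixed cost. With the real Phoenix JDBC driver, that means reusing a PreparedStatement, calling executeUpdate() for each row with auto-commit off, and calling Connection.commit() periodically. The Python sketch below only models that buffering pattern; `BatchingUpserter`, its `send_batch` callback, and `batch_size` are hypothetical and not part of any Phoenix client.

```python
from typing import Callable, List, Sequence


class BatchingUpserter:
    """Hypothetical client that buffers rows and ships them in batches,
    like calling UPSERT VALUES repeatedly with auto-commit off and then committing."""

    def __init__(self, send_batch: Callable[[List[Sequence]], None], batch_size: int):
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._buffer: List[Sequence] = []

    def upsert(self, row: Sequence) -> None:
        self._buffer.append(row)  # buffered client-side, no round trip yet
        if len(self._buffer) >= self._batch_size:
            self.commit()

    def commit(self) -> None:
        if self._buffer:
            self._send_batch(self._buffer)  # one round trip for the whole batch
            self._buffer = []
```

Writing 250 rows with `batch_size=100` costs three round trips (100, 100, and 50 rows) instead of 250; the batch size should shrink as rows get wider so each batch stays a reasonable size.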


Hints

Some important hints
– SKIP_SCAN, RANGE_SCAN
– SERIAL
– SMALL


Additional References

For more optimizations, refer to these documents:
– http://phoenix.apache.org/tuning.html
– https://hbase.apache.org/book.html#performance


Agenda

Phoenix & HBase as an Enterprise Data Warehouse

Use Cases

Optimizations

Phoenix Query Server


Apache Phoenix Query Server

A standalone service that proxies user requests to HBase/Phoenix
– Optional

Reference client implementation via JDBC
– "Thick" versus "Thin"

First introduced in Apache Phoenix 4.4.0

Built on Apache Calcite's Avatica
– "A framework for building database drivers"


Traditional Apache Phoenix RPC Model

[Figure: the Application embeds the Phoenix client and HBase client, which talk directly to ZooKeeper and to the RegionServers (each running the Phoenix coprocessor and hosting table regions), backed by HDFS]


Query Server Model

[Figure: the Application talks only to the Query Server; the Query Server embeds the Phoenix client and HBase client, which talk to ZooKeeper and to the RegionServers (each running the Phoenix coprocessor and hosting table regions), backed by HDFS]


Query Server Technology

HTTP server and wire API definition

Pluggable serialization
– Google Protocol Buffers

"Thin" JDBC driver (over HTTP)

Other goodies!
– Pluggable metrics system
– TCK (technology compatibility kit)
– SPNEGO for Kerberos authentication
– Horizontally scalable with load balancing


Query Server Clients

Client enablement

Go language database/sql/driver
– https://github.com/Boostport/avatica

.NET driver
– https://github.com/Azure/hdinsight-phoenix-sharp
– https://www.nuget.org/packages/Microsoft.Phoenix.Client/1.0.0-preview

ODBC
– Built by http://www.simba.com/, also available from Hortonworks

Python DB API v2.0 (not "battle tested")
– https://bitbucket.org/lalinsky/python-phoenixdb


Agenda

Phoenix & HBase as an Enterprise Data Warehouse

Use Cases

Optimizations

Phoenix Query Server

Q&A


Phoenix & HBase

We hope to see you all migrating to Phoenix & HBase, and we look forward to more questions on the user mailing lists.

Get involved in the mailing lists:
– user@phoenix.apache.org
– user@hbase.apache.org

You can reach us at:
– ankit@apache.org
– rajeshbabu@apache.org
– elserj@apache.org


Thank You
