john melesky - federating queries using postgres fdw @ postgres open

73
Federating Queries Using postgres fdw john melesky Rentrak, Inc September 17, 2013

Upload: postgresopen

Post on 12-May-2015

2.122 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Federating Queries Using postgres fdw

john melesky

Rentrak, Inc

September 17, 2013

Page 2: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Who Am I?

I A long-time programmer, working with PostgreSQL in thecloud

Page 3: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Who Am I?

I A long-time programmer, working with PostgreSQL in thecloud my butt

Page 4: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Who Am I?

I A long-time programmer, working with PostgreSQL in thecloud my butt

I Now, a DBA, working with PostgreSQL on real machines withreal disks

Page 5: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Who Am I?

I A long-time programmer, working with PostgreSQL in thecloud my butt

I Now, a DBA, working with PostgreSQL on real machinesVMWare with real disks

Page 6: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Who Am I?

I A long-time programmer, working with PostgreSQL in thecloud my butt

I Now, a DBA, working with PostgreSQL on real machinesVMWare with real disks NetApps

Page 7: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

PostgreSQL inheritance partitioning

create table transactions (

id serial,

user_id bigint,

time_utc timestamp,

int_value bigint,

txt_value text,

primary key (id)

);

create table transactions_201306 (

like transactions including indexes,

check

(time_utc >= ’2013-06-01’ and

time_utc < ’2013-07-01’)

) inherits (transactions);

You know this already

Page 8: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

PostgreSQL inheritance partitioning

create table transactions (

id serial,

user_id bigint,

time_utc timestamp,

int_value bigint,

txt_value text,

primary key (id)

);

create table transactions_201306 (

like transactions including indexes,

check

(time_utc >= ’2013-06-01’ and

time_utc < ’2013-07-01’)

) inherits (transactions);

You know this already

Page 9: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Old-school partitioning

create view transactions as (

select * from transactions_201301

union all

select * from transactions_201302

union all

select * from transactions_201303

union all

select * from transactions_201304

union all

...

);

Page 10: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Why don’t we still use this?

1. No insert triggers on views

2. No ”inherit indexes” without additional misdirection

3. Basically, we have a better option with inheritence partitioning

Page 11: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Why don’t we still use this?

1. No insert triggers on views

2. No ”inherit indexes” without additional misdirection

3. Basically, we have a better option with inheritence partitioning

Page 12: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Why don’t we still use this?

1. No insert triggers on views

2. No ”inherit indexes” without additional misdirection

3. Basically, we have a better option with inheritence partitioning

Page 13: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Why don’t we still use this?

1. No insert triggers on views

2. No ”inherit indexes” without additional misdirection

3. Basically, we have a better option with inheritence partitioning

Page 14: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Postgres Foreign Data Wrapper

-- just once

create extension postgres_fdw;

-- once per data node

create server node0 foreign data wrapper postgres_fdw

options (connection stuff);

create user mapping for app_user server node0;

-- once per table per node

create foreign table transactions_node0

(table definition)

server node0

options (table_name ’transactions’);

Page 15: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Federating, Old-school

create view transactions as (

select * from transactions_node0

union all

select * from transactions_node1

union all

select * from transactions_node2

union all

select * from transactions_node3

union all

...

);

Page 16: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Querying

primary=# explain select count(*) from transactions;

QUERY PLAN

---------------------------------------------------------

Aggregate (cost=1767.38..1767.39 rows=1 width=0)

-> Append (cost=100.00..1699.12 rows=27304 width=0)

-> Foreign Scan on transactions_node0

(cost=100.00..212.39 rows=3413 width=0)

-> Foreign Scan on transactions_node1

(cost=100.00..212.39 rows=3413 width=0)

-> Foreign Scan on transactions_node2

(cost=100.00..212.39 rows=3413 width=0)

-> Foreign Scan on transactions_node3

(cost=100.00..212.39 rows=3413 width=0)

-> Foreign Scan on transactions_node4

(cost=100.00..212.39 rows=3413 width=0)

...

(10 rows)

Time: 1.226 ms

Page 17: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Querying

primary=# explain verbose select count(*) from transactions;

QUERY PLAN

-----------------------------------------------------------------

Aggregate (cost=1767.38..1767.39 rows=1 width=0)

Output: count(*)

-> Append (cost=100.00..1699.12 rows=27304 width=0)

-> Foreign Scan on public.transactions_node0

(cost=100.00..212.39 rows=3413 width=0)

Remote SQL: SELECT NULL FROM public.transactions

-> Foreign Scan on public.transactions_node1

(cost=100.00..212.39 rows=3413 width=0)

Remote SQL: SELECT NULL FROM public.transactions

-> Foreign Scan on public.transactions_node2

(cost=100.00..212.39 rows=3413 width=0)

Remote SQL: SELECT NULL FROM public.transactions

...

(19 rows)

Time: 1.273 ms

Page 18: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Querying

primary=# select count(*) from transactions;

count

---------

1095336

(1 row)

Time: 3035.054 ms

Page 19: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Round-robin

primary

node 0(id % 4 = 0)

node 1(id % 4 = 1)

node 2(id % 4 = 2)

node 3(id % 4 = 3)

Page 20: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Round-robin

primary=# create foreign table transactions_node0 (

primary(# id serial,

primary(# user_id bigint,

primary(# time_utc timestamp,

primary(# int_value bigint,

primary(# txt_value text,

primary(# check ((id % 8) = 0)

primary(# ) server node0

primary(# options (table_name ’transactions’);

ERROR: constraints are not supported on foreign tables

LINE 6: check ((id % 8) = 0)) server node0 ...

Page 21: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Domain-based (aka ”sharding”)

primary

node 0(customer = 'bigone')

node 1(customer in ('bigtwo', 'bigthree')

node 2(customer in (...))

node 3(customer in (...))

Page 22: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Range-based

primary

node 0(date between '2013-01-01' and '2013-01-31')

node 1(date between ...)

node 2(date between ...)

node 3date between ...)

Page 23: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Table-based

primary

node 0(users table(s))

node 1(transactions table)

node 2(session tables)

Page 24: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Multi-head

primary1

node0 node1node2 node3

primary2

Page 25: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Multi-head

primary1

node0 node1node2 node3

primary2primary3 primary4

Page 26: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Demo time

Page 27: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Network traffic

primary=# select count(*) from transactions_local;

count

---------

1095336

(1 row)

Time: 209.097 ms

primary=# select count(*) from transactions_primary;

count

---------

1095336

(1 row)

Time: 2867.385 ms

Page 28: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Network traffic

primary=# select count(*) from transactions_local;

count

---------

1095336

(1 row)

Time: 209.097 ms

primary=# select count(*) from transactions_primary;

count

---------

1095336

(1 row)

Time: 2867.385 ms

Page 29: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Network traffic

primary=# select count(*) from transactions_local;

count

---------

1095336

(1 row)

Time: 209.097 ms

primary=# select count(*) from transactions_primary;

count

---------

1095336

(1 row)

Time: 2867.385 ms

Page 30: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Dumb queries

primary=# explain verbose select count(*) from transactions;

QUERY PLAN

-----------------------------------------------------------------

Aggregate (cost=1767.38..1767.39 rows=1 width=0)

Output: count(*)

-> Append (cost=100.00..1699.12 rows=27304 width=0)

-> Foreign Scan on public.transactions_node0

(cost=100.00..212.39 rows=3413 width=0)

Remote SQL: SELECT NULL FROM public.transactions

...

primary=# explain verbose select avg(int_value) from transactions;

QUERY PLAN

----------------------------------------------------------------------------

Aggregate (cost=1545.60..1545.61 rows=1 width=8)

Output: avg(transactions_node0.int_value)

-> Append (cost=100.00..1494.40 rows=20480 width=8)

-> Foreign Scan on public.transactions_node0

(cost=100.00..186.80 rows=2560 width=8)

Output: transactions_node0.int_value

Remote SQL: SELECT int_value FROM public.transactions

...

Page 31: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Dumb queries

primary=# explain verbose select count(*) from transactions;

QUERY PLAN

-----------------------------------------------------------------

Aggregate (cost=1767.38..1767.39 rows=1 width=0)

Output: count(*)

-> Append (cost=100.00..1699.12 rows=27304 width=0)

-> Foreign Scan on public.transactions_node0

(cost=100.00..212.39 rows=3413 width=0)

Remote SQL: SELECT NULL FROM public.transactions

...

primary=# explain verbose select avg(int_value) from transactions;

QUERY PLAN

----------------------------------------------------------------------------

Aggregate (cost=1545.60..1545.61 rows=1 width=8)

Output: avg(transactions_node0.int_value)

-> Append (cost=100.00..1494.40 rows=20480 width=8)

-> Foreign Scan on public.transactions_node0

(cost=100.00..186.80 rows=2560 width=8)

Output: transactions_node0.int_value

Remote SQL: SELECT int_value FROM public.transactions

...

Page 32: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Dumb queries

primary=# explain verbose select count(*) from transactions;

QUERY PLAN

-----------------------------------------------------------------

Aggregate (cost=1767.38..1767.39 rows=1 width=0)

Output: count(*)

-> Append (cost=100.00..1699.12 rows=27304 width=0)

-> Foreign Scan on public.transactions_node0

(cost=100.00..212.39 rows=3413 width=0)

Remote SQL: SELECT NULL FROM public.transactions

...

primary=# explain verbose select avg(int_value) from transactions;

QUERY PLAN

----------------------------------------------------------------------------

Aggregate (cost=1545.60..1545.61 rows=1 width=8)

Output: avg(transactions_node0.int_value)

-> Append (cost=100.00..1494.40 rows=20480 width=8)

-> Foreign Scan on public.transactions_node0

(cost=100.00..186.80 rows=2560 width=8)

Output: transactions_node0.int_value

Remote SQL: SELECT int_value FROM public.transactions

...

Page 33: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Dumb queries

select type, count(*)

from users

group by type

order by 2 desc;

Page 34: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Joins

select count(*)

from transactions t, users u

where t.user_id = u.id

and u.type = ’mistaken’;

Page 35: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Joins

select count(*)

from transactions t, users u

where t.user_id = u.id

and u.type = ’mistaken’;

Page 36: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Keys

’Nuff said

Page 37: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Keys

’Nuff said

Page 38: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Constraint exclusionRemember this?

ERROR: constraints are not supported on foreign tables

LINE 6: check ((id % 8) = 0)) server node0 ....

Page 39: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Single-threaded executer

How many nodes do you have?Do you know what they’re doing?

Page 40: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Single-threaded executer

How many nodes do you have?

Do you know what they’re doing?

Page 41: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Limitations: Single-threaded executer

How many nodes do you have?Do you know what they’re doing?

Page 42: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Strategies

I Large working set, small nodes

I Node-level partitioning

I Heavy distributed processing

I Multi-head

Page 43: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Strategy: Large working set, small nodes

I Your working set is larger than one node’s RAM

I ... but you have lots of nodes

I (and network is faster than disk)

I This might be worth looking into if you’re on AWS, butplease, please test it first

Page 44: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Strategy: Large working set, small nodes

I Your working set is larger than one node’s RAM

I ... but you have lots of nodes

I (and network is faster than disk)

I This might be worth looking into if you’re on AWS, butplease, please test it first

Page 45: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Strategy: Large working set, small nodes

I Your working set is larger than one node’s RAM

I ... but you have lots of nodes

I (and network is faster than disk)

I This might be worth looking into if you’re on AWS, butplease, please test it first

Page 46: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Strategy: Large working set, small nodes

I Your working set is larger than one node’s RAM

I ... but you have lots of nodes

I (and network is faster than disk)

I This might be worth looking into if you’re on AWS, butplease, please test it first

Page 47: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Strategy: Large working set, small nodes

I Your working set is larger than one node’s RAM

I ... but you have lots of nodes

I (and network is faster than disk)

I This might be worth looking into if you’re on AWS, butplease, please test it first

Page 48: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Strategy: Node-level partitioning

Like parititioning, but with a separate node per partition group!

As a total strategy, this is probably not worthwhile. However, itcan work with a fast ”current data” node combining with slower”archived data” nodes.

Page 49: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Strategy: Node-level partitioning

Like parititioning, but with a separate node per partition group!As a total strategy, this is probably not worthwhile. However, itcan work with a fast ”current data” node combining with slower”archived data” nodes.

Page 50: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Heavy distributed processing

I Take advantage of lots of CPUs

I Works well when you have node-discrete workloads

I Lock management can become a bit hairier

I This might actually be a useful use case

Page 51: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Heavy distributed processing

I Take advantage of lots of CPUs

I Works well when you have node-discrete workloads

I Lock management can become a bit hairier

I This might actually be a useful use case

Page 52: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Heavy distributed processing

I Take advantage of lots of CPUs

I Works well when you have node-discrete workloads

I Lock management can become a bit hairier

I This might actually be a useful use case

Page 53: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Heavy distributed processing

I Take advantage of lots of CPUs

I Works well when you have node-discrete workloads

I Lock management can become a bit hairier

I This might actually be a useful use case

Page 54: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Heavy distributed processing

I Take advantage of lots of CPUs

I Works well when you have node-discrete workloads

I Lock management can become a bit hairier

I This might actually be a useful use case

Page 55: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Multi-headed

I Like replication, but with no overhead or delay!

I Also, no storage overhead!

I Might work well with the distributed processing setup

I In fact, given the overhead that lands on the head node, itmight be necessary for a working FDW federation setup

Page 56: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Multi-headed

I Like replication, but with no overhead or delay!

I Also, no storage overhead!

I Might work well with the distributed processing setup

I In fact, given the overhead that lands on the head node, itmight be necessary for a working FDW federation setup

Page 57: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Multi-headed

I Like replication, but with no overhead or delay!

I Also, no storage overhead!

I Might work well with the distributed processing setup

I In fact, given the overhead that lands on the head node, itmight be necessary for a working FDW federation setup

Page 58: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Multi-headed

I Like replication, but with no overhead or delay!

I Also, no storage overhead!

I Might work well with the distributed processing setup

I In fact, given the overhead that lands on the head node, itmight be necessary for a working FDW federation setup

Page 59: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Multi-headed

I Like replication, but with no overhead or delay!

I Also, no storage overhead!

I Might work well with the distributed processing setup

I In fact, given the overhead that lands on the head node, itmight be necessary for a working FDW federation setup

Page 60: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Pan-Strategy Advice

I Think very carefully about what tables should live whereI Think very carefully about tuning settings (especially on your

head node)I work memI shared buffersI temp buffers

I Think very carefully about how many data nodes you want

I Think very carefully about network vs. disk vs. dumb-querycosts

I Think very carefully!

Page 61: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Pan-Strategy Advice

I Think very carefully about what tables should live where

I Think very carefully about tuning settings (especially on yourhead node)

I work memI shared buffersI temp buffers

I Think very carefully about how many data nodes you want

I Think very carefully about network vs. disk vs. dumb-querycosts

I Think very carefully!

Page 62: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Pan-Strategy Advice

I Think very carefully about what tables should live whereI Think very carefully about tuning settings (especially on your

head node)I work memI shared buffersI temp buffers

I Think very carefully about how many data nodes you want

I Think very carefully about network vs. disk vs. dumb-querycosts

I Think very carefully!

Page 63: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Pan-Strategy Advice

I Think very carefully about what tables should live whereI Think very carefully about tuning settings (especially on your

head node)I work memI shared buffersI temp buffers

I Think very carefully about how many data nodes you want

I Think very carefully about network vs. disk vs. dumb-querycosts

I Think very carefully!

Page 64: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Pan-Strategy Advice

I Think very carefully about what tables should live whereI Think very carefully about tuning settings (especially on your

head node)I work memI shared buffersI temp buffers

I Think very carefully about how many data nodes you want

I Think very carefully about network vs. disk vs. dumb-querycosts

I Think very carefully!

Page 65: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Pan-Strategy Advice

I Think very carefully about what tables should live whereI Think very carefully about tuning settings (especially on your

head node)I work memI shared buffersI temp buffers

I Think very carefully about how many data nodes you want

I Think very carefully about network vs. disk vs. dumb-querycosts

I Think very carefully!

Page 66: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Thanks!

Page 67: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Questions?

Any questions?

John, do you use this approach for your databases?Why not?

Page 68: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Questions?

Any questions?John, do you use this approach for your databases?

Why not?

Page 69: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Questions?

Any questions?John, do you use this approach for your databases?Why not?

Page 70: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Thanks!

Plug: Stephen Frost has another postgres fdw talk tomorrowAlso: Rentrak is hiring: programmers, sysadmins, and devops

Page 71: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Thanks!

Plug: Stephen Frost has another postgres fdw talk tomorrow

Also: Rentrak is hiring: programmers, sysadmins, and devops

Page 72: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Thanks!

Plug: Stephen Frost has another postgres fdw talk tomorrowAlso: Rentrak is hiring: programmers, sysadmins, and devops

Page 73: John Melesky - Federating Queries Using Postgres FDW @ Postgres Open

Federating Queries Using postgres fdw

IntroductionWho am I?

PartitioningPostgreSQL inheritance partitioningOld-school partitioning

Federating Queries

Federation Strategies Overview

Trial and ErrorDemoLimitations

Strategies

Wrap-up