the best of bruce's postgres slides - momjianmomjian.us/main/writings/pgsql/best_of.pdf · the...

Post on 11-Dec-2018

259 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

The Best of Bruce’s Postgres Slides

BRUCE MOMJIAN

This talk has the best slides from my 25+ Postgres presentations.Creative Commons Attribution License http://momjian.us/presentations

Last updated: November, 2018

1 / 26

Postgres System Architecture

utility

Plan

Optimal Path

Query

Postmaster

Postgres Postgres

Libpq

Main

Generate Plan

Traffic Cop

Generate Paths

Execute Plan

e.g. CREATE TABLE, COPYSELECT, INSERT, UPDATE, DELETE

Rewrite Query

Parse Statement

UtilityCommand

Storage ManagersCatalogUtilities

Access Methods Nodes / Lists

Mastering PostgreSQL Administration

2 / 26

Shared Memory Creation

postmaster postgres postgres

Program (Text)

Data

Program (Text)

Data

Shared Memory

Program (Text)

Data

Shared Memory Shared Memory

Stack Stack

fork()

Stack

Inside PostgreSQL Shared Memory

3 / 26

Shared Buffers and WAL

Recovery

fsync

fsync

Query and Checkpoint Operations Transaction Durability

BackendPostgres

BackendPostgres

BackendPostgres

PostgreSQL Shared Buffer Cache Write−Ahead Log

Kernel Disk Buffer Cache

Disk Blocks

PostgreSQL Performance Tuning

4 / 26

Backend Flowchart - Magnified

utility

Plan

Optimal Path

Query

Generate Plan

Traffic Cop

Generate Paths

Execute Plan

e.g. CREATE TABLE, COPYSELECT, INSERT, UPDATE, DELETE

Rewrite Query

Parse Statement

UtilityCommand

PostgreSQL Internals Through Pictures

5 / 26

Query Processing

FindExec: found "/var/local/postgres/./bin/postmaster" using argv[ 0]./bin/postmaster: BackendStartup: pid 3320 user postgres db test socket 5./bin/postmaster child[ 3320 ]: starting with (postgres −d99 −F −d99 −v131072 −p test )FindExec: found "/var/local/postgres/./bin/postgres" using argv[ 0]DEBUG: connection: host=[local] user=postgres database=testDEBUG: InitPostgresDEBUG: StartTransactionCommandDEBUG: query: SELECT firstname FROM friend WHERE age = 33;DEBUG: parse tree: { QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :hasAggs false :hasSubLinks false :rtable ({ RTE :relname friend :relid 26912 :subquery <> :alias <> :eref { ATTR :relname friend :attrs ( "firstname" "lastname" "city" "state" "age" )} :inh true :inFromCl true :checkForRead true :checkForWrite false :checkAsUser 0}) :jointree { FROMEXPR :fromlist ({ RANGETBLREF 1 }) :quals { EXPR :typeOid 16 :opType op :oper { OPER :opno 96 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 5 :vartype 23 :vartypmod − 1 :varlevelsup 0 :varnoold 1 :varoattno 5} { CONST :consttype 23 :constlen 4 :constbyval true :constisnull false :constvalue 4 [ 33 0 0 0 ] })}} :rowMarks () :targetList ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 1042 :restypmod 19 :resname firstname :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 19 :varlevelsup 0 :varnoold 1 :varoattno 1}}) :groupClause <> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :setOperations <> :resultRelations ()}DEBUG: rewritten parse tree:DEBUG: { QUERY :command 1 :utility <> :resultRelation 0 :into <> :isPortal false :isBinary false :isTemp false :hasAggs false :hasSubLinks false :rtable ({ RTE :relname friend :relid 26912 :subquery <> :alias <> :eref { ATTR :relname friend :attrs ( "firstname" "lastname" "city" "state" "age" )} :inh true :inFromCl true :checkForRead true :checkForWrite false :checkAsUser 0}) :jointree { FROMEXPR :fromlist ({ RANGETBLREF 1 }) :quals { EXPR :typeOid 16 :opType op :oper { OPER :opno 96 :opid 0 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 5 :vartype 23 :vartypmod − 1 :varlevelsup 0 :varnoold 1 :varoattno 5} { CONST :consttype 23 :constlen 4 :constbyval true :constisnull false :constvalue 4 [ 33 0 0 0 ] })}} :rowMarks () :targetList ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 1042 :restypmod 19 :resname firstname :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 19 :varlevelsup 0 :varnoold 1 :varoattno 1}}) :groupClause <> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <> :limitCount <> :setOperations <> :resultRelations ()}DEBUG: plan: { SEQSCAN :startup_cost 0.00 :total_cost 22.50 :rows 10 :width 12 :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 1042 :restypmod 19 :resname firstname :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 19 :varlevelsup 0 :varnoold 1 :varoattno 1}}) :qpqual ({ EXPR :typeOid 16 :opType op :oper { OPER :opno 96 :opid 65 :opresulttype 16 } :args ({ VAR :varno 1 :varattno 5 :vartype 23 :vartypmod − 1 :varlevelsup 0 :varnoold 1 :varoattno 5} { CONST :consttype 23 :constlen 4 :constbyval true :constisnull false :constvalue 4 [ 33 0 0 0 ] })}) :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0 :scanrelid 1 }DEBUG: ProcessQueryDEBUG: CommitTransactionCommandDEBUG: proc_exit( 0)DEBUG: shmem_exit( 0)DEBUG: exit( 0)./bin/postmaster: reaping dead processes..../bin/postmaster: CleanupProc: pid 3320 exited with status 0

PostgreSQL Internals Through Pictures

6 / 26

EXPLAIN with Constants of Various Frequencies

l | count | lookup_letter---+-------+-----------------------------------------------------------------------p | 199 | Seq Scan on sample (cost=0.00..13.16 rows=199 width=2)s | 9 | Seq Scan on sample (cost=0.00..13.16 rows=9 width=2)c | 8 | Seq Scan on sample (cost=0.00..13.16 rows=8 width=2)r | 7 | Seq Scan on sample (cost=0.00..13.16 rows=7 width=2)t | 5 | Bitmap Heap Scan on sample (cost=4.29..12.76 rows=5 width=2)f | 4 | Bitmap Heap Scan on sample (cost=4.28..12.74 rows=4 width=2)v | 4 | Bitmap Heap Scan on sample (cost=4.28..12.74 rows=4 width=2)d | 4 | Bitmap Heap Scan on sample (cost=4.28..12.74 rows=4 width=2)a | 3 | Bitmap Heap Scan on sample (cost=4.27..11.38 rows=3 width=2)_ | 3 | Bitmap Heap Scan on sample (cost=4.27..11.38 rows=3 width=2)u | 3 | Bitmap Heap Scan on sample (cost=4.27..11.38 rows=3 width=2)e | 2 | Index Scan using i_sample on sample (cost=0.00..8.27 rows=1 width=2)i | 1 | Index Scan using i_sample on sample (cost=0.00..8.27 rows=1 width=2)k | 1 | Index Scan using i_sample on sample (cost=0.00..8.27 rows=1 width=2)

Explaining the Postgres Query Optimizer

7 / 26

Deadlocks

SELECT pg_sleep(0.500); SELECT * FROM lockview1;pid | vxid | lock_type | lock_mode | granted | xid_lock | relname

-------+-------+---------------+------------------+---------+----------+------------11306 | 2/61 | transactionid | ExclusiveLock | t | 710 |11306 | 2/61 | relation | RowExclusiveLock | t | | i_lockdemo11306 | 2/61 | relation | RowExclusiveLock | t | | lockdemo11306 | 2/61 | tuple | ExclusiveLock | t | | lockdemo11306 | 2/61 | transactionid | ShareLock | f | 711 |11642 | 3/116 | transactionid | ExclusiveLock | t | 711 |11642 | 3/116 | relation | RowExclusiveLock | t | | i_lockdemo11642 | 3/116 | relation | RowExclusiveLock | t | | lockdemo11642 | 3/116 | tuple | ExclusiveLock | t | | lockdemo11642 | 3/116 | transactionid | ShareLock | f | 710 |

Unlocking the Postgres Lock Manager

8 / 26

MVCC Behavior

INSERT

UPDATE

DELETE

old (delete)

new (insert)

ExpCre 40

ExpCre 40

47

ExpCre 64

78

ExpCre 78

UPDATE is effectively a DELETE and an INSERT.MVCC Unmasked

9 / 26

MVCC Examples

Visible

Invisible

Invisible

Create−Only

Create & Expire

Visible

Visible

Internally, the creation xid is stored in the system column ’xmin’, and expire in ’xmax’.

Invisible

ExpCre 30

ExpCre 50

CreExp

30110

ExpCre 30

75

ExpCre 30

80

ExpCre 110

For simplicity, assume all othertransactions are committed.

Open Transactions: 25, 50, 75

The highest−numberedcommitted transaction: 100

Snapshot

Sequential Scan

MVCC Unmasked

10 / 26

Heap Page Structure

Page Header Item Item Item

Tuple

Tuple Tuple Special

8K

PostgreSQL Internals Through Pictures

11 / 26

Pg_upgrade: Restore Schema In New Cluster

1

2

3 6

5

4 7

8

9

pg_class

User Tables and Indexes

15 21 27

262014

13 19 25

241812

17 2311

10 16 22

System Tables and Indexes

clog

Old Cluster

1

2

3 6

5

4 7

8

9

pg_class

User Tables and Indexes

26

System Tables and Indexes

clog

New Cluster

pg_dumpall − −schema

17 23

24

25

27

18

10

11

12

13

14

15

16

19

20

21

22

Rapid Upgrades With Pg_Upgrade

12 / 26

Pg_upgrade: Copy User Heap/Index Files

1

2

3 6

5

4 7

8

9

pg_class

User Tables and Indexes

15 21 27

262014

13 19 25

241812

17 2311

10 16 22

System Tables and Indexes

clog

Old Cluster

1

2

3 6

5

4 7

8

9

pg_class

User Tables and Indexes

26

System Tables and Indexes

clog

New Cluster

1610

11

13

12

14

15

19

20

21

22

23

24

25

27

18

17

Rapid Upgrades With Pg_Upgrade

13 / 26

Continuous Archiving

02:0

011

:00

09:0

013

:00

WAL

WALWAL

ContinuousFile System−

Level Backup Archive (WAL)

The Magic of Hot Streaming Replication

14 / 26

Point-in-Time Recovery

WAL

WAL

17:0

017

:30

17:4

017

:55

ContinuousFile System−

Level Backup Archive (WAL)

WAL

The Magic of Hot Streaming Replication

15 / 26

Streaming Replication Setup

02:0

011

:00

09:0

013

:00

WAL

WALWAL

File System−

Level Backup Server

Standby

The Magic of Hot Streaming Replication

16 / 26

Streaming Replication in Operation

archivecommand command

Primary Standby

Network

restore

/pg_wal/pg_wal

Archive

Directory

WAL

The Magic of Hot Streaming Replication

17 / 26

Read Scaling Using Pgpool & Streaming Replication

SELECTINSERT, UPDATE,

DELETE to master

host

Slave SlaveMasterreplication

to any host

replicationstreaming

��������������������������������������������������������������������������������������������������������

��������������������������������������������������������������������������������������������������������

��������������������������������������������������������������������������������������������������������

��������������������������������������������������������������������������������������������������������

pgpool

A full copy of the data exists on every node.

PostgreSQL Replication Solutions 18 / 26

Write Scaling Using FDW-Based Sharding

������������������������������������������������������������������������������������������������������������������������������������������������������������������������

������������������������������������������������������������������������������������������������������������������������������������������������������������������������

������������������������������������������������������������������������������������������������������������������������������������������������������������������������

������������������������������������������������������������������������������������������������������������������������������������������������������������������������

�����������������������������������������������������������������������������������������������������������������������������������������������������������������

�����������������������������������������������������������������������������������������������������������������������������������������������������������������

SQL Queries

PG FDW

Foreign Server Foreign Server Foreign Server

aggregates (10)joins (9.6)

sorts (9.6)aggs on partitions (11)

The Future of Postgres Sharding

19 / 26

Database Server Hardware Priorities

I/O

CPU

Memory

Database Hardware Selection Guidelines

20 / 26

Postgres’s Central Role

Extensions

NoSQL

Postgres

WarehouseData

Foreign DataWrappers

Window Functions

Data Paritioning

Bitmap Scans Sharding

Oracle

Twitter

MongoDB

Easy DDL

JSON

PL/R

ISN

PostGIS

Making Postgres Central in Your Data Center

21 / 26

Use of the Contains Operator @>

\do @>List of operators

Schema | Name | Left arg type | Right arg type | Result type | Description------------+------+---------------+----------------+-------------+-------------pg_catalog | @> | aclitem[] | aclitem | boolean | containspg_catalog | @> | anyarray | anyarray | boolean | containspg_catalog | @> | anyrange | anyelement | boolean | containspg_catalog | @> | anyrange | anyrange | boolean | containspg_catalog | @> | box | box | boolean | containspg_catalog | @> | box | point | boolean | containspg_catalog | @> | circle | circle | boolean | containspg_catalog | @> | circle | point | boolean | containspg_catalog | @> | jsonb | jsonb | boolean | containspg_catalog | @> | path | point | boolean | containspg_catalog | @> | polygon | point | boolean | containspg_catalog | @> | polygon | polygon | boolean | containspg_catalog | @> | tsquery | tsquery | boolean | contains

Non-Relational Postgres

22 / 26

Postgres System Tables

starelid

staattnum

staop

pg_statistic

oprleft

oprright

oprresult

oprcom

oprnegate

oprlsortop

oprrsortop

oprcode

oprrest

oprjoin

pg_operator

typrelid

typelem

typinput

typoutput

typbasetype

pg_type

prolang

prorettype

pg_proc

pg_rewrite

ev_class

datlastsysoid

pg_database

tgfoid

tgrelid

pg_trigger

inhrelid

pg_inherits

inhparent

pg_language

pg_namespacepg_depend pg_shadow

pg_aggregate

aggfinalfn

aggtransfn

aggfnoid

aggtranstype

castsource

casttarget

pg_cast

castfunc

pg_description

pg_constraint

contypid

pg_conversion

conproc

amopopr

amopclaid

pg_attribute

indexrelid

attnum

amopclaid

atttypid

indrelid

pg_attrdef

pg_group

adrelid

pg_index

adnum

pg_am

pg_amop

amgettuple

reltoastidxid

aminsert

reltoastrelid amcostestimate

amproc

ambeginscan

pg_amproc

amrescan

relfilenode

amendscan

relam

ammarkpos

reltype amrestrpos

pg_class

ambuild

opcdeftype

ambulkdelete

pg_opclass

attrelid

PostgreSQL Internals Through Pictures

http://www.postgresql.org/docs/current/catalogs.html23 / 26

CTEs: Mixing Modification Commands

CREATE TEMPORARY TABLE old_orders (order_id INTEGER);

WITH source (order_id) AS (DELETE FROM orders WHERE name = ’my order’ RETURNING order_id

), source2 AS (DELETE FROM items USING source WHERE source.order_id = items.order_id

)INSERT INTO old_orders SELECT order_id FROM source;

Programming the SQL Way with Common Table Expressions

24 / 26

SSL ’Verify-Ca’ Is SecureFrom Spoofing

server.crt

X

root.crt

Database

Fake PostgreSQL

Database

Server

PostgreSQLSSL verify-ca

Invalid certificate

Server(no CA signature)Client

Database

Securing PostgreSQL From External Attack

25 / 26

Conclusion: Release Dates and Sizes After 2000

version | reldate | months | relnotes | lines | change | % change----------+------------+--------+----------+---------+--------+----------7.0 | 2000-05-08 | 11 | | 383270 | 51992 | 157.1 | 2001-04-13 | 11 | | 410500 | 27230 | 77.2 | 2002-02-04 | 10 | 250 | 394274 | -16226 | -37.3 | 2002-11-27 | 10 | 305 | 453282 | 59008 | 147.4 | 2003-11-17 | 12 | 263 | 508523 | 55241 | 128.0 | 2005-01-19 | 14 | 230 | 654437 | 145914 | 288.1 | 2005-11-08 | 10 | 174 | 630422 | -24015 | -38.2 | 2006-12-05 | 13 | 215 | 684646 | 54224 | 88.3 | 2008-02-04 | 14 | 223 | 762697 | 78051 | 118.4 | 2009-07-01 | 17 | 314 | 939098 | 176401 | 239.0 | 2010-09-20 | 15 | 237 | 999862 | 60764 | 69.1 | 2011-09-12 | 12 | 203 | 1069547 | 69685 | 69.2 | 2012-09-10 | 12 | 238 | 1148192 | 78645 | 79.3 | 2013-09-09 | 12 | 177 | 1195627 | 47435 | 49.4 | 2014-12-18 | 15 | 211 | 1261024 | 65397 | 59.5 | 2016-01-07 | 13 | 193 | 1340005 | 78981 | 69.6 | 2016-09-29 | 8 | 214 | 1380458 | 40453 | 3

PostgreSQL: Past, Present, and Future

http://momjian.us/presentations 26 / 26

top related