pg10 what-to-look-for-amitlan-2

21
Copyright©2017 NTT Corp. All Rights Reserved. PostgreSQL 10: What to Look For? Amit Langote, NTT OSS Center

Upload: amit-langote

Post on 22-Jan-2018

493 views

Category:

Software


0 download

TRANSCRIPT

Page 1: Pg10 what-to-look-for-amitlan-2

Copyright©2017 NTT Corp. All Rights Reserved.

PostgreSQL 10: What to Look For?

Amit Langote, NTT OSS Center

Page 2: Pg10 what-to-look-for-amitlan-2

1Copyright©2017 NTT Corp. All Rights Reserved.

PostgreSQL 10

• Why 10.0?• Why not 9.7.0 or 10.0.0?

• Avoid discussing each year about whether to up the major version number

• Instead of two-part major version number, adopt single number and increase it every year

• So, 10.0 in 2017 and then 11.0 in 2018 and so on

• What features is this presentation about?• Features that are likely to be in PostgreSQL 10.0

• Most are still under active development and testing [0]

• Details are still fuzzy

• Are there demos?• No, sorry.

Page 3: Pg10 what-to-look-for-amitlan-2

2Copyright©2017 NTT Corp. All Rights Reserved.

• Replication

• Partitioning

• FDW

• Parallel execution

• Write amplification mitigation

• Executor overhaul

• New statistics for query optimization

• And few others• PITR configuration overhaul

• Replication and backup configuration and defaults

Features

Page 4: Pg10 what-to-look-for-amitlan-2

3Copyright©2017 NTT Corp. All Rights Reserved.

Replication

• Logical replication• With the existing physical replication, it’s not possible to replicate changes of

specific tables

• Logical replication makes it possible to do that

• Changes are applied to the replica in the same order as they happened on the original server, that is, in a transactionally consistent manner

• Possible to replicate to a different database (that is, not just to another PostgreSQL server of the same major version)

Page 5: Pg10 what-to-look-for-amitlan-2

4Copyright©2017 NTT Corp. All Rights Reserved.

• Quorum-based synchronous replication• Quorum: A quorum is the minimum number of votes that a distributed

transaction has to obtain in order to be allowed to perform an operation in a distributed system. [Wikipedia]

• For example, synchronous commit with durability of multiple replicas

• Two ways to specify the quorum (K) for synchronous replicas:

[ FIRST ] K (replica1, replica2, …, replicaN)

ANY K (replica1, replica2, …, replicaN)

Replication

Page 6: Pg10 what-to-look-for-amitlan-2

5Copyright©2017 NTT Corp. All Rights Reserved.

• Partitioning• The art of dividing a large table into chunks of conveniently sized pieces called

partitions, where each piece holds a subset of the data

• In older PostgreSQL versions, this process involved number of steps and the system (especially the optimizer) has limited intelligence about the partitioning structure of a table, thus offering limited performance and scalability

• The new syntax (commands) in 10• Supports specifying Range or List of values for every partition

• Commands for partition ROLL-IN (ATTACH PARTITION) and ROLL-OUT (DETACH PARTITION)

• Multiple levels of partitioning allowed: divide partitions into their own pieces using the same method (that is, using same commands)

Partitioning

Page 7: Pg10 what-to-look-for-amitlan-2

6Copyright©2017 NTT Corp. All Rights Reserved.

Partitioning

• Optimizations for partitioned tables• Faster partition-pruning (pruning: do not scan the partitions that do not match

the query)

• Partition-wise join

• Partition-wise aggregation

Page 8: Pg10 what-to-look-for-amitlan-2

7Copyright©2017 NTT Corp. All Rights Reserved.

• Foreign Data Wrapper• Foreign Data: data (relational or whatever) not stored on the database to which

application is connected

• A Foreign Data Wrapper (FDW) driver allows PostgreSQL to fetch foreign data

• Of course, foreign data includes other PostgreSQL databases as well

Foreign Data Wrappers (FDW)

Page 9: Pg10 what-to-look-for-amitlan-2

8Copyright©2017 NTT Corp. All Rights Reserved.

• What is so special about FDWs?• PostgreSQL (optimizer) allows a FDW driver to push certain computations to the

foreign database to reduce network data traffic or to allow some computations to be performed faster

• Examples of such computations - selection, projection, join, aggregation, sort (if the foreign database is a relational RDBMS, such as when using postgres_fdw)

• Earliest postgres_fdw could not push many of those things, but the latest version (9.6) can almost all of them

• In 10, we can push even more• Aggregation

• More types of join (there are many types of joins in SQL with various semantics)

• UPDATEs that required to join to another table at the foreign site

Foreign Data Wrappers (FDW)

Page 10: Pg10 what-to-look-for-amitlan-2

9Copyright©2017 NTT Corp. All Rights Reserved.

• Atomic transactions• Must ensure atomicity (A of ACID) in transactions that access multiple sites

• Use two-phase commit for atomicity

• Asynchronous fetch• When fetching from multiple foreign sites, requests to different sites can be sent

asynchronously

Foreign Data Wrappers (FDW)

Page 11: Pg10 what-to-look-for-amitlan-2

10Copyright©2017 NTT Corp. All Rights Reserved.

• PostgreSQL 9.6 introduced limited parallel query support

• Robert Haas’ TPC-H parallelism report at PGCon’16 [2] shows how much can be done with 9.6

Parallel Execution

Parallel query support TPC-H query

No merge join Q2, Q13, Q15

No parallel hashing Q3, Q5, Q7, Q8, Q21

No parallel bitmap index scan Q4, Q5, Q6, Q7, Q14, Q15, Q20

No parallel subquery handling Q2, Q15, Q16, Q22

No parallel merge Q17

Don’t know what’s going on Q3, Q9, Q10, Q11, Q18

Looks good Q1, Q12, Q19

Page 12: Pg10 what-to-look-for-amitlan-2

11Copyright©2017 NTT Corp. All Rights Reserved.

• PostgreSQL 10.0 promises to cover enough ground• Parallel Index Scan – ✔

• Parallel Bitmap Index Scan – ✔

• Parallel Merge Join - ✔

• Parallel Hash Join – ✔

• Parallel subquery handing – ✔

• Bonus: Parallel Append (for parallel scan of partitions)

Parallel Execution

Page 13: Pg10 what-to-look-for-amitlan-2

12Copyright©2017 NTT Corp. All Rights Reserved.

Write Amplification Mitigation

• Uber left PostgreSQL in 2013 [3], because the architecture of Postgres caused higher write volume in their application

• PostgreSQL folks take such complaints seriously, so some developers quickly came up with solutions to mitigate the problem(s)

• WARM: Write Amplification Reduction Method• HOT: Heap-Only Tuples (avoids redundant index entries)

• A HOT update is one in which no indexed column changes; too strict a requirement

• WARM relaxes that requirement

• WARM: Less HOT! (will avoid still more redundant index entries)

Page 14: Pg10 what-to-look-for-amitlan-2

13Copyright©2017 NTT Corp. All Rights Reserved.

Write Amplification Mitigation

Non-HOT update HOT update(Tuple versions 2 and 3 are Heap-Only Tuples)

Page 15: Pg10 what-to-look-for-amitlan-2

14Copyright©2017 NTT Corp. All Rights Reserved.

• Indirect Indexes• Indexes that point into the primary key index, not heap

• Updates are faster: no new entries in unchanged indexes

• More HOT utilization

• Trade-off: read slower because two indexes to reach one heap tuple

Write Amplification Mitigation

Page 16: Pg10 what-to-look-for-amitlan-2

15Copyright©2017 NTT Corp. All Rights Reserved.

• WARM vs. Indirect Indexes• WARM is automatic and indirect indexes are not

Write Amplification Mitigation

Page 17: Pg10 what-to-look-for-amitlan-2

16Copyright©2017 NTT Corp. All Rights Reserved.

• PostgreSQL’s performance for OLAP workloads is not as great as it could be, mainly because the executor is inefficient

• Using the traditional iterator model causes a lot of inefficiencies on the modern micro-architectures

• Moreover, It fails to utilize advances like asynchronous, batched, vectorized/SIMD processing

Executor overhaul

Page 18: Pg10 what-to-look-for-amitlan-2

17Copyright©2017 NTT Corp. All Rights Reserved.

Executor overhaul

• More efficient processing of disk tuples• Currently, lots of cache misses and unpredictable branches

• More efficient processing of query operators and expressions• Recursion is expensive

• Indirect function calls are expensive

• Due to above, lots of CPU cache and pipeline inefficiencies

• JIT-compilation• Generating native code for query plan on-the-fly

• Optimizer stuff is hard though, that is, answering the question: which queries really need JIT?

• Using LLVM: Build PostgreSQL using the --with-llvm configure switch

Page 19: Pg10 what-to-look-for-amitlan-2

18Copyright©2017 NTT Corp. All Rights Reserved.

New statistics for query optimization

• Better statistics for optimizing queries with correlated variables• Optimizer uses a fixed set of heuristics when making plans based on the statistics

• Also, the ANALYZE command will collect only a fixed set of statistics

• Users are annoyed when the optimizer does not consider correlation between variables/columns, sometimes resulting in poor plan choices

• CREATE STATISTICS: specify statistics to collect

• One or combination of the following types:

• Dependencies (Functional dependencies)

• MCVs (Most Common Values)

• Histogram

• Modify the optimizer to consider the new statistics to make better plans

Page 20: Pg10 what-to-look-for-amitlan-2

19Copyright©2017 NTT Corp. All Rights Reserved.

Others

• PITR configuration overhaul• Merge recovery.conf into postgresql.conf

• Replication and backup configuration and defaults• Change defaults so taking backups and creating replicas is easier

Page 21: Pg10 what-to-look-for-amitlan-2

20Copyright©2017 NTT Corp. All Rights Reserved.

References

[0] PostgreSQL commit-fest app: https://commitfest.postgresql.org/

[1] Future In-Core Replication for PostgreSQL (2012)https://wiki.postgresql.org/images/7/75/BDR_Presentation_PGCon2012.pdf

[2] Parallel Query Has Arrived!

https://www.pgcon.org/2016/schedule/events/913.en.html

[3] Why Uber Engineering Switched From Postgres TO MySQL

https://eng.uber.com/mysql-migration/

http://rhaas.blogspot.jp/2016/08/ubers-move-away-from-postgresql.html

http://use-the-index-luke.com/blog/2016-07-29/on-ubers-choice-of-databases

https://postgr.es/m/flat/579795DF.10502%40commandprompt.com