Transcript
Page 1: Bigger data with PostgreSQL 9

Slide 1

© by Numius nvOpen systems, Smarter people

Bigger data with PostgreSQL 9

Datawarehousing in the 21st century.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Page 2: Bigger data with PostgreSQL 9

Slide 2

© by Numius nvOpen systems, Smarter people

The presenter..

• Bert Desmet

• Consultant @ Deloitte

• System Engineer / DBA for deloitteanalytics.eu

• 'devop'?

Page 3: Bigger data with PostgreSQL 9

Slide 3

© by Numius nvOpen systems, Smarter people

agenda

• Introduction

• Release the elephants!

• Impacting factors

• Divide et impera

• Basic configuration

• Passing the speed limits

• Keep your database fit

Page 4: Bigger data with PostgreSQL 9

Slide 4

© by Numius nvOpen systems, Smarter people

Big data?

● 44x data growth per year!

● About 35.2 zettabyte by 2020

● 80% of data is unstructured

● The volume will grow by a whopping 650% in the next 5years

● 80% of organisations will use cloud analytics

● By 2014 80% of eneterprises will want a saas based bi system

Page 5: Bigger data with PostgreSQL 9

Slide 5

© by Numius nvOpen systems, Smarter people

Know your limits

● DB2

● More load

● Scaling

● Speed

● Data size

● Pricing

Page 6: Bigger data with PostgreSQL 9

Slide 6

© by Numius nvOpen systems, Smarter people6 Footer

Release the elephants!

Page 7: Bigger data with PostgreSQL 9

Slide 7

© by Numius nvOpen systems, Smarter people

PostgreSQL 9

● Good for big databases

● Easy maintenance

● Scales!

● Very fast

● Extendable

Page 8: Bigger data with PostgreSQL 9

Impacting factors

Page 9: Bigger data with PostgreSQL 9

Slide 9

© by Numius nvOpen systems, Smarter people

Higly impacting operations

• Dataload

• In bulk (ETL)

• Row by row. Up to 100k rows / minute

• Datafetch (Reporting)

• We do like joins. The more the better.

Page 10: Bigger data with PostgreSQL 9

Slide 10

© by Numius nvOpen systems, Smarter people

Extra problems

• a lot of I/O

• A lot of cpu power (index creation)

• A lot of locks

Page 11: Bigger data with PostgreSQL 9

Slide 11

© by Numius nvOpen systems, Smarter people

The solution?

• Use at least 2 servers

• Set up binary replication

• Put a lot of ram in your servers.

Page 12: Bigger data with PostgreSQL 9

Slide 12

© by Numius nvOpen systems, Smarter people

Dataflow

Page 13: Bigger data with PostgreSQL 9

Slide 13

© by Numius nvOpen systems, Smarter people13 Footer

Devide et Impera

Page 14: Bigger data with PostgreSQL 9

Slide 14

© by Numius nvOpen systems, Smarter people

Replication with postgres

• 8.3 Warm Standby

• 9.0 Async. Binary Replication

• 9.1 Synchronous Replication

• 9.2 Cascading Replication

• 9.3 more improvents towards fail overs / switching masters

• 9.4 Multimaster Binary Replication?

Page 15: Bigger data with PostgreSQL 9

Slide 15

© by Numius nvOpen systems, Smarter people

Configure replication

• Wal_level = ‘host standby’

• Checkpoint_segments >= 32

• Checkpoint_completetion_target >= 0.8

• Hot_standby = on

• Hot_standby_feedback = on

Page 16: Bigger data with PostgreSQL 9

Slide 16

© by Numius nvOpen systems, Smarter people

Page 17: Bigger data with PostgreSQL 9

Slide 17

© by Numius nvOpen systems, Smarter people

Keep it simple, stupid

• 2nd quadrant is pretty awesome

• Barman for backups

• Repmgr for replication management

Page 18: Bigger data with PostgreSQL 9

Slide 18

© by Numius nvOpen systems, Smarter people

Basic configuration

Page 19: Bigger data with PostgreSQL 9

Slide 19

© by Numius nvOpen systems, Smarter people

Raise those memory limits!

• shared_buffers = 1/8 to ¼ of RAM

• work_mem = 128MB to 1GB

• maintenance_work_mem = 512MB to 1GB

• temp_buffers = 128MB to 1GB

• effective_cache_size = ¾ of RAM

• wal_buffers = 32MB

Page 20: Bigger data with PostgreSQL 9

Slide 20

© by Numius nvOpen systems, Smarter people

Tune the planner for correct planning

• Random_page_cost = 3

• Cpu_tuple_cost = 0.1

• Contraint_exclusion=on

• From_collapse_limit => 12

• Join_collapse_limit => 12

Page 21: Bigger data with PostgreSQL 9

Slide 21

© by Numius nvOpen systems, Smarter people

Passing the speed limits

Page 22: Bigger data with PostgreSQL 9

Slide 22

© by Numius nvOpen systems, Smarter people

Use partitions

• Think about the partition key!

• Trigger based for row / row inserts

• Rule based for bulk inserts

• Make sure you add constraints

Page 23: Bigger data with PostgreSQL 9

Slide 23

© by Numius nvOpen systems, Smarter people

Use indexes

• Learn to read query explains

• Use http://explain.depesz.com/

• Don’t over index

Page 24: Bigger data with PostgreSQL 9

Slide 24

© by Numius nvOpen systems, Smarter people

Other sane things to do

• Use unique indexes

• Auto created when defining a primary key

• Use clustered indexes

• And cluster those tables regularly

Page 25: Bigger data with PostgreSQL 9

Slide 25

© by Numius nvOpen systems, Smarter people

Use partial indexes

• Can only be found in Postgres and Mysql.

• Really usefull on big tables

• Disadvantage: no ‘moving’ indexes. Eg: index for current_day.

Page 26: Bigger data with PostgreSQL 9

Keep your database fit

Page 27: Bigger data with PostgreSQL 9

Slide 27

© by Numius nvOpen systems, Smarter people

Vacuum

• Disable autovacuum for datawarehouses

• Vacuum once a day

• Check regulary if the vacuums to run!

• Prevents data loss

• Prevents the database to go out of control, size wise

Page 28: Bigger data with PostgreSQL 9

Slide 28

© by Numius nvOpen systems, Smarter people

Analyze

• Analyze once a day

• Together with vacuum

• Vacuum analyze <schema>.<table>;

• ‘default_statistics_target’ >= 300

Page 29: Bigger data with PostgreSQL 9

Slide 29

© by Numius nvOpen systems, Smarter people

Check for bloat!

• Free space on tables.

• Indexes are not optimized anymore

• use nagios check_postgres.pl

Page 30: Bigger data with PostgreSQL 9

Slide 30

© by Numius nvOpen systems, Smarter people

Prevent bloat

• Vacuum full • Offline!

• Only when a pk is not available

• Repack• Online!

• Orders the tables (clustered index)

• Needs a pk on the table

• Reindex

• Reindex regulary.

Page 31: Bigger data with PostgreSQL 9

Slide 31

© by Numius nvOpen systems, Smarter people

Partial indexes?

• Write a script

• Use a cronjob

• Recreate your time-aware indexes every day. Will be fast.

Page 32: Bigger data with PostgreSQL 9

Slide 32

© by Numius nvOpen systems, Smarter people

Page 33: Bigger data with PostgreSQL 9

Slide 33

© by Numius nvOpen systems, Smarter people

Questions?

• Postgres has an awesome community ®

• Irc: #postgresql @ freenode

• Check the mailing list

Page 34: Bigger data with PostgreSQL 9

Slide 34

© by Numius nvOpen systems, Smarter people


Top Related