the accidental dba

61
The Accidental DBA a guide for the perplexed Don't Panic Josh Berkus PostgreSQL Experts, Inc. LinuxCon NA 2012

Upload: postgresql-experts-inc

Post on 18-Dec-2014

801 views

Category:

Technology


5 download

DESCRIPTION

 

TRANSCRIPT

Page 1: The Accidental DBA

The Accidental DBAa guide for

the perplexed

Don't PanicJosh Berkus

PostgreSQL Experts, Inc.LinuxCon NA 2012

Page 2: The Accidental DBA

“Jonathan quit. You're in charge of

the Postgres servers now.”

Page 3: The Accidental DBA

covered in this talk● PostgreSQL

● installation● configuration● connections● backup● monitoring● slow queries● migrations

● MySQL● (covered

yesterday)

● no time for● replication● indexes● schema design

Page 4: The Accidental DBA

9.2rc1 Out Now!

Page 5: The Accidental DBA

“DevOps”

“Cloud”

Page 6: The Accidental DBA

Y U no DBA?

1.limited budgets

2.shortage of operational staff

3.cheaper OSS databases

… you are the DBA now.

Page 7: The Accidental DBA

Oh My God,We’re All Going To Die.

Page 8: The Accidental DBA

Don't Panic

Page 9: The Accidental DBA
Page 10: The Accidental DBA
Page 11: The Accidental DBA

Installation

Don't Panic

Page 12: The Accidental DBA

Use Packages!● version not important?● use the ones that come with your

distro● Red Hat, Centos, SciLinux● Debian, Ubuntu● SuSE

Page 13: The Accidental DBA

Use Packages!● need the latest version?● alternate packages

● Red Hat: yum.postgresql.org● Ubuntu: Martin Pitt's backports● SuSE: build service● Debian: backports coming soon

Page 14: The Accidental DBA

create data directory?

● $PGDATA is where the database files live

● most packages create it● if not, use “initdb” to create it

● pick a suitable location!

Page 15: The Accidental DBA

configuration

Don't Panic

Page 16: The Accidental DBA

use good hardware● databases use all the hardware

● RAM, CPU, IO● disk can be very important

– DB larger than RAM– write-heavy database

● the database cannot outperform bad hardware

Page 17: The Accidental DBA

put the databaseon its own server(or virtual server)

Page 18: The Accidental DBA

cloud servers● cloud server performance sucks

● especially IO

● make sure you have enough RAM to cache the whole database

Page 19: The Accidental DBA

Linux configuration

1.turn the OOM killer off

2.set zone_reclaim_mode = 0

3.use XFS or Ext4 for database files

4.increase shmmax, shmall● so that you can raise shared_buffers

● this is going away with 9.3!

Page 20: The Accidental DBA

postgresql.conf

shared_buffers = ¼ of RAM up to 8GBwork_mem = ( RAM * 2 )/ max_connections

maintenance_work_mem = 1/16 RAMeffective_cache_size = 75% of RAMcheckpoint_segments = 32 (small DB) to 128 (large DB, many writes)

Page 21: The Accidental DBA

connections&

security

Don't Panic

Page 22: The Accidental DBA

network● local connections: UDP, if possible

● faster than TCP/IP

● other servers: port 5432● make sure it's open on the firewall!

● on the cloud? use SSL● secure your connections● PITA to set up, though

Page 23: The Accidental DBA

max_connections

● postgresql.conf● increase number of connections● good up to about 50 + 10 x cores● keep needing to increase it?

something wrong with the app

“ERROR: connection limit exceeded for non-superusers”

Page 24: The Accidental DBA

connection pooling● Java? use J2EE pooling● Everyone else: pgbouncer

● event-based pooler● separate package● on DB server, or● app server, or ● 3rd “bouncer” server

Page 25: The Accidental DBA

host-based access

● pg_hba.conf● access control list:

● database/user/host address● like iptables for Postgres

● change config and reload

“FATAL: no pg_hba.conf entry for host "192.168.0.1", user "chaos", database "chaosLRdb", SSL off”

Page 26: The Accidental DBA

security

● Postgres users & passwords● CREATE/ALTER USER● “group” ROLEs

● DB object permissions● Or: use LDAP or PAM

“FATAL: password authenticationfailed for user "wwwuser"”

Page 27: The Accidental DBA

the psqlcommand line

Page 28: The Accidental DBA

slow queries

Don't Panic

Page 29: The Accidental DBA

pg_stat_activity-[ RECORD 2 ]----+--------------------------------datid | 16422datname | libdataprocpid | 46295usesysid | 10usename | dataentryapplication_name | psqlclient_addr | 192.168.101.114client_port | 5432backend_start | 2012-08-26 15:09:05.233-07xact_start | 2012-08-26 15:09:06.113-07query_start | 2012-08-26 15:11:53.521-07waiting | fcurrent_query | <IDLE> in transaction

Page 30: The Accidental DBA

locks● write queries can block on other write

queries● as can table schema changes● queries can wait forever on locks

● look for “<IDLE> in transaction”● that's a ...

Page 31: The Accidental DBA

Zombie Transactions

Want RAAAAAAAM

Page 32: The Accidental DBA

killing zombies● pg_cancel_backend(pid)

● kills running queries with sigINT● like CTRL-C

● pg_terminate_backend(pid)● kills bad connections, idle

transactions● can cause DB to restart

Page 33: The Accidental DBA

EXPLAINNested Loop (cost=792.00..828.08 rows=1422317 width=99) -> HashAggregate (cost=792.00..792.00 rows=1 width=4) -> Index Scan using index_player_summaries_on_player_id on player_summaries ps (cost=0.00..791.80 rows=403 width=4) Index Cond: (player_id = 21432312) -> Index Scan using index_player_summaries_on_match_id on player_summaries (cost=0.00..33.98 rows=600 width=99) Index Cond: (match_id = ps.match_id)

Page 34: The Accidental DBA

EXPLAIN ANALYZENested Loop (cost=792.00..828.08 rows=1422317 width=99) (actual time=9928.869..20753.723 rows=13470 loops=1) -> HashAggregate (cost=792.00..792.00 rows=1 width=4) (actual time=9895.105..9897.096 rows=1347 loops=1) -> Index Scan using index_player_summaries_on_player_id on player_summaries ps (cost=0.00..791.80 rows=403 width=4) (actual time=27.413..9890.887 rows=1347 loops=1) Index Cond: (player_id = 21432312) -> Index Scan using index_player_summaries_on_match_id on player_summaries (cost=0.00..33.98 rows=600 width=99) (actual time=7.375..8.037 rows=10 loops=1347) Index Cond: (match_id = ps.match_id)Total runtime: 20764.371 ms"

Page 35: The Accidental DBA

explain.depesz.com

Page 36: The Accidental DBA

what to look for● “seq scan” on large table

● maybe index needed

● cartesian joins● really bad row estimates

● ANALYZE needed?

Page 37: The Accidental DBA

backups

Don't Panic

Page 38: The Accidental DBA

2012-01-27 18:00:44 MSK FATAL: invalid page header in block 311757 of relation

base/26976/279772012-01-27 18:00:44 MSK CONTEXT: xlog redo insert: rel 1663/26976/27977;

tid 311757/442012-01-27 18:00:44 MSK LOG: startup process (PID 392) exited with exit code 1

2012-01-27 18:00:44 MSK LOG: aborting startup due to startup process failure

Page 39: The Accidental DBA
Page 40: The Accidental DBA

pg_dump● “logical” backup

● portable● compressed● works for upgrades

● good for small databases● use -Fc

● custom binary format

Page 41: The Accidental DBA

PITR● “Point-In-Time Recovery”● “binary” and “continuous” backup

● take snapshot of DB files● accumulate logfile copies

● good for large databases● can combine with replication

Page 42: The Accidental DBA

PITR - PITA● can be difficult to set up & monitor● use tools:

● RepMgr● OmniPITR● WAL-E (for AWS)

Page 43: The Accidental DBA

monitoring

Don't Panic

Page 44: The Accidental DBA

use your favorite toolganglia, collectd, Hyperic, OpenNMS, OpenView, whatever .... ● nagios check_postgres.pl

● broad list of checks● mine it for queries and techniques

Page 45: The Accidental DBA

many useful checks● disk space● caching RAM● response time● connections● idle transacts● table growth● waiting queries

● long queries● database size● table bloat● system load● replication lag● XID wraparound● execution time

Page 46: The Accidental DBA

activity log● connections & disconnections● slow queries● DB swap usage● schema changes● lock waits & deadlocks

Page 47: The Accidental DBA

pgfouine, pgbadger

Page 48: The Accidental DBA

schema migrations

Don't Panic

Page 49: The Accidental DBA

SQL-DDL is code1.write migration scripts

● make them idempotent

● write “undo” scripts

2.check them into version control

3.test them on a staging server● check how long they take

4.deploy on production

Page 50: The Accidental DBA

Postgres doesn't require downtime for

most schema changes.

Page 51: The Accidental DBA

updates&

upgrades

Don't Panic

Page 52: The Accidental DBA

major vs. minor9.2 == a major version

● requires an upgrade from 9.1.4● contains features not in 9.1● requires testing and planned downtime

9.1.5 == a minor version● is a minor “update” from 9.1.4.● can (and should) be applied immediately

Page 53: The Accidental DBA

minor updates● come out ~ every 2 months● contain only bugfixes

● security hole patches● data loss prevention● fix server crashes

● no new or changed features● occasional documented breakage

Page 54: The Accidental DBA

update procedure1.schedule 5 minute downtime

2.download packages

3.shut down postgresql

4.install packages

5.restart postgresql

6.restart application

Page 55: The Accidental DBA

major upgrades● come out once per year● have many new features

● and sometimes break stuff which used to work

● require extensive testing with your application

● require significant downtime to upgrade

Page 56: The Accidental DBA

upgrade procedures● dump & reload

● use pg_dump & pg_restore on database● most reliable way● “cleans up” database in process● best with small databases● can take a long, long time

Page 57: The Accidental DBA

upgrade procedures● pgUpgrade

● upgrade “in place”● much faster● does not “clean up” database● sometimes doesn't work

Page 58: The Accidental DBA

EOL after5 years

Page 59: The Accidental DBA

Getting Help● docs

● postgresql.org/docs/● postgresguide.org● wiki.postgresql.org

● books● PostgreSQL Up and Running (O'Reilly)● PostgreSQL 9.0 High Performance (Packt)

Page 60: The Accidental DBA

Getting Help● mailing lists

● postgresql.org/community/lists– pgsql-novice, pgsql-general

● chat● irc.freenode.net, #postgresql

● web● dba.stackexchange.com● planet.postgresql.org

Page 61: The Accidental DBA

questions?● Josh Berkus

[email protected]● PGX: www.pgexperts.com● Blog: www.databasesoup.com

● Upcoming Events● Postgres Open: Chicago, Sept 17-19● LISA: San Diego, Dec 5-8

Copyright 2012 PostgreSQL Experts Inc. Released under the Creative Commons Attribution License. All images are the property of their respective owners. The Don't Panic slogan and logo is property of the BBC, and the Dilbert image belongs to Scott Adams and is used here as parody.