when it all goes wrong - postgresql€¦ · 2019-10-17 when postgres goes wrong pgconfeu created...

Post on 21-Sep-2020

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

When it all Goes Wrong

@leinweber

Will Leinweber@leinweber

Citus Data (Microsoft)

bitfission.com(warning autoplays midi)

@leinweber

coming fromcitus cloud

heroku postgres

@leinweber

special thankscitus cloud— dan farina (@danfarina)

heroku postgres— maciek sakrejda (@uhoh_itsmaciek)

@leinweber

same sorts of problemsfrom pages & alerts

from support tickets

@leinweber

this talkmore app dev who uses postgresrather than dba

@leinweber

the problem with Postgresit’s pretty good

you don’t get experience with how it breaks

@leinweber

what to do for a problem

@leinweber

what to do for a problem

@leinweber

complicated systemnetwork

hardware

o/s

postgres

@leinweber

using the database (too much)95% application

4% auto vacuum

1% everything else

@leinweber

hard to convinceall the graphs saying DB is slow

and nothing has changed

…must be the database!

@leinweber

https://upload.wikimedia.org/wikipedia/commons/9/98/Survivorship-bias.png

@leinweber

“but I didn’t change anything”no deploys!

no database migrations!

no scaling!

@leinweber

“but I didn’t change anything”

https://upload.wikimedia.org/wikipedia/commons/0/09/Redherring.gif

@leinweber

“but I didn’t change anything”more traffic?

change in access patterns?

one big user logged in?

@leinweber

run out of a resource

@leinweber

snowball

@leinweber

examplemanageable user 1s query => 2x expensive

frequent, small queries 3ms => 12ms

@leinweber

assumptions

hardwaremaintenance

app

@leinweber

assumptionspostgres should not crash

…with overcommit off and no containers

large extensions increase chance

@leinweber

if not postgres, then what

@leinweber

system resourcescpu

memory

disk

parallelism / backends

locks

@leinweber

cpu mem disk parallelismcpu mem disk parallelism

@leinweber

cpu mem disk parallelismcredentials wrong

networking broken

locking issue, check pg_locks

idle in transaction

@leinweber

cpu mem disk parallelismapplication submitting backlogged workload

connection leak

pool sizes set too large

pg_lock issue + application backlog

@leinweber

cpu mem disk parallelismworkload skew causing thrashing

unusual sequential scan workload

failover or restart => no cache

pg_prewarm

@leinweber

cpu mem disk parallelismsame as just disk,

but also the application is piling on

@leinweber

cpu mem disk parallelismlarge GROUP BYs

high disk latency due to unusual page dispersion pattern in the workload

@leinweber

cpu mem disk parallelismworkload has high mem (GROUP BY) + app adding backlog

lock contention slowing mem release

@leinweber

cpu mem disk parallelismlarge GROUP BYs + paging in unusual data

@leinweber

cpu mem disk parallelismLook for what is causing disk access

@leinweber

cpu mem disk parallelismsmall, in-memory workload

lots of seq scans on small table

index scan w/ filter dropping lots

@leinweber

cpu mem disk parallelismapp backlog + too much processing on small data

simply a lot of work

@leinweber

cpu mem disk parallelismlarge seq scans

@leinweber

cpu mem disk parallelismloading cold data + application backlog

@leinweber

cpu mem disk parallelismsmall # of backends doing a lot more work

@leinweber

cpu mem disk parallelismentity, workload, entity*workload

soft deletes and non-conditional indexes

@leinweber

cpu mem disk parallelismreporting query

@leinweber

cpu mem disk parallelismapp backlog, but with CPU/mem problems

@leinweber

tools of the trade

@leinweber

tools of the tradeC symbols

@leinweber

tools of the trade: perfperf record -p <pid> && perf report

@leinweber

tools of the trade: perfperf top

@leinweber

tools of the trade: perfwww.brendangregg.com/perf.html

@leinweber

tools of the trade: gdbgdb -batch -ex 'bt' -p <pid>

@leinweber

@leinweber

@leinweber

tools of the trade: iostatiostat -xm 10

@leinweber

tools of the trade: iotop

@leinweber

tools of the trade: htop

@leinweber

Tools of the trade: bwm-ng

@leinweber

tools of the trade: backendspgrep -lf postgres + grep + wc

select * from pg_stat_activity

@leinweber

tools of the trade: pg_s_sselect * from pg_stat_statements

@leinweber

tools of the trade: summarycpu mem disk parallelism network

perf x

gdb x

iostat x

iotop x

htop x x

bwm x

pgrep x

@leinweber

what to do

@leinweber

what to doconfiguration change

@leinweber

what to dodb change

@leinweber

what to docode change

@leinweber

flirting with disasterVelocity NY 2013: Richard Cook"Resilience In Complex Adaptive Systems”

Jens Rasmussen:Risk management in a dynamic society: a modeling problem

@leinweber

flirting with disaster

economicboundary

@leinweber

flirting with disaster

economicboundary

workloadboundary

@leinweber

flirting with disaster

economicboundary

workloadboundary

performanceboundary

@leinweber

flirting with disaster

economicboundary

workloadboundary

performanceboundary

errormargin

@leinweber

flirting with disaster

economicboundary

workloadboundary

performanceboundary

@leinweber

flirting with disaster

economicboundary

workloadboundary

performanceboundary

errormargin

@leinweber

flirting with disaster

economicboundary

workloadboundary

performanceboundary

errormargin

@leinweber

flirting with disasterVelocity NY 2013: Richard Cook"Resilience In Complex Adaptive Systems”

Jens Rasmussen: Risk management in a dynamic society: a modeling problem

@leinweber

thank youWill Leinweber

@leinweber citusdata.com

top related