Big Bad PostgreSQL: BI on a Budget


DESCRIPTION

Big Bad PostgreSQL: BI on a Budget. Theo Schlossnagle, OmniTI.

TRANSCRIPT

Page 1: Big Bad PostgreSQL: BI on a Budget

Big Bad PostgreSQL: A Case Study

Moving a “large,” “complicated,” and mission-critical data warehouse from Oracle to PostgreSQL for cost control.

Monday, August 2, 2010

Page 2: Big Bad PostgreSQL: BI on a Budget

Who am I? @postwait on twitter

Author of “Scalable Internet Architectures” (Pearson, ISBN: 067232699X)

CEO of OmniTI. We build scalable and secure web applications.

I am an Engineer. A practitioner of academic computing. IEEE member and Senior ACM member. On the Editorial Board of ACM’s Queue magazine.

I work on/with a lot of Open Source software: Apache, perl, Linux, Solaris, PostgreSQL, Varnish, Spread, Reconnoiter, etc.

I have experience. I’ve had the unique opportunity to watch a great many catastrophes. I enjoy immersing myself in the pathology of architecture failures.


Page 3: Big Bad PostgreSQL: BI on a Budget

[Architecture diagram] Overall Architecture: OLTP instance (drives the site), OLTP warm backup (warm spare), data exporter (bulk selects / data exports), log importer (log import and processing), and the data warehouse.

Databases and storage shown in the diagram: Oracle 8i on 0.5 TB Hitachi (x2) and on 0.75 TB JBOD; 0.25 TB JBOD; MySQL 4.1 on 1.2 TB IDE RAID; a MySQL log importer on 1.2 TB SATA RAID; and 1.5 TB of MTI storage.


Page 4: Big Bad PostgreSQL: BI on a Budget

Database Situation

• The problems:

• The database is growing.

• The OLTP and ODS/warehouse are too slow.

• A lot of application code against the OLTP system.

• Minimal application code against the ODS system.

• Oracle:

• Licensed per processor.

• Really, really, really expensive on a large scale.

• PostgreSQL:

• No licensing costs.

• Good support for complex queries.


Page 5: Big Bad PostgreSQL: BI on a Budget

Database Choices

• Must keep Oracle on OLTP

• Complex, Oracle-specific web application.

• Need more processors.

• ODS: Oracle not required.

• Complex queries from limited sources.

• Needs more space and power.

• Result:

• Move ODS Oracle licenses to OLTP

• Run PostgreSQL on ODS


Page 6: Big Bad PostgreSQL: BI on a Budget

PostgreSQL gotchas

• For an OLTP system that does thousands of updates per second, vacuuming is a hassle.

• No upgrades?!

• pg_migrator is coming along.

• Less community experience with large databases.

• Replication features less evolved

• though PostgreSQL 9.0 ups the ante


Page 7: Big Bad PostgreSQL: BI on a Budget

PostgreSQL ♥ ODS

• Mostly inserts.

• Updates/Deletes controlled, not real-time.

• pl/perl (leverage DBI/DBD for remote database connectivity).

• Monster queries.

• Extensible.


Page 8: Big Bad PostgreSQL: BI on a Budget

Choosing Linux

• Popular, liked, good community support.

• Chronic problems:

• kernel panics

• filesystems remounting read-only

• filesystems don’t support snapshots

• LVM is clunky on enterprise storage

• 20 outages in 4 months


Page 9: Big Bad PostgreSQL: BI on a Budget

Choosing Solaris 10

• Switched to Solaris 10

• No crashes, better system-level tools.

• prstat, iostat, vmstat, smf, fault-management.

• ZFS

• snapshots (persistent), BLI backups.

• Excellent support for enterprise storage.

• DTrace.

• Free (too).


Page 10: Big Bad PostgreSQL: BI on a Budget

Oracle features we need

• Partitioning

• Statistics and Aggregations

• rank over partition, lead, lag, etc.

• Large selects (100GB)

• Autonomous transactions

• Replication from Oracle (to Oracle)


Page 11: Big Bad PostgreSQL: BI on a Budget

Partitioning

• Next biggest tables: billions of rows

• Allows us to cluster data over specific ranges (by date in our case)

• Simple, cheap archiving and removal of data.

• Can put ranges used less often in different tablespaces (slower, cheaper storage)

For large data sets:

pgods=# select count(1) from ods.ods_tblpick_super;
   count
------------
 3247286017
(1 row)


Page 12: Big Bad PostgreSQL: BI on a Budget

Partitioning PostgreSQL style

• PostgreSQL doesn’t support partitioning...

• It supports inheritance... (what’s this?)

• some crazy object-relation paradigm.

• We can use it to implement partitioning:

• One master table with no rows.

• Child tables that have our partition constraints.

• Rules on the master table for insert/update/delete (a minimal sketch follows).
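A minimal sketch of that layout, assuming a date-ranged table like ods.ods_tblpick_super; the picked_at column and payload are hypothetical:

-- Master table: holds no rows itself.
create table ods.ods_tblpick_super (
    userid    integer,
    picked_at date,
    payload   text
);

-- Child tables carry the partition constraints (one per date range).
create table ods.ods_tblpick_super_2010_07 (
    check (picked_at >= date '2010-07-01' and picked_at < date '2010-08-01')
) inherits (ods.ods_tblpick_super);

-- A rule on the master redirects inserts into the matching child.
create rule ods_tblpick_super_ins_2010_07 as
on insert to ods.ods_tblpick_super
where (new.picked_at >= date '2010-07-01' and new.picked_at < date '2010-08-01')
do instead
insert into ods.ods_tblpick_super_2010_07 values (new.*);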


Page 13: Big Bad PostgreSQL: BI on a Budget

Partitioning PostgreSQL realized

• Cheaply add new empty partitions (see the sketch after this list)

• Cheaply remove old partitions

• Migrate less-often-accessed partitions to slower storage

• Different index strategies per partition

• PostgreSQL 8.1+ supports constraint exclusion on inherited tables.

• smarter planning

• smarter executing
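For example, partition maintenance stays cheap (a sketch building on the layout above; the slow_sata tablespace and the older partition names are hypothetical):

-- Cheaply add next month's empty partition.
create table ods.ods_tblpick_super_2010_08 (
    check (picked_at >= date '2010-08-01' and picked_at < date '2010-09-01')
) inherits (ods.ods_tblpick_super);

-- Age out an old range: move it to cheaper storage, or drop it outright.
alter table ods.ods_tblpick_super_2006_01 set tablespace slow_sata;
drop table ods.ods_tblpick_super_2005_12;

-- With constraint exclusion on, the planner skips children whose CHECK
-- constraints cannot satisfy the query's date range.
set constraint_exclusion = on;
explain select count(1) from ods.ods_tblpick_super
 where picked_at >= date '2010-08-01';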


Page 14: Big Bad PostgreSQL: BI on a Budget

RANK OVER PARTITION

• In Oracle:

select userid, email
  from ( select u.userid, u.email,
                row_number() over (partition by u.email order by userid desc) as position
           from (...) )
 where position = 1

• In PostgreSQL:

FOR v_row IN
    select u.userid, u.email from (...) order by email, userid desc
LOOP
    IF v_row.email != v_last_email THEN
        RETURN NEXT v_row;
        v_last_email := v_row.email;
        v_rownum := v_rownum + 1;
    END IF;
END LOOP;

With 8.4, we have windowing functions
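So on 8.4+ the Oracle form carries over almost verbatim. A sketch (PostgreSQL additionally wants aliases on the derived tables; the inner query stays elided as on the slide):

select userid, email
  from ( select u.userid, u.email,
                row_number() over (partition by u.email order by userid desc) as position
           from (...) u ) ranked
 where position = 1;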


Page 15: Big Bad PostgreSQL: BI on a Budget

Large SELECTs

• The width of these rows is about 2k

• 50 million row return set

• > 100 GB of data

• Application code does:

select u.*, b.browser, m.lastmess
  from ods.ods_users u,
       ods.ods_browsers b,
       ( select userid, min(senddate) as senddate
           from ods.ods_maillog
          group by userid ) m,
       ods.ods_maillog l
 where u.userid = b.userid
   and u.userid = m.userid
   and u.userid = l.userid
   and l.senddate = m.senddate;


Page 16: Big Bad PostgreSQL: BI on a Budget

The Large SELECT Problem

• libpq will buffer the entire result in memory.

• This affects language bindings (DBD::Pg).

• This is an utterly deficient default behavior.

• This can be avoided by using cursors

• Requires the app to be PostgreSQL specific.

• You open a cursor.

• Then FETCH the row count you desire.


Page 17: Big Bad PostgreSQL: BI on a Budget

Big SELECTs the Postgres way

The previous “big” query becomes:

DECLARE bigdump CURSOR FOR
select u.*, b.browser, m.lastmess
  from ods.ods_users u,
       ods.ods_browsers b,
       ( select userid, min(senddate) as senddate
           from ods.ods_maillog
          group by userid ) m,
       ods.ods_maillog l
 where u.userid = b.userid
   and u.userid = m.userid
   and u.userid = l.userid
   and l.senddate = m.senddate;

Then, in a loop:

FETCH FORWARD 10000 FROM bigdump;
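On the application side (DBD::Pg is one of the bindings the buffering problem hits), the loop looks roughly like this; a sketch only, with the DSN and per-row handling as placeholders:

use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=pgods', '', '',
                       { AutoCommit => 0, RaiseError => 1 });   # cursors need an open transaction

$dbh->do(q{DECLARE bigdump CURSOR FOR select ...});             # the big query from above
while (1) {
    my $sth = $dbh->prepare('FETCH FORWARD 10000 FROM bigdump');
    $sth->execute;
    last if $sth->rows == 0;                                    # nothing left to fetch
    while (my $row = $sth->fetchrow_hashref) {
        # process $row here (application-specific)
    }
}
$dbh->do('CLOSE bigdump');
$dbh->commit;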


Page 18: Big Bad PostgreSQL: BI on a Budget

Autonomous Transactions

• In Oracle we have over 2000 custom stored procedures.

• During these procedures, we like to:

• COMMIT incrementally. Useful for long transactions (update/delete) that need not be atomic -- incremental COMMITs.

• start a new top-level txn that can COMMIT. Useful for logging progress in a stored procedure, so that you know how far you progressed and how long each step took even if it rolls back.


Page 19: Big Bad PostgreSQL: BI on a Budget

PostgreSQL shortcoming

• PostgreSQL simply does not support autonomous transactions; to quote core developers, “that would be hard.”

• When in doubt, use brute force.

• Use pl/perl with DBD::Pg to connect to ourselves (a new backend) and execute a new top-level transaction.
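A minimal sketch of that brute-force trick, assuming plperlu and DBD::Pg are installed; the proc_log table and the connection details are hypothetical:

create or replace function log_progress(msg text) returns void as $body$
    use DBI;
    # A second backend is a new top-level transaction: the log row
    # commits even if the calling procedure later rolls back.
    my $dbh = DBI->connect('dbi:Pg:dbname=pgods', '', '',
                           { AutoCommit => 0, RaiseError => 1 });
    $dbh->do('insert into proc_log (logged_at, message) values (now(), ?)',
             undef, $_[0]);
    $dbh->commit;
    $dbh->disconnect;
    return;
$body$ language plperlu;

-- Called from inside a long-running stored procedure:
-- select log_progress('step 3 of 12 complete');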


Page 20: Big Bad PostgreSQL: BI on a Budget

Replication

• Cross vendor database replication isn’t too difficult.

• Helps a lot when you can do it inside the database.

• Using dbi-link (based on pl/perl and DBI) we can.

• We can connect to any remote database.

• INSERT into local tables directly from remote SELECT statements. [snapshots]

• LOOP over remote SELECT statements and process them row-by-row. [replaying remote DML logs]
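The snapshot case, sketched with plain plperlu and DBI rather than dbi-link's own helpers; every object name, DSN, and credential here is a placeholder:

create or replace function snapshot_remote_browsers() returns integer as $body$
    use DBI;
    # Pull from the remote (Oracle) side...
    my $ora = DBI->connect('dbi:Oracle:PROD', 'ods_ro', 'xxxxx',
                           { RaiseError => 1 });
    my $sth = $ora->prepare('select userid, browser from browsers');
    $sth->execute;

    # ...and refresh the local snapshot table through SPI.
    spi_exec_query('truncate ods.ods_browsers_snap');
    my $ins = spi_prepare(
        'insert into ods.ods_browsers_snap (userid, browser) values ($1, $2)',
        'integer', 'text');
    my $rows = 0;
    while (my @r = $sth->fetchrow_array) {
        spi_exec_prepared($ins, @r);
        $rows++;
    }
    $ora->disconnect;
    return $rows;
$body$ language plperlu;

DML replay works the same way, except the remote SELECT reads a change log and each row is applied locally as the corresponding INSERT/UPDATE/DELETE.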


Page 21: Big Bad PostgreSQL: BI on a Budget

Replication (really)

• Through a combination of snapshotting and DML replay we:

• replicate into over 2000 tables in PostgreSQL from Oracle

• snapshot replication for 200 of them

• DML replay logs for the other 1800

• PostgreSQL to Oracle is a bit harder

• out-of-band export and imports


Page 22: Big Bad PostgreSQL: BI on a Budget

New Architecture

• Master: Sun v890 and Hitachi AMS + warm standby running Oracle (1TB)

• Logs: several custom-built servers running MySQL instances (2TB each)

• ODS BI: 2x Sun v40 running PostgreSQL 8.3 (6TB on Sun JBODs on ZFS each)

• ODS archive: 2x custom running PostgreSQL 8.3 (14TB internal storage on ZFS each)


Page 23: Big Bad PostgreSQL: BI on a Budget

PostgreSQL is Lacking

• pg_dump is too intrusive.

• Poor system-level instrumentation.

• Poor methods to determine specific contention.

• It relies on the operating system’s filesystem cache (which makes PostgreSQL inconsistent across its supported OS base).


Page 24: Big Bad PostgreSQL: BI on a Budget

Enter Solaris

• Solaris is a UNIX from Sun Microsystems.

• Is it different than other UNIX/UNIX-like systems?

• Mostly it isn’t different (hence the term UNIX)

• It does have extremely strong ABI backward compatibility.

• It’s stable and works well on large machines.

• Solaris 10 shakes things up a bit:

• DTrace

• ZFS

• Zones


Page 25: Big Bad PostgreSQL: BI on a Budget

Solaris / ZFS

• ZFS: the Zettabyte Filesystem.

• 2^64 snapshots, 2^48 files/directory, 2^64 bytes/filesystem, 2^78 (256 ZiB) bytes in a pool, 2^64 devices/pool, 2^64 pools/system

• Extremely cheap differential backups.

• I have a 5 TB database, I need a backup!

• No rollback in your database? What is this? MySQL?

• No rollback in your filesystem?

• ZFS has snapshots, rollback, clone and promote.

• OMG! Life altering features.

• Caveat: ZFS is slower than alternatives, by about 10% with tuning.


Page 26: Big Bad PostgreSQL: BI on a Budget

Solaris / Zones

• Zones: Virtual Environments.

• Shared kernel.

• Can share filesystems.

• Segregated processes and privileges.

• No big deal for databases, right?

But Wait!


Page 27: Big Bad PostgreSQL: BI on a Budget

Solaris / ZFS + Zones = Magic Juju

• ZFS snapshot, clone, delegate to zone, boot and run.

• When done, halt zone, destroy clone.

• We get a point-in-time copy of our PostgreSQL database:

• read-write,

• low disk-space requirements,

• NO LOCKS! Welcome back pg_dump, you don’t suck (as much) anymore.

• Fast snapshot to usable copy time:

• On our 20 GB database: 1 minute.

• On our 1.2 TB database: 2 minutes.

https://labs.omniti.com/trac/pgsoltools/browser/trunk/pitr_clone/clonedb_startclone.sh


Page 28: Big Bad PostgreSQL: BI on a Budget

ZFS: how I saved my soul.

• Database crash. Bad. 1.2 TB of data... busted. The reason Robert Treat looks a bit older than he should.

• xlogs corrupted. catalog indexes corrupted.

• Fault? PostgreSQL bug? Bad memory? Who knows?

• Trial & error on a 1.2 TB data set is a cruel experience.

• In real-life, most recovery actions are destructive actions.

• PostgreSQL is no different.

• Rollback to last checkpoint (ZFS), hack postgres code, try, fail, repeat.


Page 29: Big Bad PostgreSQL: BI on a Budget

Let DTrace open your eyes

• DTrace: Dynamic Tracing

• Dynamically instrument “stuff” in the system:

• system calls (like strace/truss/ktrace).

• process/scheduler activity (on/off cpu, semaphores, conditions).

• see signals sent and received.

• trace kernel functions, networking.

• watch I/O down to the disk.

• user-space processes, each function... each machine instruction!

• Add probes into apps where it makes sense to you.


Page 30: Big Bad PostgreSQL: BI on a Budget

Can you see what I see?

• There is EXPLAIN... when that isn’t enough...

• There is EXPLAIN ANALYZE... when that isn’t enough.

• There is DTrace.

; dtrace -q -n 'postgresql*:::statement-start
{
  self->query = copyinstr(arg0);
  self->ok = 1;
}
io:::start
/self->ok/
{
  @[self->query,
    args[0]->b_flags & B_READ ? "read" : "write",
    args[1]->dev_statname] = sum(args[0]->b_bcount);
}'
dtrace: description 'postgres*:::statement-start' matched 14 probes
^C

select count(1) from c2w_ods.tblusers where zipcode between 10000 and 11000;
    read  sd1      16384

select division, sum(amount), avg(amount) from ods.billings
 where txn_timestamp between '2006-01-01 00:00:00' and '2006-04-01 00:00:00'
 group by division;
    read  sd2   71647232


Page 31: Big Bad PostgreSQL: BI on a Budget

OmniTI Labs / pgtreats

• https://labs.omniti.com/labs/pgtreats

• Where we stick our PostgreSQL goodies...

• like pg_file_stress

FILENAME/DBOBJECT                                   READS                   WRITES
                                                 #   min avg max         #   min avg max
alldata1__idx_remove_domain_external             1    12  12  12       398     0   0   0
slowdata1__pg_rewrite                            1    12  12  12         0     0   0   0
slowdata1__pg_class_oid_index                    1     0   0   0         0     0   0   0
slowdata1__pg_attribute                          2     0   0   0         0     0   0   0
alldata1__mv_users                               0     0   0   0         4     0   0   0
slowdata1__pg_statistic                          1     0   0   0         0     0   0   0
slowdata1__pg_index                              1     0   0   0         0     0   0   0
slowdata1__pg_index_indexrelid_index             1     0   0   0         0     0   0   0
alldata1__remove_domain_external                 0     0   0   0       502     0   0   0
alldata1__promo_15_tb_full_2                    19     0   0   0        11     0   0   0
slowdata1__pg_class_relname_nsp_index            2     0   0   0         0     0   0   0
alldata1__promo_177intaoltest_tb                 0     0   0   0      1053     0   0   0
slowdata1__pg_attribute_relid_attnum_index       2     0   0   0         0     0   0   0
alldata1__promo_15_tb_full_2_pk                  2     0   0   0         0     0   0   0
alldata1__all_mailable_2                      1403     0   0 423         0     0   0   0
alldata1__mv_users_pkey                          0     0   0   0         4     0   0   0


Page 32: Big Bad PostgreSQL: BI on a Budget

Results

• Move ODS Oracle licenses to OLTP

• Run PostgreSQL on ODS

• Save $800k in license costs.

• Spend $100k in labor costs.

• Learn a lot.


Page 33: Big Bad PostgreSQL: BI on a Budget

Thanks!

• Thank you.

• http://omniti.com/does/postgresql

• We’re hiring, but only if you love:

• lots of data on lots of disks on lots of big boxes

• smart people

• hard problems

• more than one database technology (including PostgreSQL)

• responsibility
