8.4 upcoming features
DESCRIPTION
TRANSCRIPT
February 2009San Franciso
Josh Berkus, PostgreSQL Core Team
Version 8.4Holy Frijole,That's a lot of features!
8.4: A Few Patches
8.4: A Few Patches
5 CommitFests
8.4: A Few Patches
5 CommitFests
9 Months of Development
8.4: A Few Patches
5 CommitFests
9 Months of Development
Over 1600 GIT Updates
8.4: A Few Patches
5 CommitFests
9 Months of Development
Over 1600 GIT Updates
More Than 2 Dozen Major Features
Looks like some database projects still know how to put out new version.
Some-but-not-all 8.4 Features● Windowing Functions● Common Table Expressions● Parallel Restore● CIText● array_agg● auto_explain● SQL/MED connection manager● Per-Database Collations● \d commands improved● Multicolumn GIN indexes● Column-level permissions
● Unsigned Integers● Boyer-Moore String Searching● Improved Hash Indexes● More DTrace probes● Default & Variadic parameters● New PL/pgSQL statements● pg_stat_statements● pg_stat_functions● SSL refactor● pg_hba improvements● Performance improvements
SQL Features● Windowing Functions● Common Table Expressions● array_agg● Per-database Collations● New data types
– Unsigned Integers– CIText
● Improved \d commands● Add columns to existing VIEWs
Windowing Functions● Aggregate over part of the data
– SQL 2008 standard– Great for BI, OLAP
● Functions:– row_number()– rank()– lead()– lag()
● More from David Fetter later!
Windowing Functions
SELECT depname, empno, salary, rank() OVER
(PARTITION BY depname ORDER BY salary)
FROM empsalary;
SELECT y, m, SUM(SUM(people)) OVER (PARTITION BY y ORDER BY m), AVG(people)FROM( SELECT EXTRACT(YEAR FROM accident_date) AS y, EXTRACT(MONTH FROM accident_date) AS m, * FROM accident)sGROUP BY y, m;
Common Table Expressions● Ability to create "named subqueries" for your
query.● Best use: WITH RECURSIVE
– real recursive queries– "walk" trees with one query
● more from David Fetter later
Common Table ExpressionsWITH RECURSIVE subdepartment AS( -- SELECT * FROM department WHERE id = 'A'
UNION ALL
-- recursive term referring to "subdepartment" SELECT d.* FROM department AS d, subdepartment AS sd WHERE d.id = sd.parent_department)SELECT * FROM subdepartment;
array_agg● History:
– added Arrays in 7.4● array_accum() aggregate example code
– intarray contrib module in 8.0● only ints, but very fast
● array_agg() in 8.4: all arrays, fast C Code– from Robert Haas, new contributor!
SELECT status, array_agg(username) FROM logins GROUP BY status;
Per-Database Collations● Collations (ordering character sets) used to be
per installation● Now they are per database● Someday they will be per column● Google Summer of Code Project!CREATE DATABASE mydb
COLLATE 'sv_se.UTF-8'
CTYPE 'sv_se.UTF-8'
TEMPLATE template0
New Data Types● Make migrating from other DBMSes easier● CIText (in /contrib)
– Case Insensitive Text– Full CI indexing, comparisons
● Unsigned Integers (in pgFoundry)– migrate from MySQL, others
Better \d in psql● \d is now multi-version compatible
– \dt etc. won't error if you connect an 8.4 client to an 8.2 database
● \df for user functions only– \dfS for system functions
● \ef to edit a funcion
Add columns to VIEWs● In the bad old days:
– need to add another column to your VIEW?– have to drop it & recreate it– have to drop & recreate all dependancies– enter the World Of Pain
● In 8.4:– ALTER VIEW lets you add columns– Can't rename or modify though
Performance & Monitoring● Parallel Restore● Improved Hash Indexes● pg_stat_user_functions● pg_stat_statements● More Dtrace probes● auto_explain● Other Performance Improvements
Parallel Restore● In 8.3, we were single-threaded
pg_dump Restoredumpfile
8 Hours
Parallel Restore● In 8.4, Multi-core, Restore!
pg_dump
Restore
dumpfile
2 Hours
Restore
RestoreRestore
RestoreRestore
RestoreRestore
Improved Hash Indexes● Our old hash indexes were slow and useless● Improved hash indexes are fast!
– use them for ID columns● or other unique keys
– not completely recovery-safe yet though● don't switch over production DBs until 8.5
● Google Summer of Code project!
pg_stat_user_functions● For each of your functions, see
– # of times called– amount of time spent– amount of time spent excluding other functions
pg_stat_statements
pg_stat_statements
log pgFouinelogfile
pg_stat_statementspostgres=# SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 3;-[ RECORD 1 ]------------------------------------------------------------userid | 10dbid | 63781query | UPDATE branches SET bbalance = bbalance + $1 WHERE bid = $2;calls | 3000total_time | 20.716706rows | 3000-[ RECORD 2 ]------------------------------------------------------------userid | 10dbid | 63781query | UPDATE tellers SET tbalance = tbalance + $1 WHERE tid = $2;calls | 3000total_time | 17.1107649999999rows | 3000-[ RECORD 3 ]------------------------------------------------------------userid | 10dbid | 63781query | UPDATE accounts SET abalance = abalance + $1 WHERE aid = $2;calls | 3000total_time | 0.645601rows | 3000
More DTrace Probes* Probes to measure query timequery-parse-start (int, char *)query-parse-done (int, char *)query-plan-start ()query-plan-done ()query-execute-start ()query-execute-done ()query-statement-start (int, char *)query-statement-done (int, char *)
* Probes to measure dirty buffer writes by the backend because bgwriter is not effective
dirty-buffer-write-start (int, int, int, int)dirty-buffer-write-done (int, int, int, int)
* Probes to measure physical writes from the shared bufferbuffer-write-start (int, int, int, int)buffer-write-done (int, int, int, int, int)
* Probes to measure reads of a relation from a particular buffer blockbuffer-read-start (int, int, int, int, int)buffer-read-done (int, int, int, int, int, int)
* Probes to measure the effectiveness of buffer cachingbuffer-hit ()buffer-miss ()
* Probes to measure I/O time because wal_buffers is too smallwal-buffer-write-start ()wal-buffer-write-done ()
* Probes to measure checkpoint stats such as running time, buffers written, xlog files added, removed, recycled, etc
checkpoint-start (int)checkpoint-done (int, int, int, int, int)
* Probes to measure Idle in Transaction and client/network timeidle-transaction-start (int, int)idle-transaction-done ()
* Probes to measure sort timesort-start (int, int, int, int, int)sort-done (int, long)
* Probes to determine whether or not the deadlock detector has found a deadlock
deadlock-found ()deadlock-notfound (int)
* Probes to measure reads/writes by block numbers and relationssmgr-read-start (int, int, int, int)smgr-read-end (int, int, int, int, int, int)smgr-write-start (int, int, int, int)smgr-write-end (int, int, int, int, int, int)
auto_explain● misnamed; actually allows you to manually set specific
queries/sessions/functions to output explain plans to the log postgres=# LOAD 'auto_explain'; postgres=# SET auto_explain.log_min_duration = 0; postgres=# SELECT count(*) FROM pg_class, pg_index WHERE oid = indrelid AND indisunique;
This might produce log output such as:
LOG: duration: 0.986 ms plan: Aggregate (cost=14.90..14.91 rows=1 width=0) -> Hash Join (cost=3.91..14.70 rows=81 width=0) Hash Cond: (pg_class.oid = pg_index.indrelid) -> Seq Scan on pg_class (cost=0.00..8.27 rows=227 width=4) -> Hash (cost=2.90..2.90 rows=81 width=4) -> Seq Scan on pg_index (cost=0.00..2.90 rows=81 width=4)
●
More Performance Improvements● Free Space Map is dynamically sized (no more
max_fsm_pages!)● Visibility Map
– VACUUM only changed pages– Index-only Scans in 8.5
● Less writing to pgstat file– plus you can move it
Stored Procedures● Default Parameters● Variadic Parameters● New PL/pgSQL Statements● PL/pythonU OUT Parameters
DEFAULT parameters
CREATE OR REPLACE FUNCTI ON) adder a i nt 4 0de f aul t ,
b i nt 2de f aul t ( ' 'RETURNS i nt LANGUAGE sql
' $ + $ 'AS sel ect 1 2 ;
) (SELECT adder ; ) (SELECT adder 1 ; ) (SELECT adder 1, 2 ;
VARIADIC parametersCREATE OR REPLACE FUNCTION
adder(VARIADIC v int[]) RETURNS int AS $$
DECLARE s int; i int;BEGIN
s:=0;FOR i IN SELECT generate_subscripts(v,1) LOOP
s := s + i;END LOOP;
RETURN s;END;$$ LANGUAGE 'plpgsql';
SELECT adder(1);SELECT adder(1,2,3);SELECT adder(40,2);
New PL/PgSQL Statements● RETURNS TABLE
– SQL-compliant alias for "SETOF"● CASE statement
– real switching logicCASE
WHEN x BETWEEN 0 AND 10 THEN
msg := 'value is between zero and ten';
WHEN x BETWEEN 11 AND 20 THEN
msg := 'value is between eleven and twenty';
END CASE;
PL/pythonU OUT Parameters● You now can use IN, OUT and INOUT
parameters with PL/pythonU functions.● That's it!
Exotic Features● SQL/MED Connection Manager● Multi-column GIN Indexes● Boyer-Moore String Searching
SQL/MED● Foundation for connecting to external servers
– Future of PL/proxy and DBconnect– Future of DBI-Connect
CREATE FOREIGN DATA WRAPPER pgsql LIBRARY 'pgsql_fdw';
CREATE SERVER foo FOREIGN DATA WRAPPER pgsql OPTIONS (host 'remotehost', dbname 'remotedb');
CREATE USER MAPPING FOR PUBLIC SERVER foo OPTIONS (username 'bob', password 'secret');
Multi-Column GIN Indexes● Bad Old Days: to do a single Full Text Search
index over several columns, you had to concatenate them.
● New Goodness: you can now do a proper multicolumn index– and it's faster!
Boyer-Moore String Searching
Boyer-Moore String Searching
No, I don't know what it is either.
But we have it now.
Security● Refactored SSL● Improved pg_hba.conf● Column-level Permissions● SE-Postgres
Refactored SSL by Magnus
● Proper certificate verification– Choose level, full verification is default
● Control over all key and certificate files● SSL certificate authentication
– Trusted root certificate– Map «cn» value of certificate
pg_hba Improvements● "crypt" is gone (insecure)● «ident sameuser» => «ident»● New format for options
– name=value for all options● usermaps for all external methods
– with regexp support● Parsed on reload
Column PermissionsREVOKE SELECT (col1, col2), INSERT (col1, col2) ON tab1 FROM role2;
● Restrict access to sensitive columns from unprivileged ROLEs– more fine-grained security– no longer need to use VIEWs to do this
PostgreSQL Needs YOU
Many Patches == Lots of Testing● Bug Testing
– can you make 8.4 crash?● Specification Testing
– do the features do what the docs say they do?● Performance Testing
– is 8.4 really faster? How much?● Combinational Testing
– what happens when you put several new features together?
Many Patches == Lots of Testing
1. Take a copy of your production applications2. Port them to 8.43. Report breakage and issues4. Play with implementing new features
Do It Now!We're counting on you!
Contact Information● Josh Berkus
– [email protected]– http://it.toolbox.com/
blogs/database-soup
● Upcoming events– SCALE 7, Los
Angeles, Feb. 20– pgCon 2009, Ottawa,
May 20
This talk is copyright 2009 Josh Berkus, and is licensed under the Creative Commons Attribution License