development of 8.3 in india

96
December 2007 FOSS.in Josh Berkus, PostgreSQL Core Team 8.3 A Story of Many Patches

Upload: postgresql-experts-inc

Post on 06-May-2015

1.713 views

Category:

Technology


1 download

TRANSCRIPT

Page 1: Development of 8.3 In India

December 2007FOSS.in

Josh Berkus, PostgreSQL Core Team

8.3A Story of Many Patches

Page 2: Development of 8.3 In India

PostgreSQL India?

Page 3: Development of 8.3 In India

PostgreSQL 8.3 In Beta

Page 4: Development of 8.3 In India

Many, Many PatchesE.1. Release 8.3Release date: 2007-12-??

Release date: CURRENT AS OF 2007-10-24

E.1.1. OverviewThis release represents a major leap forward for PostgreSQL by adding significant new functionality and performance enhancements. This was made possible by a growing community that has dramatically accelerated the pace of development. This release adds the follow major capabilities:

Full text search now fully integrated into the core database system

Support the SQL/XML standard, including new operators and an XML data type

Support for enumerated data types (ENUM)

Add Universally Unique Identifier (UUID) data type

Support arrays of composite types

Add control over whether NULLs sort first or last

Support updatable cursors

Server configuration parameters can now be set on a per-function basis

User-defined types can now have type modifiers

Automatically invalidate cached function code when table definitions change or statistics are updated

Page 5: Development of 8.3 In India

PostgreSQL 8.3 Features● Developer

– SQL/XML – Integrated TSearch2– UUID, ENUM– PL/pgSQL debugging

● Admin– CSV Logging– Better Stats– pgStandby

● Consistency– HOT – Load Distributed

Checkpoint● Performance

– Synchronized Scan– Asynch Commit

● Accessories– pgBouncer– pgSNMP

Page 6: Development of 8.3 In India

Many DevelopersTom Lane, USAPeter Eisentraut, GermanyBruce Momjian, USADave Page, EnglandPavan Deolasee, IndiaItagaki Takahiro, JapanGreg Smith, USADavid Fetter, USAPavel Stehule, CzechGreg Stark, EnglandJan Wieck, USAOleg Bartunov, RussiaFlorian PflugJeff Davis, USATrevor HardcastleNikhil S, IndiaHoldger SchurigD'Arcy Cain, CanadaGevik Babakhani, Netherlands

Teodor Sigaev, RussiaAlvaro Herrera, ChileMark Kirkwood, New ZealandJoachim WielandHenry Hotz, USAMagnus Haglander, SwedenTatsuo Ishii, JapanVictor WagnerBill Moran, USAAndrew Dunstan, USAArul Shaji, AustraliaNickolay Samokhvalov, RussiaNeil Conway, CanadaMarc Fournier, CanadaJaime Casanova, EcuadorAlbert CerveraBernd Helmle, GermanyGlen ParkerHeikki Linnakangas, Finland

Steve MarshallPaul BayerDoug KnightGreg Sabino Mullane, USAChad WagnerBrendan JurdEuler Taviera de Oliveira, BrazilJoe Conway, USASimon Riggs, EnglandGuillaume Smet, FranceHiroshi Saito, JapanChris Marcellino, ItalyDave Cramer, CanadaDevrim Gunduz, TurkeyGavin Sherry, AustraliaJeremy DrakeMarko Kreen, EstoniaKris Jurka, USATom Dunstan, USA

Page 7: Development of 8.3 In India

Many DevelopersTom Lane, USAPeter Eisentraut, GermanyBruce Momjian, USADave Page, EnglandPavan Deolasee, IndiaItagaki Takahiro, JapanGreg Smith, USADavid Fetter, USAPavel Stehule, CzechGreg Stark, EnglandJan Wieck, USAOleg Bartunov, RussiaFlorian PflugJeff Davis, USATrevor HardcastleNikhil S, IndiaHoldger SchurigD'Arcy Cain, CanadaGevik Babakhani, Netherlands

Teodor Sigaev, RussiaAlvaro Herrera, ChileMark Kirkwood, New ZealandJoachim WielandHenry Hotz, USAMagnus Haglander, SwedenTatsuo Ishii, JapanVictor WagnerBill Moran, USAAndrew Dunstan, USAArul Shaji, AustraliaNickolay Samokhvalov, RussiaNeil Conway, CanadaMarc Fournier, CanadaJaime Casanova, EcuadorAlbert CerveraBernd Helmle, GermanyGlen ParkerHeikki Linnakangas, Finland

Steve MarshallPaul BayerDoug KnightGreg Sabino Mullane, USAChad WagnerBrendan JurdEuler Taviera de Oliveira, BrazilJoe Conway, USASimon Riggs, EnglandGuillaume Smet, FranceHiroshi Saito, JapanChris Marcellino, ItalyDave Cramer, CanadaDevrim Gunduz, TurkeyGavin Sherry, AustraliaJeremy DrakeMarko Kreen, EstoniaKris Jurka, USATom Dunstan, USA

Page 8: Development of 8.3 In India

PostgreSQL 8.3 Features● Developer

– SQL/XML– Integrated TSearch2– UUID, ENUM– PL/pgSQL debugging

● Admin– CSV Logging– Better Stats– pgStandby

● Consistency– HOT– Load Distributed

Checkpoint● Performance

– Synchronized Scan– Asynch Commit

● Accessories– pgBouncer– pgSNMP

Page 9: Development of 8.3 In India

Why contribute?● PostgreSQL is a community project

– owned by the community, run by the community– if you contribute, you are a full participant

● unlike some other databases● Tinker with cool database stuff

– we are hard-core database geeks– learn a lot from top database hackers

● Improve your employment prospects– database engineers are always in demand

Page 10: Development of 8.3 In India

SQL/XMLXMLROOT (

XMLELEMENT (NAME ’gazonk’,XMLATTRIBUTES (

’val’ AS ’name’,1 + 1 AS ’num’),

XMLELEMENT (NAME ’qux’,’foo’)

),VERSION ’1.0’,STANDALONE YES )

SELECT *FROM table1WHERE (xpath(’//person/name/text()’, xdata))[1]::text = ’John Smith’;

table_to_xml(tbl regclass,nulls boolean,tableforest boolean, targetns text)

<?xml version=’1.0’standalone=’yes’ ?><gazonk name=’val’

num=’2’><qux>foo</qux>

</gazonk>

Page 11: Development of 8.3 In India

SQL/XML Prior Work2002

● TorchBox contributes /contrib/xml2– Some XML functionality:

● Xpath functions● XSLT functions

– BUT● Non-standard, completely PostgreSQL syntax● No real data type● Many features missing

– Charset support– DTD support

Page 12: Development of 8.3 In India

SQL/XML Prior Work2004

● Peter Eisentraut writes XML export– Export table to XML– BUT

● prototype only● not useful without

other XML functionality● syntax requires changing

PostgreSQL parser

Page 13: Development of 8.3 In India

SQL/XML Prior Work2005

● Pavel Stuehle writes SQL/XML syntax demo– First standard syntax example– BUT

● depends on PL/perl● prototype only● does not integrate

with /contrib/xml2

Page 14: Development of 8.3 In India

Nikolay Samokhvalov ● Graduate Student at

University of Moscow● Met major contributor

Oleg Bartunov in 2005– ported MoiKrug.ru to

PostgreSQL● Masters Thesis:

updatable XML views in RDBMS

Page 15: Development of 8.3 In India

● Google funds 700 students to work on Open Source– PostgreSQL gets 7

● Nickolay proposes project for SQL/XML– Proposal accepted– Peter will mentor

Page 16: Development of 8.3 In India

SoC Proposal[SoC Proposal] Initial support of XMLType for PostgreSQLSummary

Primary goal is introduction of special type support for storing XML data in ORDBMS PostgreSQL, querying this data and modifying it. This project is intended to develop manipulation abilities rather than special storage engine (VARCHAR as initial storage implicit type).

At the moment there is no good general vision of most suitable storage for XMLType. Moreover, from my point of view, DBMS should have support of different index types for XMLType - every for its special purpose. And which is more important is an open question. That's why I propose to work on 'external' things rather than 'internals' (data structure for index) and strictly follow standards. But anyway, I've included path index (#7 in the list of Deliverables), because now I suppose that it is most expectative type of index (this item is optional, because here community's feedback is highly needed).Deliverables

Ability to define any column as of XMLType. Initially, this means that only well-formed XML documents could be stored in such a column.

Automatic validation of XML documents being inserted/modified against XML schema, if definition of column contains it (reference to it). DTD and/or XML Schema could be used for this.Subset of SQL/XML standard [1] for mixing relational and XML data in queries. This includes at least following: XMLELEMENT, XMLAGG, XMLFOREST, XMLCONCAT expressions; implementation of mapping rules for basic types. (See Project Details for more details).XML domains support: possibility to define domain based on XMLType, using XML schema (DTD / XML Schema).Basic XPath support (existing experience - contrib/xml2 module - should be taken into account).Basic XSLT support (existing experience - contrib/xml2 module - should be taken into account).Path indexes for fast retrieval of XML documents (queries with XPath expressions in WHERE clause). [OPTIONAL]Documentation (definition rules for XMLType, SQL/XML expressions, etc).

Page 17: Development of 8.3 In India

Mentor: Peter Eisentraut● From Aachen, Germany● Core Team member since 2004● In charge of PostgreSQL Documentation● Prior XML work

Page 18: Development of 8.3 In India

Specification Research● ANSI SQL 2006 -- SQL/XMLXML Publishing Functions

xmlelement() Creates an XML element, allowing the name to be specified.xmlattributes() Creates XML attributes from columns, using the name of each column as the name of the corresponding attribute.xmlroot() Creates the root node of an XML document.xmlcomment() Creates an XML comment.xmlpi() Creates an XML processing instruction.xmlparse() Parses a string as XML and returns the resulting XML structure.xmlforest() Creates XML elements from columns, using the name of each column as the name of the corresponding element.xmlconcat() Combines a list of individual XML values to create a single value containing an XML forest.xmlagg() Combines a collection of rows, each containing a single XML value, to create a single value containing an XML forest.

Page 19: Development of 8.3 In India

Specification Research● ANSI SQL 2006 -- SQL/XML

Page 20: Development of 8.3 In India

Specification ProposalRe: Google SoC--Idea Request

From: "Nikolay Samokhvalov" <samokhvalov ( at ) gmail ( dot ) com>Subject: Re: Google SoC--Idea RequestDate: Tue, 2 May 2006 12:34:43 +0400

Proposal: XMLType for PostgreSQL.

*** Minimum: ***to have special type support for storing XML data and working with it.This means following:- ability to define any column of a table as of XMLType; internally,all data is stored as VARCHAR;- auto validation of documents against XML schema, if it wasspecified in columndefinition or in XML data sheets themselves (DTD, XSD or at least oneof them) /*contrib/xml2 has such feature, but it uses libxml, whatmeans DOM interface. Maybe it's better to use some SAX parser to solvethis task*/;- XPath indexes for queries with path expressions in WHERE clause /*Isuppose this kind of indexes would be most frequently used. I proposeusing good labeling schema and GIST and/or Gin here*/;- some subset of SQL/XML. Actually, part 14 of SQL:200n (SQL/XML) hasmore than 400 pages now and contains some established constructions,that are using in other DBMSes. There is the some patch alreadywritten by Pavel Stehule:http://www.pgsql.ru/db/mw/msg.html?mid=2096818. (BTW, what is with it?it was kept for 8.2, so what is the result?) I've tested it severalmonths ago, basic SQL/XML functions worked fine. It changes grammar,but there is no other way... So, using this patch as a part of thisproject means that this project cannot be contrib module,unfortunately. Nevertheless, current paper of SQL/XML standard seemsto be mature - so, compared with existing implementation it would be anice 'landmark';

Page 21: Development of 8.3 In India

Specification RevisionsXML export function signatures

From: Peter Eisentraut <peter_e ( at ) gmx ( dot ) net>To: pgsql-hackers ( at ) postgresql ( dot ) orgSubject: XML export function signaturesDate: Mon, 12 Feb 2007 20:18:59 +0100

Here are the proposed signatures for the XML export functions.

While I have seen the output formats in use elsewhere, I could not findany useful information on how to invoke these mappings, so thefollowing is purely my own invention.

table_to_xml(tbl regclass, nulls boolean, tableforest boolean, targetns text) RETURNS xmlquery_to_xml(query text, nulls boolean, tableforest boolean, targetns text) RETURNS xmltable_to_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text) RETURNS xmlquery_to_xmlschema(query text, nulls boolean, tableforest boolean, targetns text) RETURNS xmltable_to_xml_and_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text) RETURNS xmlquery_to_xml_and_xmlschema(query text, nulls boolean, tableforest boolean, targetns text) RETURNS xmlcursor_get_xml(cursor refcursor, count int, nulls boolean, tableforest boolean, targetns text) RETURNS xmlcursor_to_xmlschema(cursor refcursor, nulls boolean, tableforest boolean, targetns text) RETURNS

xml

Page 22: Development of 8.3 In India

Specification RevisionsRe: XML export function signatures

From: Peter Eisentraut <peter_e ( at ) gmx ( dot ) net>To: Andrew Dunstan <andrew ( at ) dunslane ( dot ) net>Subject: Re: XML export function signaturesDate: Mon, 12 Feb 2007 23:57:49 +0100

Andrew Dunstan wrote:> . table_to_xml_and_xmlschema seems like a mouthful - can we shorten> it a bit?

Well, it gives you back a mouthful of data, too. :)

> . what are the two ways of representing data that tableforest> distinguishes?

tableforest = false gives you something like

<tablename> <row> <!-- where "row" is constant --> <col1name>data</col1name> <col2name>data</col2name> </row> <row>

Page 23: Development of 8.3 In India

Approved Specification● XML Data Type

– type-safe XML storage– supports XML operators, functions

● XML Functions– Generation– Manipulation (XLST)– Export– XPath Query

● XML Expressions– IS DOCUMENT, etc.

Page 24: Development of 8.3 In India

CodeModifica- tions

Page 25: Development of 8.3 In India

Initial Versions of PatchUpdated XML patch

From: Peter Eisentraut <peter_e ( at ) gmx ( dot ) net>To: pgsql-patches ( at ) postgresql ( dot ) orgSubject: Updated XML patchDate: Thu, 14 Dec 2006 23:02:05 +0100

Attached is an updated patch for XML functionality, which subsumes all earlier patches on the subject. This includes a data type with format checking, and functions to mangle values. For the moment, I have cut out the inessential stuff such as xpath. The included regression test file xml.sql shows some of the things that work.

This patch already covers most of the parser work. What is left hereafter is adjusting all the corner cases, the escaping rules, and the various XML parser behaviors.

Use configure --with-libxml to build.

-- Peter Eisentrauthttp://developer.postgresql.org/~petere/Attachment: current-xml-patch.bz2Description: BZip2 compressed data

Page 26: Development of 8.3 In India

Initial Patch static void+_outXmlExpr(StringInfo str, XmlExpr *node)+{+ WRITE_NODE_TYPE("XMLEXPR");++ WRITE_ENUM_FIELD(op, XmlExprOp);+ WRITE_STRING_FIELD(name);+ WRITE_NODE_FIELD(named_args);+ WRITE_NODE_FIELD(args);+}++static void _outCoerceToDomain(StringInfo str, CoerceToDomain *node) { WRITE_NODE_TYPE("COERCETODOMAIN");@@ -2019,6 +2030,9 @@ case T_BooleanTest: _outBooleanTest(str, obj); break;+ case T_XmlExpr:+ _outXmlExpr(str, obj);+ break; case T_CoerceToDomain: _outCoerceToDomain(str, obj); break;diff -Nru -x configure ../cvs-pgsql/src/backend/nodes/readfuncs.c ./src/backend/nodes/readfuncs.c--- ../cvs-pgsql/src/backend/nodes/readfuncs.c 2006-12-12 16:31:46.000000000 +0100+++ ./src/backend/nodes/readfuncs.c 2006-12-14 21:20:08.000000000 +0100@@ -765,6 +765,22 @@ }

Page 27: Development of 8.3 In India

Patch RevisionsRe: xml type and encodings

From: "Andrew Dunstan" <andrew ( at ) dunslane ( dot ) net>To: "Peter Eisentraut" <peter_e ( at ) gmx ( dot ) net>Subject: Re: xml type and encodingsDate: Mon, 15 Jan 2007 16:35:13 -0600 (CST)

Peter Eisentraut wrote:> Florian G. Pflug wrote:>> Couldn't the server change the encoding declaration inside the xml to>> the correct>> one (the same as client_encoding) before returning the result?>> The data type output function doesn't know what the client encoding is> or whether the data will be shipped to the client at all. But what I'm> thinking is that we should remove the encoding declaration if possible.> At least that would be less confusing, albeit still potentially> incorrect if the client continues to process the document without care.

The XML SPec says:

"In the absence of information provided by an external transport protocol(e.g. HTTP or MIME), it is a fatal error for an entity including anencoding declaration to be presented to the XML processor in an encodingother than that named in the declaration, or for an entity which beginswith neither a Byte Order Mark nor an encoding declaration to use anencoding other than UTF-8. Note that since ASCII is a subset of UTF-8,ordinary ASCII entities do not strictly need an encoding declaration."

Page 28: Development of 8.3 In India

More Patch Revisionsxpath_array with namespaces support

From: "Nikolay Samokhvalov" <samokhvalov ( at ) gmail ( dot ) com>To: PGSQL-Patches <pgsql-patches ( at ) postgresql ( dot ) org>Subject: xpath_array with namespaces supportDate: Wed, 21 Feb 2007 02:46:33 +0300

As a result of discussion with Peter, I provide modified patch forxpath_array() with namespaces support.

The signature is: _xml xpath_array(text xpathQuery, xml xmlValue[, _text namespacesBindings])

The third argument is 2-dimensional array defining bindings fornamespaces. Simple examples:

xmltest=# SELECT xpath_array('//text()', '<local:dataxmlns:local="http://127.0.0.1";><local:piece id="1">numberone</local:piece><local:piece id="2" /></local:data>'); xpath_array----------------{"number one"}(1 row)

Page 29: Development of 8.3 In India

Yet More Revisionscorrect format for date, time, timestamp for XML functionality

From: "Pavel Stehule" <pavel ( dot ) stehule ( at ) hotmail ( dot ) com>To: pgsql-patches ( at ) postgresql ( dot ) orgSubject: correct format for date, time, timestamp for XML functionalityDate: Tue, 20 Feb 2007 13:27:42 +0100

Hello,

this patch ensures independency datetime fields on current datestyle setting. Add new internal datestyle USE_XSD_DATESTYLE. It's almoust same to USE_ISO_DATESTYLE. Differences are for timestamp:ISO: yyyy-mm-dd hh24:mi:ssXSD: yyyy-mm-ddThh24:mi:ss

I found one link about this topic: http://forums.oracle.com/forums/thread.jspa?threadID=467278&tstart=0RegardsPavel Stehule

Page 30: Development of 8.3 In India

Patch Accepted

From: Bruce Momjian <bruce ( at ) momjian ( dot ) us>To: "Nikolay Samokhvalov" <samokhvalov ( at ) gmail ( dot ) com>Subject: Re: [HACKERS] xml2 contrib patch supporting default XML namespacesDate: Thu, 22 Mar 2007 16:16:16 -0400 (EDT)

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviewsand approves it.

Page 31: Development of 8.3 In India

Write Documentation9.14. XML FunctionsThe functions and function-like expressions described in this section operate on values of type xml. Check Section 8.13 for information about the xml type. The function-like expressions xmlparse and xmlserialize for converting to and from type xml are not repeated here. Use of many of these functions requires the installation to have been built with configure --with-libxml.

9.14.1. Producing XML ContentA set of functions and function-like expressions are available for producing XML content from SQL data. As such, they are particularly suitable for formatting query results into XML documents for processing in client applications.

9.14.1.1. xmlcomment

xmlcomment(text)

The function xmlcomment creates an XML value containing an XML comment with the specified text as content. The text cannot contain -- or end with a - so that the resulting construct is a valid XML comment. If the argument is null, the result is null.

Example:

SELECT xmlcomment('hello');

xmlcomment-------------- <!--hello-->

Page 32: Development of 8.3 In India

XML in 8.3 BetaE.1. Release 8.3Release date: 2007-12-??

Release date: CURRENT AS OF 2007-10-24

E.1.1. OverviewThis release represents a major leap forward for PostgreSQL by adding significant new functionality and performance enhancements. This was made possible by a growing community that has dramatically accelerated the pace of development. This release adds the follow major capabilities:

Full text search now fully integrated into the core database system

Support the SQL/XML standard, including new operators and an XML data type

Support for enumerated data types (ENUM)

Add Universally Unique Identifier (UUID) data type

Page 33: Development of 8.3 In India

SQL/XML Feature Set● XML Parsing● XML Functions● XML Export● XPath B-tree Index

Page 34: Development of 8.3 In India

Future XML Projects● Use of HSTORE for advanced XML indexing● Automated XML decomposition

– XML-to-Table– XML-to-Schema

● PL/XSLT– XHTML query

● XQuery support

Page 35: Development of 8.3 In India

HOTHOT

Page 36: Development of 8.3 In India

Fastest OSDB

MySQL PostgreSQL Proprietary0

100

200

300

400

500

600

700

800

900

J2EE Througput

MySQL PostgreSQL Proprietary0

20000

40000

60000

80000

100000

120000

140000

160000

180000

200000

Acquisition Cost Comparison

Cos

t in

US

Dol

lars

Page 37: Development of 8.3 In India

Most Scalable

Page 38: Development of 8.3 In India

The Consistency Problem

Page 39: Development of 8.3 In India

VACUUM

Page 40: Development of 8.3 In India

What's MVCC?● Multi-Version Concurrency Control

– Each user gets their own “version” of the data– Allows parallelization of updates/reads– Without it, scalability is not possible

● You have to lock everything● Or violate ACID transactions

Page 41: Development of 8.3 In India

MVCCRow Version 1

Page 42: Development of 8.3 In India

MVCCRow Version 1 Row Version 1

Row Version 2

Row Version 3

SELECT ... SELECT ...

BEGIN UPDATE

BEGIN UPDATE

Page 43: Development of 8.3 In India

MVCCRow Version 1 Row Version 1

Row Version 2

Row Version 3

Row Version 2

SELECT ... SELECT ... SELECT ...

BEGIN UPDATE

BEGIN UPDATE

COMMIT

COMMIT

ROLLBACK

Page 44: Development of 8.3 In India

MVCC Them & UsThe Overwriting ModelInnoDB & Oracle

UPDATE

Base Relation Rollback Segment

Copy

Overwrite Row

Old Row Version

Page 45: Development of 8.3 In India

MVCC Them & UsThe Overwriting ModelInnoDB & Oracle

● Advantages– Low table/index maintenance requirements– Latest row version fast access

● Disadvantages– Transaction isolation can break– Long-running transactions expensive– Rollbacks very expensive– Rollback segment bottleneck

Page 46: Development of 8.3 In India

MVCC Them & UsThe Non-overwriting ModelPostgreSQL & Firebird

UPDATE

Base Relation

New Row Version

Old Row Version

Copy

Page 47: Development of 8.3 In India

MVCC Them & UsThe Non-overwriting ModelPostgreSQL & Firebird

● Advantages– Transaction isolation effortless– Rollbacks free– Long-running transactions not a problem

● Disadvantages– High table/index maintenance– “Frequently updated table” problem

Page 48: Development of 8.3 In India

Frequently Updated Tables

Row C: Version 1

Row C: Version 5

Row C: Version 4

Row C: Version 3

Row C: Version 2

Indexes Updated

small update

small update

small update

large update

Tuplestore

Indexes Updated

Indexes Updated

Indexes Updated

Page 49: Development of 8.3 In India

Frequently Updated Tables

Row C: Version 1

Row C: Version 5

Row C: Version 4

Row C: Version 3

Row C: Version 2

Indexes Updated

small update

small update

small update

large update

Tuplestore

Indexes Updated

Indexes Updated

Indexes Updated

Page 50: Development of 8.3 In India

Frequently Updated Tables

Row C: Version 5

Tuplestore

Page 51: Development of 8.3 In India

Frequently Updated Tables

Row C: Version 5

Tuplestore

Page 52: Development of 8.3 In India

Frequently Updated Tables

Row C: Version 5

Tuplestore

Page 53: Development of 8.3 In India

Poor Performance

Page 54: Development of 8.3 In India

Pavan Deolasee● Graduated IIT Bombay

– focus on databases● Worked for VERITAS● Lead Engineer at

EnterpriseDB– PostgreSQL vendor– Contributes performance

patches to community● Lives in Pune

Page 55: Development of 8.3 In India

Team Effort

● Simon Riggs– original

proposal– prototypes– specification

● Heikki Linnakangas, Tom Lane and others– revisions– code review– bug fixes

Page 56: Development of 8.3 In India

Meeting at EnterpriseDB

Page 57: Development of 8.3 In India

Initial proposal● Update-

in-PlacewithHOT file Row C: Version 1

Base Relation HOT File

Page 58: Development of 8.3 In India

Initial proposal● Update-

in-PlacewithHOT file Row C: Version 2

Row C: Version 1

Base Relation HOT File

copy old version

UPDATE

Page 59: Development of 8.3 In India

Initial proposal● Update-

in-PlacewithHOT file Row C: Version 3

Row C: Version 2

Row C: Version 1

tuple chain

Base Relation HOT File

UPDATEcopy oldversion

Page 60: Development of 8.3 In India

Initial proposal● Update-

in-PlacewithHOT file Row C: Version 4

Row C: Version 3

Row C: Version 2

Row C: Version 1

tuple chain

tuple chain

Base Relation HOT File

copy oldversion

UPDATE

Page 61: Development of 8.3 In India

Initial proposal● Update-

in-PlacewithHOT file Row C: Version 4

Base Relation

Page 62: Development of 8.3 In India

First proposal to pgsql-hackers

Frequent Update Project: Design Overview of HOT Updates

From: "Simon Riggs" <simon ( at ) 2ndquadrant ( dot ) com>To: <pgsql-hackers ( at ) postgresql ( dot ) org>Subject: Frequent Update Project: Design Overview of HOT UpdatesDate: Thu, 09 Nov 2006 17:13:16 +0000

Design Overview of HOT Updates------------------------------

The objective is to increase the speed of the UPDATE case, whileminimizing the overall negative effects of the UPDATE. We refer to thegeneral requirement as *Frequent Update Optimization*, though thisdesign proposal is for Heap Overflow Tuple (HOT) Updates. It is similarin some ways to the design for SITC already proposed, though has anumber of additional features drawn from other designs to make it apractical and effective implementation.

EnterpriseDB have a working, performant prototype of this design. Thereare still a number of issues to resolve and the intention is to followan open community process to find the best way forward. All requireddetail will be provided for the work conducted so far.

Current PGSQL behaviour is for UPDATEs to create a new tuple versionwithin the heap, so acts from many perspectives as if it were an INSERT.All of the tuple versions are chained together, so that whichever of thetuples is visible to your Snapshot, you can walk the chain to find themost recent tuple version to update.

Page 63: Development of 8.3 In India

Revisions:Reverse Order

Row C: Version 1

Row C: Version 4

Row C: Version 3

Row C: Version 2

in-page update

in-page update

in-page update

Normal Tuples HOT Relation File

UPDATE

Page 64: Development of 8.3 In India

Revisions:Chains, not files

Row C: Version 1

Row C: Version 4

Row C: Version 3

Row C: Version 2

in-page update

in-page update

in-page update

Normal Tuples HOT Tuple Chain

UPDATE

Page 65: Development of 8.3 In India

Add microvacuum

Row C: Version 1

Row C: Version 5

Row C: Version 4

Row C: Version 3

Row C: Version 2

Indexes Updated

in-page update

in-page update

in-page update

new page / index update

Normal Tuples HOT Tuple Chain

microvacuum

UPDATE

Page 66: Development of 8.3 In India

Submit patch draft v.1HOT WIP Patch - version 1

From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com>To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>Subject: HOT WIP Patch - version 1Date: Wed, 14 Feb 2007 15:34:46 +0530

This is a WIP patch based on the recent posting by Simon and discussionsthereafter. We are trying to do one piece at a time and intention is to postthe work ASAP so that we could get early and continuous feedback from the community. We could then incorporate those suggestions in the nextWIP patch.

To start with, this patch implements HOT-update for a simple casewhen there is enough free space in the same block so that it can accommodate the new version of the tuple. A necessary condition fordoing HOT-update is that none of the index columns is changed.The old version is marked as HEAP_UPDATE_ROOT and the newversion is marked as HEAP_ONLY_TUPLE. If a tuple is HOT-updated, no new index entry is added.

Page 67: Development of 8.3 In India

Feature Freeze

Page 68: Development of 8.3 In India

Submit another versionHOT WIP Patch - version 2

From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com>To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>,

pgsql-patches ( at ) postgresql ( dot ) orgSubject: HOT WIP Patch - version 2Date: Tue, 20 Feb 2007 12:08:14 +0530

Reposting - looks like the message did not get through in the firstattempt. My apologies if multiple copies are received.

This is the next version of the HOT WIP patch. Since the last patch that I sent out, I have implemented the HOT-update chain pruning mechanism.

When following a HOT-update chain from the index fetch, if we notice that the root tuple is dead and it is HOT-updated, we try to prune the chain tothe smallest possible length. To do that, the share lock is upgraded to anexclusive lock and the tuple chain is followed till we find a live/recently-dead tuple. At that point, the root t_ctid is made point to that tuple. In order topreserve the xmax/xmin chain, the xmax of the root tuple is also updatedto xmin of the found tuple. Since this xmax is also < RecentGlobalXmin and is a committed transaction, the visibility of the root tuple still remainsthe same.

Page 69: Development of 8.3 In India

Submit another versionHOT WIP Patch - version 3.2

From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com>To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>,

pgsql-patches ( at ) postgresql ( dot ) orgSubject: HOT WIP Patch - version 3.2Date: Sun, 25 Feb 2007 00:06:04 +0530

Please see the attached WIP HOT patch - version 3.2. It nowimplements the logic for reusing heap-only dead tuples. When aHOT-update chain is pruned, the heap-only tuples are markedLP_DELETE. The lp_offset and lp_len fields in the line pointer are maintained.

When a backend runs out of free space in a page when doing anUPDATE, it searches the line pointers to find a slot which is marked LP_DELETEd and has enough space to accommodate the new tuple. If such a slot is found, its reused. We might waste some space if the slot is larger than the tuple, but that gets reclaimed at VACUUM time.

Page 70: Development of 8.3 In India

Yet another versionHOT WIP Patch - version 6.3

From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com>To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>Subject: HOT WIP Patch - version 6.3Date: Mon, 2 Apr 2007 17:51:13 +0530

Please see the HOT version 6.3 patch posted on pgsql-patches.I've implemented support for CREATE INDEX and CREATE INDEXCONCURRENTLY based on the recent discussions. The implementationis not yet complete and needs some more testing/work/discussion before we can start considering it for review.

One of the regression test case fails because CIC now works inthree phases. In the first phase, we just create the catalog entryfor the index and commit the transaction. If the index_build fails because of any error (say, unique key constraint) the index creationfails, but the catalog entry remains.

Page 71: Development of 8.3 In India

Many issues resolved● CREATE INDEX

– including CONCURRENTLY● Re-using dead tuples● Interaction with Cluster● Plan invalidation● Utilities & tools

Page 72: Development of 8.3 In India

But still not reviewed

Page 73: Development of 8.3 In India

Tom Lane says:“break it up, please!”

● Too big a patch for reviewers– almost 12,000 lines

● Broken up into 5 parts– 1. The basic HOT implementation– 2. Retain vacuum, chain pruning and other tricks– 3. Fix the broken VACUUM and VACUUM FULL

code– 4. Fix the broken CREATE INDEX– pg_stats and other misc. utilities

Page 74: Development of 8.3 In India

Code Reviewed

Page 75: Development of 8.3 In India

PostgreSQL Beta

Page 76: Development of 8.3 In India

HOT Performance

Page 77: Development of 8.3 In India

SKYLINE OF

SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF],  .., dm [MIN | MAX | DIFF]

SELECT *

FROM books

SKYLINE OF rating MAX, price MIN;

Page 78: Development of 8.3 In India

CDE @ IIIT, Hyderabad

Page 79: Development of 8.3 In India

Feature proposed 3/3

Page 80: Development of 8.3 In India

Extension to SQL syntax

SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF],  .., dm [MIN | MAX | DIFF]

Page 81: Development of 8.3 In India

Approximate Queries

Page 82: Development of 8.3 In India

Approximate QueriesSELECT *

FROM Books

SKYLINE OF rating MAX, price MIN;

Page 83: Development of 8.3 In India

Lots of discussion

Page 84: Development of 8.3 In India

Problems with the PatchRe: PostgreSQL - 'SKYLINE OF' clause added!

From: Tom Lane <tgl ( at ) sss ( dot ) pgh ( dot ) pa ( dot ) us>To: Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz>Subject: Re: PostgreSQL - 'SKYLINE OF' clause added!Date: Thu, 08 Mar 2007 01:12:22 -0500

Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz> writes:> Tom Lane wrote:>> Well, whether it's horrible or not is in the eye of the beholder, but>> this is certainly a non-standard syntax extension.

> Being non-standard should not be the only reason to reject a worthwhile > feature.

No, but being non-standard is certainly an indicator that the featuremay not be of widespread interest --- if it were, the SQL committeewould've gotten around to including it; seems they've managed to includeeverything but the kitchen sink already. Add to that the complete lackof any previous demand for the feature, and you have to wonder where themarket is.

> The fact that several > different groups have been mentioned to be working on this feature would > indicate that it is worth considering.

Page 85: Development of 8.3 In India

Problems with the Patch● Not part of the ANSI SQL standard

– possibly low general applicability– might get added to standard with different syntax– might never get standardized at all

● Requires changes to PostgreSQL parser– new keyword break applications– possible side effects

● Not coded to PostgreSQL standards– would need refactoring

Page 86: Development of 8.3 In India

Rejected!Re: PostgreSQL - 'SKYLINE OF' clause rejected

From: Tom Lane <tgl ( at ) sss ( dot ) pgh ( dot ) pa ( dot ) us>To: Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz>Subject: Re: PostgreSQL - 'SKYLINE OF' clause added!Date: Sun, 11 Mar 2007 23:44:41 -0400

Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz> writes:> If we consider this thoroughly and compile a suitable syntax that covers > all bases it could be used as the basis of the standard definition or be > close to what ends up in the standard.

I'll bet you a very good dinner that the word SKYLINE will never be seenin the standard.

To me, the proposed feature seems an extremely narrow, special-purposething. The SQL committee have never been into that very much, and seemeven less interested in the last couple of revisions. They likemechanisms that can be used to solve a wide variety of problems, andare not afraid to introduce conceptual complexity to get there.Two examples for you: outer joins and recursive queries. Oracle's(+) syntax is more compact than what got into the spec, but lessprecise and less functional. For recursive queries, CONNECT BY isway simpler than what got into the spec, but again doesn't cover asmuch ground. The SKYLINE clause seems to me to be right about onpar with CONNECT BY ... it does something useful, but only one thing.

Page 87: Development of 8.3 In India

Solution: pgFoundry

Page 88: Development of 8.3 In India

Contributor Resources

Page 89: Development of 8.3 In India

Mailing Lists● Hackers list

– pgsql-hackers– main list for development discussion

● Patch list– pgsql-patches– submit your patch here after discussion on -hackers

● Specific feature lists– pgsql-jdbc, pgsql-performance, pgsql-sql, etc.– subscribe at www.postgresql.org/community/lists

Page 90: Development of 8.3 In India

Web Sites● www.postgresql.org

– main site● www.pgfoundry.org

– add-ins, drivers, tools● developer.postgresql.org

– developer wiki, including TODO lists● archives.postgresql.org

– mailing list archives -- search for your idea here

Page 91: Development of 8.3 In India

Documentation● www.postgresql.org/docs

– main documentation– internals:/docs/current/static/internals.html– code conventions: /docs/current/static/source.html

● doxygen.postgresql.org– annotated source code

● www.postgresql.org/docs/faqs.FAQ_DEV.html– developer FAQ

Page 92: Development of 8.3 In India

The PostgreSQL YearRC and Branch

Feature Freeze

Beta

RC and Branch

Development Period

Integration & Review(1 month)

Beta Testing(1-2 months)

December 2007

February 1, 2008

August 1, 2008

September, 2008

October, 2008

Patch Commit Fest

Development Period

April 1, 2008Patch Commit Fest

Development Period

June 1, 2008Patch Commit Fest

Development Period

Page 93: Development of 8.3 In India

Other tips on submitting● Don't get discouraged.

– Be prepared to argue.– One hacker rejecting your idea doesn't mean

everyone does.– Committers (esp. Tom Lane) are often more

concerned about maintainability than cool stuff.● Be flexible: you will have to make changes.

– Corporate and academic coding standards are generally lower than the project's.

Page 94: Development of 8.3 In India

Other tips on submitting● Don't use the wrong arguments

– “MySQL/Oracle does it this way.”– “Based on this hot academic trend.”

● Some things make a patch harder to accept– New syntax– Backwards compatibility issues– High code counts

● Don't get discouraged.

Page 95: Development of 8.3 In India

Now, go write some code.

or contribute in some easier way

Page 96: Development of 8.3 In India

Contact Information● Josh Berkus

[email protected]– blogs.ittoolbox.com/

database/soup– www.sun.com/postgresql

● PostgreSQL India– [email protected]

● Pavan Deolasee– pavan.deolasee

@enterprisedb.com– www.enterprisedb.com

This talk is copyright 2007 Josh Berkus, and is licensed under the creative commons attribution license