development of 8.3 in india
Post on 06-May-2015
1.713 Views
Preview:
TRANSCRIPT
December 2007FOSS.in
Josh Berkus, PostgreSQL Core Team
8.3A Story of Many Patches
PostgreSQL India?
PostgreSQL 8.3 In Beta
Many, Many PatchesE.1. Release 8.3Release date: 2007-12-??
Release date: CURRENT AS OF 2007-10-24
E.1.1. OverviewThis release represents a major leap forward for PostgreSQL by adding significant new functionality and performance enhancements. This was made possible by a growing community that has dramatically accelerated the pace of development. This release adds the follow major capabilities:
Full text search now fully integrated into the core database system
Support the SQL/XML standard, including new operators and an XML data type
Support for enumerated data types (ENUM)
Add Universally Unique Identifier (UUID) data type
Support arrays of composite types
Add control over whether NULLs sort first or last
Support updatable cursors
Server configuration parameters can now be set on a per-function basis
User-defined types can now have type modifiers
Automatically invalidate cached function code when table definitions change or statistics are updated
PostgreSQL 8.3 Features● Developer
– SQL/XML – Integrated TSearch2– UUID, ENUM– PL/pgSQL debugging
● Admin– CSV Logging– Better Stats– pgStandby
● Consistency– HOT – Load Distributed
Checkpoint● Performance
– Synchronized Scan– Asynch Commit
● Accessories– pgBouncer– pgSNMP
Many DevelopersTom Lane, USAPeter Eisentraut, GermanyBruce Momjian, USADave Page, EnglandPavan Deolasee, IndiaItagaki Takahiro, JapanGreg Smith, USADavid Fetter, USAPavel Stehule, CzechGreg Stark, EnglandJan Wieck, USAOleg Bartunov, RussiaFlorian PflugJeff Davis, USATrevor HardcastleNikhil S, IndiaHoldger SchurigD'Arcy Cain, CanadaGevik Babakhani, Netherlands
Teodor Sigaev, RussiaAlvaro Herrera, ChileMark Kirkwood, New ZealandJoachim WielandHenry Hotz, USAMagnus Haglander, SwedenTatsuo Ishii, JapanVictor WagnerBill Moran, USAAndrew Dunstan, USAArul Shaji, AustraliaNickolay Samokhvalov, RussiaNeil Conway, CanadaMarc Fournier, CanadaJaime Casanova, EcuadorAlbert CerveraBernd Helmle, GermanyGlen ParkerHeikki Linnakangas, Finland
Steve MarshallPaul BayerDoug KnightGreg Sabino Mullane, USAChad WagnerBrendan JurdEuler Taviera de Oliveira, BrazilJoe Conway, USASimon Riggs, EnglandGuillaume Smet, FranceHiroshi Saito, JapanChris Marcellino, ItalyDave Cramer, CanadaDevrim Gunduz, TurkeyGavin Sherry, AustraliaJeremy DrakeMarko Kreen, EstoniaKris Jurka, USATom Dunstan, USA
Many DevelopersTom Lane, USAPeter Eisentraut, GermanyBruce Momjian, USADave Page, EnglandPavan Deolasee, IndiaItagaki Takahiro, JapanGreg Smith, USADavid Fetter, USAPavel Stehule, CzechGreg Stark, EnglandJan Wieck, USAOleg Bartunov, RussiaFlorian PflugJeff Davis, USATrevor HardcastleNikhil S, IndiaHoldger SchurigD'Arcy Cain, CanadaGevik Babakhani, Netherlands
Teodor Sigaev, RussiaAlvaro Herrera, ChileMark Kirkwood, New ZealandJoachim WielandHenry Hotz, USAMagnus Haglander, SwedenTatsuo Ishii, JapanVictor WagnerBill Moran, USAAndrew Dunstan, USAArul Shaji, AustraliaNickolay Samokhvalov, RussiaNeil Conway, CanadaMarc Fournier, CanadaJaime Casanova, EcuadorAlbert CerveraBernd Helmle, GermanyGlen ParkerHeikki Linnakangas, Finland
Steve MarshallPaul BayerDoug KnightGreg Sabino Mullane, USAChad WagnerBrendan JurdEuler Taviera de Oliveira, BrazilJoe Conway, USASimon Riggs, EnglandGuillaume Smet, FranceHiroshi Saito, JapanChris Marcellino, ItalyDave Cramer, CanadaDevrim Gunduz, TurkeyGavin Sherry, AustraliaJeremy DrakeMarko Kreen, EstoniaKris Jurka, USATom Dunstan, USA
PostgreSQL 8.3 Features● Developer
– SQL/XML– Integrated TSearch2– UUID, ENUM– PL/pgSQL debugging
● Admin– CSV Logging– Better Stats– pgStandby
● Consistency– HOT– Load Distributed
Checkpoint● Performance
– Synchronized Scan– Asynch Commit
● Accessories– pgBouncer– pgSNMP
Why contribute?● PostgreSQL is a community project
– owned by the community, run by the community– if you contribute, you are a full participant
● unlike some other databases● Tinker with cool database stuff
– we are hard-core database geeks– learn a lot from top database hackers
● Improve your employment prospects– database engineers are always in demand
SQL/XMLXMLROOT (
XMLELEMENT (NAME ’gazonk’,XMLATTRIBUTES (
’val’ AS ’name’,1 + 1 AS ’num’),
XMLELEMENT (NAME ’qux’,’foo’)
),VERSION ’1.0’,STANDALONE YES )
SELECT *FROM table1WHERE (xpath(’//person/name/text()’, xdata))[1]::text = ’John Smith’;
table_to_xml(tbl regclass,nulls boolean,tableforest boolean, targetns text)
<?xml version=’1.0’standalone=’yes’ ?><gazonk name=’val’
num=’2’><qux>foo</qux>
</gazonk>
SQL/XML Prior Work2002
● TorchBox contributes /contrib/xml2– Some XML functionality:
● Xpath functions● XSLT functions
– BUT● Non-standard, completely PostgreSQL syntax● No real data type● Many features missing
– Charset support– DTD support
SQL/XML Prior Work2004
● Peter Eisentraut writes XML export– Export table to XML– BUT
● prototype only● not useful without
other XML functionality● syntax requires changing
PostgreSQL parser
SQL/XML Prior Work2005
● Pavel Stuehle writes SQL/XML syntax demo– First standard syntax example– BUT
● depends on PL/perl● prototype only● does not integrate
with /contrib/xml2
Nikolay Samokhvalov ● Graduate Student at
University of Moscow● Met major contributor
Oleg Bartunov in 2005– ported MoiKrug.ru to
PostgreSQL● Masters Thesis:
updatable XML views in RDBMS
● Google funds 700 students to work on Open Source– PostgreSQL gets 7
● Nickolay proposes project for SQL/XML– Proposal accepted– Peter will mentor
SoC Proposal[SoC Proposal] Initial support of XMLType for PostgreSQLSummary
Primary goal is introduction of special type support for storing XML data in ORDBMS PostgreSQL, querying this data and modifying it. This project is intended to develop manipulation abilities rather than special storage engine (VARCHAR as initial storage implicit type).
At the moment there is no good general vision of most suitable storage for XMLType. Moreover, from my point of view, DBMS should have support of different index types for XMLType - every for its special purpose. And which is more important is an open question. That's why I propose to work on 'external' things rather than 'internals' (data structure for index) and strictly follow standards. But anyway, I've included path index (#7 in the list of Deliverables), because now I suppose that it is most expectative type of index (this item is optional, because here community's feedback is highly needed).Deliverables
Ability to define any column as of XMLType. Initially, this means that only well-formed XML documents could be stored in such a column.
Automatic validation of XML documents being inserted/modified against XML schema, if definition of column contains it (reference to it). DTD and/or XML Schema could be used for this.Subset of SQL/XML standard [1] for mixing relational and XML data in queries. This includes at least following: XMLELEMENT, XMLAGG, XMLFOREST, XMLCONCAT expressions; implementation of mapping rules for basic types. (See Project Details for more details).XML domains support: possibility to define domain based on XMLType, using XML schema (DTD / XML Schema).Basic XPath support (existing experience - contrib/xml2 module - should be taken into account).Basic XSLT support (existing experience - contrib/xml2 module - should be taken into account).Path indexes for fast retrieval of XML documents (queries with XPath expressions in WHERE clause). [OPTIONAL]Documentation (definition rules for XMLType, SQL/XML expressions, etc).
Mentor: Peter Eisentraut● From Aachen, Germany● Core Team member since 2004● In charge of PostgreSQL Documentation● Prior XML work
Specification Research● ANSI SQL 2006 -- SQL/XMLXML Publishing Functions
xmlelement() Creates an XML element, allowing the name to be specified.xmlattributes() Creates XML attributes from columns, using the name of each column as the name of the corresponding attribute.xmlroot() Creates the root node of an XML document.xmlcomment() Creates an XML comment.xmlpi() Creates an XML processing instruction.xmlparse() Parses a string as XML and returns the resulting XML structure.xmlforest() Creates XML elements from columns, using the name of each column as the name of the corresponding element.xmlconcat() Combines a list of individual XML values to create a single value containing an XML forest.xmlagg() Combines a collection of rows, each containing a single XML value, to create a single value containing an XML forest.
Specification Research● ANSI SQL 2006 -- SQL/XML
Specification ProposalRe: Google SoC--Idea Request
From: "Nikolay Samokhvalov" <samokhvalov ( at ) gmail ( dot ) com>Subject: Re: Google SoC--Idea RequestDate: Tue, 2 May 2006 12:34:43 +0400
Proposal: XMLType for PostgreSQL.
*** Minimum: ***to have special type support for storing XML data and working with it.This means following:- ability to define any column of a table as of XMLType; internally,all data is stored as VARCHAR;- auto validation of documents against XML schema, if it wasspecified in columndefinition or in XML data sheets themselves (DTD, XSD or at least oneof them) /*contrib/xml2 has such feature, but it uses libxml, whatmeans DOM interface. Maybe it's better to use some SAX parser to solvethis task*/;- XPath indexes for queries with path expressions in WHERE clause /*Isuppose this kind of indexes would be most frequently used. I proposeusing good labeling schema and GIST and/or Gin here*/;- some subset of SQL/XML. Actually, part 14 of SQL:200n (SQL/XML) hasmore than 400 pages now and contains some established constructions,that are using in other DBMSes. There is the some patch alreadywritten by Pavel Stehule:http://www.pgsql.ru/db/mw/msg.html?mid=2096818. (BTW, what is with it?it was kept for 8.2, so what is the result?) I've tested it severalmonths ago, basic SQL/XML functions worked fine. It changes grammar,but there is no other way... So, using this patch as a part of thisproject means that this project cannot be contrib module,unfortunately. Nevertheless, current paper of SQL/XML standard seemsto be mature - so, compared with existing implementation it would be anice 'landmark';
Specification RevisionsXML export function signatures
From: Peter Eisentraut <peter_e ( at ) gmx ( dot ) net>To: pgsql-hackers ( at ) postgresql ( dot ) orgSubject: XML export function signaturesDate: Mon, 12 Feb 2007 20:18:59 +0100
Here are the proposed signatures for the XML export functions.
While I have seen the output formats in use elsewhere, I could not findany useful information on how to invoke these mappings, so thefollowing is purely my own invention.
table_to_xml(tbl regclass, nulls boolean, tableforest boolean, targetns text) RETURNS xmlquery_to_xml(query text, nulls boolean, tableforest boolean, targetns text) RETURNS xmltable_to_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text) RETURNS xmlquery_to_xmlschema(query text, nulls boolean, tableforest boolean, targetns text) RETURNS xmltable_to_xml_and_xmlschema(tbl regclass, nulls boolean, tableforest boolean, targetns text) RETURNS xmlquery_to_xml_and_xmlschema(query text, nulls boolean, tableforest boolean, targetns text) RETURNS xmlcursor_get_xml(cursor refcursor, count int, nulls boolean, tableforest boolean, targetns text) RETURNS xmlcursor_to_xmlschema(cursor refcursor, nulls boolean, tableforest boolean, targetns text) RETURNS
xml
Specification RevisionsRe: XML export function signatures
From: Peter Eisentraut <peter_e ( at ) gmx ( dot ) net>To: Andrew Dunstan <andrew ( at ) dunslane ( dot ) net>Subject: Re: XML export function signaturesDate: Mon, 12 Feb 2007 23:57:49 +0100
Andrew Dunstan wrote:> . table_to_xml_and_xmlschema seems like a mouthful - can we shorten> it a bit?
Well, it gives you back a mouthful of data, too. :)
> . what are the two ways of representing data that tableforest> distinguishes?
tableforest = false gives you something like
<tablename> <row> <!-- where "row" is constant --> <col1name>data</col1name> <col2name>data</col2name> </row> <row>
Approved Specification● XML Data Type
– type-safe XML storage– supports XML operators, functions
● XML Functions– Generation– Manipulation (XLST)– Export– XPath Query
● XML Expressions– IS DOCUMENT, etc.
CodeModifica- tions
Initial Versions of PatchUpdated XML patch
From: Peter Eisentraut <peter_e ( at ) gmx ( dot ) net>To: pgsql-patches ( at ) postgresql ( dot ) orgSubject: Updated XML patchDate: Thu, 14 Dec 2006 23:02:05 +0100
Attached is an updated patch for XML functionality, which subsumes all earlier patches on the subject. This includes a data type with format checking, and functions to mangle values. For the moment, I have cut out the inessential stuff such as xpath. The included regression test file xml.sql shows some of the things that work.
This patch already covers most of the parser work. What is left hereafter is adjusting all the corner cases, the escaping rules, and the various XML parser behaviors.
Use configure --with-libxml to build.
-- Peter Eisentrauthttp://developer.postgresql.org/~petere/Attachment: current-xml-patch.bz2Description: BZip2 compressed data
Initial Patch static void+_outXmlExpr(StringInfo str, XmlExpr *node)+{+ WRITE_NODE_TYPE("XMLEXPR");++ WRITE_ENUM_FIELD(op, XmlExprOp);+ WRITE_STRING_FIELD(name);+ WRITE_NODE_FIELD(named_args);+ WRITE_NODE_FIELD(args);+}++static void _outCoerceToDomain(StringInfo str, CoerceToDomain *node) { WRITE_NODE_TYPE("COERCETODOMAIN");@@ -2019,6 +2030,9 @@ case T_BooleanTest: _outBooleanTest(str, obj); break;+ case T_XmlExpr:+ _outXmlExpr(str, obj);+ break; case T_CoerceToDomain: _outCoerceToDomain(str, obj); break;diff -Nru -x configure ../cvs-pgsql/src/backend/nodes/readfuncs.c ./src/backend/nodes/readfuncs.c--- ../cvs-pgsql/src/backend/nodes/readfuncs.c 2006-12-12 16:31:46.000000000 +0100+++ ./src/backend/nodes/readfuncs.c 2006-12-14 21:20:08.000000000 +0100@@ -765,6 +765,22 @@ }
Patch RevisionsRe: xml type and encodings
From: "Andrew Dunstan" <andrew ( at ) dunslane ( dot ) net>To: "Peter Eisentraut" <peter_e ( at ) gmx ( dot ) net>Subject: Re: xml type and encodingsDate: Mon, 15 Jan 2007 16:35:13 -0600 (CST)
Peter Eisentraut wrote:> Florian G. Pflug wrote:>> Couldn't the server change the encoding declaration inside the xml to>> the correct>> one (the same as client_encoding) before returning the result?>> The data type output function doesn't know what the client encoding is> or whether the data will be shipped to the client at all. But what I'm> thinking is that we should remove the encoding declaration if possible.> At least that would be less confusing, albeit still potentially> incorrect if the client continues to process the document without care.
The XML SPec says:
"In the absence of information provided by an external transport protocol(e.g. HTTP or MIME), it is a fatal error for an entity including anencoding declaration to be presented to the XML processor in an encodingother than that named in the declaration, or for an entity which beginswith neither a Byte Order Mark nor an encoding declaration to use anencoding other than UTF-8. Note that since ASCII is a subset of UTF-8,ordinary ASCII entities do not strictly need an encoding declaration."
More Patch Revisionsxpath_array with namespaces support
From: "Nikolay Samokhvalov" <samokhvalov ( at ) gmail ( dot ) com>To: PGSQL-Patches <pgsql-patches ( at ) postgresql ( dot ) org>Subject: xpath_array with namespaces supportDate: Wed, 21 Feb 2007 02:46:33 +0300
As a result of discussion with Peter, I provide modified patch forxpath_array() with namespaces support.
The signature is: _xml xpath_array(text xpathQuery, xml xmlValue[, _text namespacesBindings])
The third argument is 2-dimensional array defining bindings fornamespaces. Simple examples:
xmltest=# SELECT xpath_array('//text()', '<local:dataxmlns:local="http://127.0.0.1";><local:piece id="1">numberone</local:piece><local:piece id="2" /></local:data>'); xpath_array----------------{"number one"}(1 row)
Yet More Revisionscorrect format for date, time, timestamp for XML functionality
From: "Pavel Stehule" <pavel ( dot ) stehule ( at ) hotmail ( dot ) com>To: pgsql-patches ( at ) postgresql ( dot ) orgSubject: correct format for date, time, timestamp for XML functionalityDate: Tue, 20 Feb 2007 13:27:42 +0100
Hello,
this patch ensures independency datetime fields on current datestyle setting. Add new internal datestyle USE_XSD_DATESTYLE. It's almoust same to USE_ISO_DATESTYLE. Differences are for timestamp:ISO: yyyy-mm-dd hh24:mi:ssXSD: yyyy-mm-ddThh24:mi:ss
I found one link about this topic: http://forums.oracle.com/forums/thread.jspa?threadID=467278&tstart=0RegardsPavel Stehule
Patch Accepted
From: Bruce Momjian <bruce ( at ) momjian ( dot ) us>To: "Nikolay Samokhvalov" <samokhvalov ( at ) gmail ( dot ) com>Subject: Re: [HACKERS] xml2 contrib patch supporting default XML namespacesDate: Thu, 22 Mar 2007 16:16:16 -0400 (EDT)
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches
It will be applied as soon as one of the PostgreSQL committers reviewsand approves it.
Write Documentation9.14. XML FunctionsThe functions and function-like expressions described in this section operate on values of type xml. Check Section 8.13 for information about the xml type. The function-like expressions xmlparse and xmlserialize for converting to and from type xml are not repeated here. Use of many of these functions requires the installation to have been built with configure --with-libxml.
9.14.1. Producing XML ContentA set of functions and function-like expressions are available for producing XML content from SQL data. As such, they are particularly suitable for formatting query results into XML documents for processing in client applications.
9.14.1.1. xmlcomment
xmlcomment(text)
The function xmlcomment creates an XML value containing an XML comment with the specified text as content. The text cannot contain -- or end with a - so that the resulting construct is a valid XML comment. If the argument is null, the result is null.
Example:
SELECT xmlcomment('hello');
xmlcomment-------------- <!--hello-->
XML in 8.3 BetaE.1. Release 8.3Release date: 2007-12-??
Release date: CURRENT AS OF 2007-10-24
E.1.1. OverviewThis release represents a major leap forward for PostgreSQL by adding significant new functionality and performance enhancements. This was made possible by a growing community that has dramatically accelerated the pace of development. This release adds the follow major capabilities:
Full text search now fully integrated into the core database system
Support the SQL/XML standard, including new operators and an XML data type
Support for enumerated data types (ENUM)
Add Universally Unique Identifier (UUID) data type
SQL/XML Feature Set● XML Parsing● XML Functions● XML Export● XPath B-tree Index
Future XML Projects● Use of HSTORE for advanced XML indexing● Automated XML decomposition
– XML-to-Table– XML-to-Schema
● PL/XSLT– XHTML query
● XQuery support
HOTHOT
Fastest OSDB
MySQL PostgreSQL Proprietary0
100
200
300
400
500
600
700
800
900
J2EE Througput
MySQL PostgreSQL Proprietary0
20000
40000
60000
80000
100000
120000
140000
160000
180000
200000
Acquisition Cost Comparison
Cos
t in
US
Dol
lars
Most Scalable
The Consistency Problem
VACUUM
What's MVCC?● Multi-Version Concurrency Control
– Each user gets their own “version” of the data– Allows parallelization of updates/reads– Without it, scalability is not possible
● You have to lock everything● Or violate ACID transactions
MVCCRow Version 1
MVCCRow Version 1 Row Version 1
Row Version 2
Row Version 3
SELECT ... SELECT ...
BEGIN UPDATE
BEGIN UPDATE
MVCCRow Version 1 Row Version 1
Row Version 2
Row Version 3
Row Version 2
SELECT ... SELECT ... SELECT ...
BEGIN UPDATE
BEGIN UPDATE
COMMIT
COMMIT
ROLLBACK
MVCC Them & UsThe Overwriting ModelInnoDB & Oracle
UPDATE
Base Relation Rollback Segment
Copy
Overwrite Row
Old Row Version
MVCC Them & UsThe Overwriting ModelInnoDB & Oracle
● Advantages– Low table/index maintenance requirements– Latest row version fast access
● Disadvantages– Transaction isolation can break– Long-running transactions expensive– Rollbacks very expensive– Rollback segment bottleneck
MVCC Them & UsThe Non-overwriting ModelPostgreSQL & Firebird
UPDATE
Base Relation
New Row Version
Old Row Version
Copy
MVCC Them & UsThe Non-overwriting ModelPostgreSQL & Firebird
● Advantages– Transaction isolation effortless– Rollbacks free– Long-running transactions not a problem
● Disadvantages– High table/index maintenance– “Frequently updated table” problem
Frequently Updated Tables
Row C: Version 1
Row C: Version 5
Row C: Version 4
Row C: Version 3
Row C: Version 2
Indexes Updated
small update
small update
small update
large update
Tuplestore
Indexes Updated
Indexes Updated
Indexes Updated
Frequently Updated Tables
Row C: Version 1
Row C: Version 5
Row C: Version 4
Row C: Version 3
Row C: Version 2
Indexes Updated
small update
small update
small update
large update
Tuplestore
Indexes Updated
Indexes Updated
Indexes Updated
Frequently Updated Tables
Row C: Version 5
Tuplestore
Frequently Updated Tables
Row C: Version 5
Tuplestore
Frequently Updated Tables
Row C: Version 5
Tuplestore
Poor Performance
Pavan Deolasee● Graduated IIT Bombay
– focus on databases● Worked for VERITAS● Lead Engineer at
EnterpriseDB– PostgreSQL vendor– Contributes performance
patches to community● Lives in Pune
Team Effort
● Simon Riggs– original
proposal– prototypes– specification
● Heikki Linnakangas, Tom Lane and others– revisions– code review– bug fixes
Meeting at EnterpriseDB
Initial proposal● Update-
in-PlacewithHOT file Row C: Version 1
Base Relation HOT File
Initial proposal● Update-
in-PlacewithHOT file Row C: Version 2
Row C: Version 1
Base Relation HOT File
copy old version
UPDATE
Initial proposal● Update-
in-PlacewithHOT file Row C: Version 3
Row C: Version 2
Row C: Version 1
tuple chain
Base Relation HOT File
UPDATEcopy oldversion
Initial proposal● Update-
in-PlacewithHOT file Row C: Version 4
Row C: Version 3
Row C: Version 2
Row C: Version 1
tuple chain
tuple chain
Base Relation HOT File
copy oldversion
UPDATE
Initial proposal● Update-
in-PlacewithHOT file Row C: Version 4
Base Relation
First proposal to pgsql-hackers
Frequent Update Project: Design Overview of HOT Updates
From: "Simon Riggs" <simon ( at ) 2ndquadrant ( dot ) com>To: <pgsql-hackers ( at ) postgresql ( dot ) org>Subject: Frequent Update Project: Design Overview of HOT UpdatesDate: Thu, 09 Nov 2006 17:13:16 +0000
Design Overview of HOT Updates------------------------------
The objective is to increase the speed of the UPDATE case, whileminimizing the overall negative effects of the UPDATE. We refer to thegeneral requirement as *Frequent Update Optimization*, though thisdesign proposal is for Heap Overflow Tuple (HOT) Updates. It is similarin some ways to the design for SITC already proposed, though has anumber of additional features drawn from other designs to make it apractical and effective implementation.
EnterpriseDB have a working, performant prototype of this design. Thereare still a number of issues to resolve and the intention is to followan open community process to find the best way forward. All requireddetail will be provided for the work conducted so far.
Current PGSQL behaviour is for UPDATEs to create a new tuple versionwithin the heap, so acts from many perspectives as if it were an INSERT.All of the tuple versions are chained together, so that whichever of thetuples is visible to your Snapshot, you can walk the chain to find themost recent tuple version to update.
Revisions:Reverse Order
Row C: Version 1
Row C: Version 4
Row C: Version 3
Row C: Version 2
in-page update
in-page update
in-page update
Normal Tuples HOT Relation File
UPDATE
Revisions:Chains, not files
Row C: Version 1
Row C: Version 4
Row C: Version 3
Row C: Version 2
in-page update
in-page update
in-page update
Normal Tuples HOT Tuple Chain
UPDATE
Add microvacuum
Row C: Version 1
Row C: Version 5
Row C: Version 4
Row C: Version 3
Row C: Version 2
Indexes Updated
in-page update
in-page update
in-page update
new page / index update
Normal Tuples HOT Tuple Chain
microvacuum
UPDATE
Submit patch draft v.1HOT WIP Patch - version 1
From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com>To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>Subject: HOT WIP Patch - version 1Date: Wed, 14 Feb 2007 15:34:46 +0530
This is a WIP patch based on the recent posting by Simon and discussionsthereafter. We are trying to do one piece at a time and intention is to postthe work ASAP so that we could get early and continuous feedback from the community. We could then incorporate those suggestions in the nextWIP patch.
To start with, this patch implements HOT-update for a simple casewhen there is enough free space in the same block so that it can accommodate the new version of the tuple. A necessary condition fordoing HOT-update is that none of the index columns is changed.The old version is marked as HEAP_UPDATE_ROOT and the newversion is marked as HEAP_ONLY_TUPLE. If a tuple is HOT-updated, no new index entry is added.
Feature Freeze
Submit another versionHOT WIP Patch - version 2
From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com>To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>,
pgsql-patches ( at ) postgresql ( dot ) orgSubject: HOT WIP Patch - version 2Date: Tue, 20 Feb 2007 12:08:14 +0530
Reposting - looks like the message did not get through in the firstattempt. My apologies if multiple copies are received.
This is the next version of the HOT WIP patch. Since the last patch that I sent out, I have implemented the HOT-update chain pruning mechanism.
When following a HOT-update chain from the index fetch, if we notice that the root tuple is dead and it is HOT-updated, we try to prune the chain tothe smallest possible length. To do that, the share lock is upgraded to anexclusive lock and the tuple chain is followed till we find a live/recently-dead tuple. At that point, the root t_ctid is made point to that tuple. In order topreserve the xmax/xmin chain, the xmax of the root tuple is also updatedto xmin of the found tuple. Since this xmax is also < RecentGlobalXmin and is a committed transaction, the visibility of the root tuple still remainsthe same.
Submit another versionHOT WIP Patch - version 3.2
From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com>To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>,
pgsql-patches ( at ) postgresql ( dot ) orgSubject: HOT WIP Patch - version 3.2Date: Sun, 25 Feb 2007 00:06:04 +0530
Please see the attached WIP HOT patch - version 3.2. It nowimplements the logic for reusing heap-only dead tuples. When aHOT-update chain is pruned, the heap-only tuples are markedLP_DELETE. The lp_offset and lp_len fields in the line pointer are maintained.
When a backend runs out of free space in a page when doing anUPDATE, it searches the line pointers to find a slot which is marked LP_DELETEd and has enough space to accommodate the new tuple. If such a slot is found, its reused. We might waste some space if the slot is larger than the tuple, but that gets reclaimed at VACUUM time.
Yet another versionHOT WIP Patch - version 6.3
From: "Pavan Deolasee" <pavan ( dot ) deolasee ( at ) gmail ( dot ) com>To: PostgreSQL-development <pgsql-hackers ( at ) postgresql ( dot ) org>Subject: HOT WIP Patch - version 6.3Date: Mon, 2 Apr 2007 17:51:13 +0530
Please see the HOT version 6.3 patch posted on pgsql-patches.I've implemented support for CREATE INDEX and CREATE INDEXCONCURRENTLY based on the recent discussions. The implementationis not yet complete and needs some more testing/work/discussion before we can start considering it for review.
One of the regression test case fails because CIC now works inthree phases. In the first phase, we just create the catalog entryfor the index and commit the transaction. If the index_build fails because of any error (say, unique key constraint) the index creationfails, but the catalog entry remains.
Many issues resolved● CREATE INDEX
– including CONCURRENTLY● Re-using dead tuples● Interaction with Cluster● Plan invalidation● Utilities & tools
But still not reviewed
Tom Lane says:“break it up, please!”
● Too big a patch for reviewers– almost 12,000 lines
● Broken up into 5 parts– 1. The basic HOT implementation– 2. Retain vacuum, chain pruning and other tricks– 3. Fix the broken VACUUM and VACUUM FULL
code– 4. Fix the broken CREATE INDEX– pg_stats and other misc. utilities
Code Reviewed
PostgreSQL Beta
HOT Performance
SKYLINE OF
SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF], .., dm [MIN | MAX | DIFF]
SELECT *
FROM books
SKYLINE OF rating MAX, price MIN;
CDE @ IIIT, Hyderabad
Feature proposed 3/3
Extension to SQL syntax
SKYLINE OF [DISTINCT] d1 [MIN | MAX | DIFF], .., dm [MIN | MAX | DIFF]
Approximate Queries
Approximate QueriesSELECT *
FROM Books
SKYLINE OF rating MAX, price MIN;
Lots of discussion
Problems with the PatchRe: PostgreSQL - 'SKYLINE OF' clause added!
From: Tom Lane <tgl ( at ) sss ( dot ) pgh ( dot ) pa ( dot ) us>To: Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz>Subject: Re: PostgreSQL - 'SKYLINE OF' clause added!Date: Thu, 08 Mar 2007 01:12:22 -0500
Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz> writes:> Tom Lane wrote:>> Well, whether it's horrible or not is in the eye of the beholder, but>> this is certainly a non-standard syntax extension.
> Being non-standard should not be the only reason to reject a worthwhile > feature.
No, but being non-standard is certainly an indicator that the featuremay not be of widespread interest --- if it were, the SQL committeewould've gotten around to including it; seems they've managed to includeeverything but the kitchen sink already. Add to that the complete lackof any previous demand for the feature, and you have to wonder where themarket is.
> The fact that several > different groups have been mentioned to be working on this feature would > indicate that it is worth considering.
Problems with the Patch● Not part of the ANSI SQL standard
– possibly low general applicability– might get added to standard with different syntax– might never get standardized at all
● Requires changes to PostgreSQL parser– new keyword break applications– possible side effects
● Not coded to PostgreSQL standards– would need refactoring
Rejected!Re: PostgreSQL - 'SKYLINE OF' clause rejected
From: Tom Lane <tgl ( at ) sss ( dot ) pgh ( dot ) pa ( dot ) us>To: Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz>Subject: Re: PostgreSQL - 'SKYLINE OF' clause added!Date: Sun, 11 Mar 2007 23:44:41 -0400
Shane Ambler <pgsql ( at ) Sheeky ( dot ) Biz> writes:> If we consider this thoroughly and compile a suitable syntax that covers > all bases it could be used as the basis of the standard definition or be > close to what ends up in the standard.
I'll bet you a very good dinner that the word SKYLINE will never be seenin the standard.
To me, the proposed feature seems an extremely narrow, special-purposething. The SQL committee have never been into that very much, and seemeven less interested in the last couple of revisions. They likemechanisms that can be used to solve a wide variety of problems, andare not afraid to introduce conceptual complexity to get there.Two examples for you: outer joins and recursive queries. Oracle's(+) syntax is more compact than what got into the spec, but lessprecise and less functional. For recursive queries, CONNECT BY isway simpler than what got into the spec, but again doesn't cover asmuch ground. The SKYLINE clause seems to me to be right about onpar with CONNECT BY ... it does something useful, but only one thing.
Solution: pgFoundry
Contributor Resources
Mailing Lists● Hackers list
– pgsql-hackers– main list for development discussion
● Patch list– pgsql-patches– submit your patch here after discussion on -hackers
● Specific feature lists– pgsql-jdbc, pgsql-performance, pgsql-sql, etc.– subscribe at www.postgresql.org/community/lists
Web Sites● www.postgresql.org
– main site● www.pgfoundry.org
– add-ins, drivers, tools● developer.postgresql.org
– developer wiki, including TODO lists● archives.postgresql.org
– mailing list archives -- search for your idea here
Documentation● www.postgresql.org/docs
– main documentation– internals:/docs/current/static/internals.html– code conventions: /docs/current/static/source.html
● doxygen.postgresql.org– annotated source code
● www.postgresql.org/docs/faqs.FAQ_DEV.html– developer FAQ
The PostgreSQL YearRC and Branch
Feature Freeze
Beta
RC and Branch
Development Period
Integration & Review(1 month)
Beta Testing(1-2 months)
December 2007
February 1, 2008
August 1, 2008
September, 2008
October, 2008
Patch Commit Fest
Development Period
April 1, 2008Patch Commit Fest
Development Period
June 1, 2008Patch Commit Fest
Development Period
Other tips on submitting● Don't get discouraged.
– Be prepared to argue.– One hacker rejecting your idea doesn't mean
everyone does.– Committers (esp. Tom Lane) are often more
concerned about maintainability than cool stuff.● Be flexible: you will have to make changes.
– Corporate and academic coding standards are generally lower than the project's.
Other tips on submitting● Don't use the wrong arguments
– “MySQL/Oracle does it this way.”– “Based on this hot academic trend.”
● Some things make a patch harder to accept– New syntax– Backwards compatibility issues– High code counts
● Don't get discouraged.
Now, go write some code.
or contribute in some easier way
Contact Information● Josh Berkus
– josh@postgresql.org– blogs.ittoolbox.com/
database/soup– www.sun.com/postgresql
● PostgreSQL India– in@postgresql.org
● Pavan Deolasee– pavan.deolasee
@enterprisedb.com– www.enterprisedb.com
This talk is copyright 2007 Josh Berkus, and is licensed under the creative commons attribution license
top related