postgres-xc: symmetric postgresql cluster
DESCRIPTION
This presentation explains Postgres-XC architecture and its use case. It also explains the HA aspects of Postgres-XC and benchmarking results.TRANSCRIPT
![Page 1: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/1.jpg)
Postgres-XCSymmetric PostgreSQL Cluster
Pavan DeolaseeJuly 29, 2013
PostgreSQL Global Development GroupPostgres-XC Global Development Group
![Page 2: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/2.jpg)
11/21/13 Postgres-XC 2
Who am I ?
Pavan Deolasee� Http://www.linkedin.com/in/pavandeolasee
Contributor to PostgreSQL and Postgres-XC Global Development� Best known for development of Heap-Only-Tuple (HOT) feature of PostgreSQL
8.3 release
� Contributed several other enhancements to PostgreSQL and derivatives
� Associated with Postgres-XC since its inception
� Contributed to Postgres-XC's architectural design and implementation of several features
Previously worked for EnterpriseDB and Symantec/Veritas
![Page 3: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/3.jpg)
11/21/13 Postgres-XC 3
Agenda
PostgreSQL
History of Postgres-XC
Architecture Overview
Data Distribution Strategies
Scalability and Performance Results
SPOF Analysis
Development Practices, Project Status and Roadmap
Support
![Page 4: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/4.jpg)
PostgreSQL
![Page 5: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/5.jpg)
11/21/13 Postgres-XC 5
PostgreSQL Recap
PostgreSQL� is world's most advanced open source database
� is very stable
� is fully compliant with ANSI SQL
� supports foreign key, check constraints
� supports various kinds of indexes
� supports inheritance
� is fully extensible (data types, procedural languages etc)
� supports ACID transactions
� will recover your database in case of server failure
![Page 6: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/6.jpg)
11/21/13 Postgres-XC 6
PostgreSQL Recap
PostgreSQL� uses write-ahead-logs for durability and recover your database in case of server
failure
� built-in log based streaming synchronous/asynchronous replication
� file system level backups and archive recovery
� point-in-time recovery
� hot standby
� binary upgrades
� full-text search
� and many more
![Page 7: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/7.jpg)
11/21/13 Postgres-XC 7
Performance and Scalability
While PostgreSQL was becoming feature rich, scalability was still a concern, especially on high end machines.
Significant work in last few releases to address performance and scalability issues
� Heap-only-tuples (HOT) in 8.3 for reuse of dead space
� Virtual transaction identifiers for read-only transactions
� Spread checkpoints to reduce spikes in disk IO
� Hot standby for read scalability
� Visibility maps, Index-only-scans
� Improved spin-lock implementations
� Linear read scalability for multi-core machines (64 cores)
� Unlogged tables
![Page 8: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/8.jpg)
11/21/13 Postgres-XC 8
Performance and Scalability
Scale up is a continuous process� Every major release comes up with new improvements
� Reduced contention for critical sections is important for multiprocess scalability
64 and 128 core machines are becoming common Several hundred concurrent connections is a norm
� Efficient and full use of resources on high end machines
Large amounts of RAM
While scale up is nice and very important, it may not be able to keep pace with the volume and growth of data
Streaming replication and Hot standby allows read scalability, but each server should still be capable of handling full data
Partitioning using inheritance and constraint exclusion
![Page 9: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/9.jpg)
11/21/13 Postgres-XC 9
Performance and Scalability
When data grows rapidly, it may not be possible or affordable to upgrade hardware quickly
Scale out is a very attractive solution for BIG data use cases
PostgreSQL had some support as contrib module� DBlink
Foreign Data Wrappers (FDW)� Create foreign servers and foreign tables
� Natively query those tables from PostgreSQL server
� FDWs exist for Postgres, MySQL, File, MongoDB etc
� Though attractive, still in very early stage and may not scale with complex queries
![Page 10: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/10.jpg)
History of Postgres-XC
![Page 11: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/11.jpg)
11/21/13 Postgres-XC 11
Project Kick-off
2009 - NTT Data Intellilink Corporation, Japan and EnterpriseDB (my previous employer) decide to collaborate for developing a clustering solution for PostgreSQL
� NTT had previously internally tried to build another Postgres cluster (RITA-DB), but could not take it beyond prototype stage. Lessons learnt
� EnterpriseDB had previously acquired GridSQL, a scale-out solution. But it had its own issues including performance problems
� No other existing clustering solution was good enough
Postgres-R Slony-II Pgpool-II GridSQL Log Shipping Standby pl/proxy
![Page 12: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/12.jpg)
11/21/13 Postgres-XC 12
Project Mandate
Provide a viable open source, free alternative to a popular shared disk cluster from a very reputable commercial vendor. Yes, Oracle RAC
Provide write-scalability� At least 3.5 times scalability with 5 servers in a write intensive workload
Provide read-scalability
� Efficient utilization of cluster nodes
� Parallel execution
![Page 13: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/13.jpg)
11/21/13 Postgres-XC 13
Project Mandate (Cont)
Provide global consistency� Data written from one node must be available from other nodes immediately and
in a consistent manner
Provide ACID properties of transaction
Provide seamless API so that the client applications need no change or very minor change
� Same SQL
� Same libpq, JDBC, ODBC APIs
Must run and scale on commodity hardware
� Expensive shared storage
� Expensive network interconnects
![Page 14: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/14.jpg)
11/21/13 Postgres-XC 14
Initial Team
Koichi Suzuki� Chief architect and leader of the project, NTT Data Intellilink
Mason Sharp� StormDB (previously at EnterpriseDB)
Pavan Deolasee (me)
� (previously at EnterpriseDB)
Many more from NTT, EnterpriseDB and open source community has joined and contributed since then
(Part of the content is attributed to this team and the global development team)
![Page 15: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/15.jpg)
Postgres-XC Architecture
![Page 16: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/16.jpg)
11/21/13 Postgres-XC 16
So What is Postgres-XC ?
Shared nothing architecture
Symmetric PostgreSQL cluster� No master/slave replication
� No read-only clusters
� Every node can issue both read/write
� Every node provides single consistent database view
� Transparent transaction management
Not just a replication� Each table can be replicated/distributed by sharding
� Parallel transaction/query execution
So both read/write scalability
![Page 17: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/17.jpg)
11/21/13 Postgres-XC 17
PostgreSQL Streaming Replication
Master-slave architecture
Write-Ahead-Log (WAL) based replication
Synchronous, asynchronous, cascading replication supported
Hot Standby� Can answer read-only queries
� Standby may give a stale view of the database since the WAL application happens asynchronously
� Do not support table level replication
![Page 18: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/18.jpg)
11/21/13 Postgres-XC 18
Postgres-XC: Symmetric Cluster
Global TransactionManager (GTM)
Postgres-XC Servers
![Page 19: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/19.jpg)
11/21/13 Postgres-XC 19
Postgres-XC Key Points
Postgres-XC is symmetric, shared nothing, scalable PostgreSQL cluster
Write Scalability
Read Scalability
Global Consistency
![Page 20: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/20.jpg)
11/21/13 Postgres-XC 20
Postgres-XC Major Components
Applications
Coordinators
Datanodes
GTM
SQL/libpq Interface
SQL / libpq / ODBC / JDBC Interface
![Page 21: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/21.jpg)
11/21/13 Postgres-XC 21
PostgreSQL MVCC
PostgreSQL uses Multi-version concurrency control (MVCC) for guaranteeing Isolation and Consistency (“CI” of ACID)
MVCC increases concurrency � readers don't wait on other readers/writers
� writers don't wait on readers
Every write transaction is assigned an XID (32-bit identifier)
Every row that gets inserted/deleted/updated gets a stamp of the transaction that did the operation
A MVCC Snapshot is used to decide if a given row or a version of a given row is visible to a transaction or not
![Page 22: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/22.jpg)
11/21/13 Postgres-XC 22
PostgreSQL MVCC
Two important concepts
� Transaction Identifiers (XIDs)
� MVCC Snapshots
A consistent view of the database can be obtained by ensuring that these two values are managed properly
![Page 23: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/23.jpg)
11/21/13 Postgres-XC 23
Global Transaction Manager (GTM)
The only central component in the XC cluster
Provides globally consistent and visible transaction identifiers (GXID) to transactions running across the cluster
Provides global snapshots to run queries across the cluster so that they see a consistent view irrespective of how many different nodes the query need to fetch data from
GTM is not a database itself, though it keeps some simple state information
Its a separate binary supplied with Postgres-XC source
Runs as a independent server process
![Page 24: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/24.jpg)
11/21/13 Postgres-XC 24
Global Transaction Manager (GTM)
There can only be ONE GTM per Postgres-XC cluster, though additional GTM standby can be configured (more later)
Comes with a GTM-proxy (more later) for request pooling and scalability
Multi-threaded process (one thread per connection)
Recommended to run GTM on a separate physical machine for performance and scalability
Helps in a few other side issues such as to manage sequences
![Page 25: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/25.jpg)
11/21/13 Postgres-XC 25
Coordinator
Coordinator is the brain of Postgres-XC cluster
Applications connect to the coordinators and issue SQL queries (they never talk to datanodes directly)
� Parse, plan and execute queries
� Knows about table distribution properties and takes advantage of the same for query planning
� Pushes down as much computation as possible to the datanodes
� Pushes down transaction identifiers and snapshots to other nodes
� Manages implicit two-phase transactions
![Page 26: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/26.jpg)
11/21/13 Postgres-XC 26
Coordinator
Non-critical component of the Postgres-XC cluster
There could be (and most likely there will be) more than one coordinator in the cluster
Applications can connect to any coordinator and issue read/write queries
� No master coordinator
Modified version of PostgreSQL� Remember PostgreSQL comes with PostgreSQL license
![Page 27: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/27.jpg)
11/21/13 Postgres-XC 27
Datanode
The real guy managing all the data
Accepts requests from coordinators and sends responses back� SQL/DML queries
� DDL queries
� Maintenance commands (VACUUM, CHECKPOINT, CLUSTER etc)
Modified PostgreSQL engine but with very minimal changes
� Transaction management, MVCC
� Sequences
Critical component of the cluster� Failure will lead to DML/DDL queries failing unless the coordinator can decide to
bypass the datanode because its not required for the given query
![Page 28: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/28.jpg)
Data Distribution
![Page 29: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/29.jpg)
11/21/13 Postgres-XC 29
Distribution Strategies
Distributed tables� Each row exists only on a single datanode
� Distribution strategies
HASH MODULO ROUND ROBIN User defined functions (Future) Range (Future) Values (Future)
Replicated tables� Each row of the table is stored on all datanodes where the table is replicated
![Page 30: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/30.jpg)
11/21/13 Postgres-XC 30
Distributed Tables
Write to a single row is applied only on the node where the row resides
� Multiple rows can be written in parallel
� Scanning rows spanning across the nodes (e.g. table scans) can hamper performance
Point reads and writes based on the distribution column value show good performance
Datanode where the operation happens can be identified by the distribution column value in the query
![Page 31: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/31.jpg)
11/21/13 Postgres-XC 31
Replicated Tables
Statement level replication
� Each write needs to be replicated
� Writes are costly
Read can happen on any node (where table is replicated)� Reads from different coordinators can be routed to different nodes
Useful for relatively static tables, with high read load
![Page 32: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/32.jpg)
11/21/13 Postgres-XC 32
Distribution Samples – Horizontal Scalability
CREATE TABLE emp (empno integer, ename text) DISTRIBUTE BY HASH(empno);
CREATE TABLE emp (empno integer, ename text) DISTRIBUTE BY MODULO(empno);
CREATE TABLE emp (empno integer, ename text) DISTRIBUTE BY ROUND ROBIN;
CREATE TABLE emp (empno integer, ename text);
Default is to distribute by HASH if the datatype is supported
� first column of primary key or� first column of foreign key or� first column of the table� If everything else fails, distribute by ROUND ROBIN
![Page 33: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/33.jpg)
11/21/13 Postgres-XC 33
Distribution Samples (Cont)
CREATE TABLE emp (empno integer, ename text) DISTRIBUTE BY HASH(empno)
TO NODE (N1, N2);
Distribute to all nodes or a set of nodes, either explicitly named or grouped together as named group
Default is to distribute to all the nodes configured
![Page 34: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/34.jpg)
11/21/13 Postgres-XC 34
Distributed Query Processing
Coordinator
� Accepts queries and plans them
� Finds the right data-nodes from where to fetch the data
� Frames queries to be sent to these data-nodes
� Gathers data from data-nodes
� Processes it to get the desired result
Datanode� Executes queries from coordinator like PostgreSQL
� Has same capabilities as PostgreSQL
![Page 35: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/35.jpg)
11/21/13 Postgres-XC 35
Distributed Query Processing
Query shipping
Coordinator tries to delegate maximum query processing to data-nodes
� Indexes are located on datanodes
� Materialization of huge results is avoided in case of sorting, aggregation, grouping, JOINs etc.
� Limit, Offset clauses are shipped to the datanodes
� ORDER BY, GROUP BY clauses are shipped to the datanodes
� Coordinator is freed to handle large number of connections
� Joins are shipped too
Distributing data wisely helps coordinator to delegate maximum query processing and improve performance
![Page 36: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/36.jpg)
Scalability by Data Distribution
![Page 37: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/37.jpg)
11/21/13 Postgres-XC 37
Write Scalability
Tables are distributed/sharded – usually HASH of the column value
� Horizontal partitioning
Inserts are directed to the datanode based on the value of the distributed column
� Integrated in the query executor
� Not trigger or rule based
� Supports COPY (bulk load mechanism of PostgreSQL)
Updates and Deletes also directed to the datanodes� Coordinator is completely bypassed if the query does not have any local/cross
dependency
� Highly scalable in many cases
Some queries must go through the coordinator because of dependencies
![Page 38: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/38.jpg)
11/21/13 Postgres-XC 38
Write Scalability (Cont)
CREATE TABLE test (a int, b text) DISTRIBUTE BY HASH(a);
INSERT INTO test SELECT generate_series(1,100), 'foo';
EXPLAIN VERBOSE UPDATE test SET b = 'bar' WHERE a > 1;
QUERY PLAN
----------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Output: ('bar'::text), ('bar'::text), test.ctid, xc_node_id
Node/s: node_dn1, node_dn2
Remote query: UPDATE public.test SET b = 'bar'::text WHERE (a > 1)
(4 rows)
![Page 39: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/39.jpg)
11/21/13 Postgres-XC 39
Write Scalability (Cont)
EXPLAIN VERBOSE DELETE FROM test WHERE a = 100;
QUERY PLAN
----------------------------------------------------------------------------
Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0)
Output: test.a, test.ctid, xc_node_id
Node/s: node_dn2
Remote query: DELETE FROM public.test WHERE (a = 100)
(4 rows)
Queries are started in parallel on multiple nodes thus improving scalability of write operations
![Page 40: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/40.jpg)
11/21/13 Postgres-XC 40
Read Scalability
Significant planner enhancements to improve read scalability� Many cool features already in
� Continuous development for improving it further
Query is analyzed and shipped to a datanode as it is if there are no dependencies on the coordinator or other datanodes
� Fast Query Shipping
If entire query can not be shipped, parts of the query are shipped� Offload as much processing as possible to the datanodes
� Do the final processing of data on the coordinator
� Reduce as much data movement from datanodes to the coordinator
![Page 41: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/41.jpg)
11/21/13 Postgres-XC 41
Read Scalability (Cont)
Parallel query execution
� Query is stared in parallel on all involved datanodes
� Significantly improves performance by utilizing capacities across the cluster
Qualifications push-down� WHERE clauses in a SELECT query are pushed down to the datanodes to
reduce the selectivity of the result set
� Significantly reduces the amount of data fetched at the coordinator for final execution
Remote expressions push-down� Push function and expression evaluation to the datanodes
![Page 42: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/42.jpg)
11/21/13 Postgres-XC 42
Read Scalability (Cont)
Aggregates� Enhanced PostgreSQL's two step aggregation to three step aggregation
� Collect summary at each datanode and then do the final aggregation at the coordinator
An example, avg(integer)
Datanode D1 - 1, 2, 5, 7
Datanode D2 - 3, 4, 6, 8, 9
Datanode D1 sends: sum(integer), count => 15, 4
Datanode D2 sends: sum(integer), count => 30, 5
Coordinator computes the final avg as (15 + 30) / (4 + 5) = 5
![Page 43: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/43.jpg)
11/21/13 Postgres-XC 43
Read Scalability (Cont)
Join Push-down
� Complete or part of complex joins can be pushed down to the datanodes
Complex analysis of the tables involved in the join and their distribution strategy is required to arrive at a conclusion regarding what can be pushed and what must not be pushed
Significant performance and scalability boost
Order By� Result sets are sorted on individual datanodes and fetched on the coordinator
� Coordinator does a final merge step of the sorted sets
Limit/Offset
![Page 44: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/44.jpg)
11/21/13 Postgres-XC 44
Thumb Rules - Read-Write Queries
High point reads (based on distribution column)
� Distributed or replicated
High read activities but no frequent writes� Better be replicated
High point writes� Better be distributed
High insert-load, but no frequent update/delete/read� Better be round-robin
![Page 45: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/45.jpg)
11/21/13 Postgres-XC 45
Query Analysis
Find the relations/columns participating in equi-join conditions, WHERE clause etc.
� Distribute on those columns
Find columns participating in GROUP BY, DISTINCT clauses
� Distribute on those columns
Find columns/tables which are part of primary key and foreign key constraints
� Global constraints are not yet supported in XC
� Distribute on those columns
![Page 46: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/46.jpg)
11/21/13 Postgres-XC 46
Example DBT1
![Page 47: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/47.jpg)
11/21/13 Postgres-XC 47
Example DBT1
author, item� Less frequently written
� Frequently read from
� Author and item are frequently JOINed
Dimension tables� Hence replicated on all nodes
![Page 48: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/48.jpg)
11/21/13 Postgres-XC 48
Example DBT1
customer, address, orders, order_line, cc_xacts
� Frequently written
hence distributed� Participate in JOINs amongst each other with customer_id as JOIN key
� point SELECTs based on customer_id
hence distributed by hash on customer_id so that JOINs are shippable� Participate in JOINs with item
Having item replicated helps pushing JOINs to datanodes
![Page 49: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/49.jpg)
11/21/13 Postgres-XC 49
Example DBT1
Shopping_cart, shopping_cart_line
� Frequently written
Hence distributed� Point selects based on column shopping_cart_id
Hence distributed by hash on shopping_cart_id� JOINs with item
Having item replicated helps
![Page 50: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/50.jpg)
11/21/13 Postgres-XC 50
Results – DBT1
![Page 51: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/51.jpg)
Other Scalability Considerations
![Page 52: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/52.jpg)
11/21/13 Postgres-XC 52
GTM Scalability
GTM can become a bottleneck
� serve XID for every new transaction in the cluster
� MVCC snapshot for transaction or every statement in a transaction (read-committed transactions)
� Snapshot size can increases proportional to the number of concurrent transactions in the system
GTM has been heavily optimized and runs a multi-threaded process for scalability
For even greater scalability, advised to run GTM Proxy on every coordinator
![Page 53: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/53.jpg)
11/21/13 Postgres-XC 53
GTM Scalability – GTM Proxy
Runs as a proxy to GTM on other servers – preferably on the same server where coordinators run
� One proxy per coordinator server
Same API as GTM� Components can connect to a proxy or GTM directly
Pools multiple GET TXID requests into a single request
� e.g get 100 new identifiers in a single request/response
Pools multiple GET SNAPSHOT requests into a single request� Some manipulation of the received snapshot is required, but that's possible
Further pools requests of multiple types� Reduces load on the GTM significantly
� One proxy per server scales near linearly
![Page 54: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/54.jpg)
11/21/13 Postgres-XC 54
Coordinator Scalability
Coordinator does not store table data
� IO scalability not an issue
Query parsing, planning and part or full execution� CPU utilization could be high
Add more coordinators if you need more computing power
Beware of connections explosion on the datanodes� Every additional coordinator will need more connections to datanode
� Built-in connection pooler in the coordinator can help a lot
![Page 55: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/55.jpg)
11/21/13 Postgres-XC 55
Datanode Scalability
Datanode stores and fetches table data from the disks
In case of IO saturation� Add more datanodes and reconfigure existing tables
Easier said than done May need data redistribution (except for round-robin distribution) Table remains locked out (until concurrent data redistribution is added)
� Add more datanodes for new tables
Create new tables on the new datanode Existing tables are not impacted
![Page 56: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/56.jpg)
11/21/13 Postgres-XC 56
Transaction Management
Postgres-XC uses two-phase commit protocol for distributed transaction management
� Every commit will make two round-trips to the datanode (PREPARE and COMMIT)
� Optimized so that single phase commit is used if only one node is involved in a transaction
� Optimized so that only write-operating datanodes/coordinators are counted for deciding if 2PC is required or not
If possible, design transactions so that they span a single or as few nodes as possible
![Page 57: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/57.jpg)
SFOF Analysis and Failure Handling
![Page 58: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/58.jpg)
11/21/13 Postgres-XC 58
Single Point of Failure
GTM
� Obviously SPOF
GTM-Proxy� No persistent data
� Just restart when fails
Coordinator
� Every coordinator is essentially a copy
� When fails, other coordinators work
Datanode� SPOF for sharded table
![Page 59: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/59.jpg)
11/21/13 Postgres-XC 59
SPOF - GTM
GTM keeps crucial information about next transaction identifier, currently running transactions as well as sequence information
� Actual fate of each transaction is not permanently saved on the GTM though (at least not today)
GTM failure leads to complete failure of the cluster� any useful work would needs GTM interaction
GTM Standby for rescue
![Page 60: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/60.jpg)
11/21/13 Postgres-XC 60
GTM Standby
GTM Standby is a working replica of the GTM
� Every operation on the GTM is replicated on the GTM, either via proxy or GTM
Create and configure standby� Only one standby can be configured currently
� Can be added to a running cluster without any disruption
� Can be promoted to be the new master
� Same executable as GTM, only configuration change
![Page 61: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/61.jpg)
11/21/13 Postgres-XC 61
SPOF - Datanode
Almost all the techniques for PostgreSQL HA are available
� Streaming replication
� Shared disk remount
Coordinators may require reconfiguration� Coordinators should use the new datanode
� Coordinators should clean connections to failed datanode before reconfiguration
GTM� Reconnect to (new) local GTM proxy
![Page 62: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/62.jpg)
11/21/13 Postgres-XC 62
SPOF – Coordinator
Coordinator only maintains catalog information
� Stable and mostly read-only
All coordinators are replica of each other� No master-slave concept. Every coordinator is equal
Datanode HA techniques can be applied� Streaming replication
� Shared disk remount
Applications can continue to use other coordinators while the failed coordinator is restored
![Page 63: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/63.jpg)
Development Process and Status
![Page 64: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/64.jpg)
11/21/13 Postgres-XC 64
Development Process
Project hosted on sourceforge
� http://postgres-xc.sourceforge.net/
Join developer/user mailing lists� [email protected]
Test and report bugs
Submit patches
![Page 65: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/65.jpg)
11/21/13 Postgres-XC 65
Regression and Performance Tests
PostgreSQL's regression tests are enhanced
� New tests added to test Postgres-XC specific features
� Existing tests modified to adjust for difference because of cluster environment
Nightly regression tests are run on buildfarm (at least last I checked)
Regular performance tests are conducted by NTT Data to guard against performance regression
� Typically DBT1 tests
� Last I heard, DBT2 tests were being developed by NTT
![Page 66: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/66.jpg)
11/21/13 Postgres-XC 66
Status
Release 1.1 is in beta stage (recommended)� Based on latest PostgreSQL 9.2 stable release
� Many new features, enhancements and bug fixes
Data redistribution Triggers Planner enhancements GTM standby Where current of Returning
Release 1.0 is the latest stable release� Based on PostgreSQL 9.1
![Page 67: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/67.jpg)
11/21/13 Postgres-XC 67
Release Management
Yearly major release
� Merge with PostgreSQL's new release
� Additional XC features
Quarterly minor release� Bug fixes
� Security fixes
� Catch up with PostgreSQL's minor release
![Page 68: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/68.jpg)
11/21/13 Postgres-XC 68
Licensing
Started with GPL license, but changed to PostgreSQL license couple of years back
� Anyone can take the source, modify and use without requiring to open source the modified code
� Very flexible licensing
![Page 69: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/69.jpg)
11/21/13 Postgres-XC 69
Support
Very good support available from the core team on the mailing lists
� Helps if a self containing test case is included
Encouraged to fix bugs on your own and submit a patch
A committer will look at the patch and commit (with changes if required)
Volume of bug reports is still low and hence good turnaround time
A few commercial companies have started offering professional support
� Good for project in the long run
� Skills of support staff could be limited right now
![Page 70: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/70.jpg)
Limitations and Roadmap
![Page 71: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/71.jpg)
11/21/13 Postgres-XC 71
Current Limitations
Multi-column distribution is not supported
Global constraints not supported� Unique index on non-distribution column is not supported
Updating distribution column not supported
Statement level triggers not supported on the datanode side� Statement level triggers will fire at the coordinator
![Page 72: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/72.jpg)
11/21/13 Postgres-XC 72
Current Limitations
No cluster wide deadlock detection
� Timeout based deadlock resolution
No cost estimation at coordinator (table statistics not available/used at the coordinator)
� ANALYZE command is propagated to the datanodes and the stats are updated on the datanodes
� Datanodes have up-to-date statistics. So they will plan the queries in the most efficient way
Node management needs much improvement� No built-in HA for replicated tables SELECT queries
![Page 73: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/73.jpg)
11/21/13 Postgres-XC 73
Current Limitations
Truly serializable transactions not supported
� Read-committed and repeatable reads are supported
Transactions accessing TEMP objects can not be committed with 2PC
� PostgreSQL does not support PREPARing a transaction that has accessed temporary objects
� Set enforce_two_phase_commit = off to handle such transactions
Savepoints not supported
Exception blocks in plpgsql not supported (they need savepoint support)
![Page 74: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/74.jpg)
11/21/13 Postgres-XC 74
Roadmap
Support for remaining SQL features (continuous process)
Planner enhancements so that more and complex queries can be pushed down to the datanodes
Table statistics at the coordinator
Concurrent redistribution of data after node additional/removal� Without exclusive lock on the table
Separate group in NTT is developing resource agents for corosync/pacemaker
� Plans not yet in public domain
Installers and further testing with DBT-2/DBT-5
![Page 75: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/75.jpg)
Resources
![Page 76: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/76.jpg)
11/21/13 Postgres-XC 76
Project Resources
Postgres-XC project home page
� http://postgres-xc.sourceforge.net/
� Project wiki and links to other presentations/white papers
Downloads and installation� Product RPMS available for download
Source code
� Source code from GIT repository
� Zipped tar balls
Other tools� pgxc_ctl utility to configure/manage/monitor the cluster
� A bunch of other contrib modules
![Page 77: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/77.jpg)
11/21/13 Postgres-XC 77
Project Resources
Documentation� Extensive Postgres style documentation explaining XC-specific features
� Available online as well as distributed with source/rpms.
Mailing lists� Postgres-xc-general
� Postgres-xc-developers
� Postgres-xc-bugs
Write to me� [email protected]
![Page 78: Postgres-XC: Symmetric PostgreSQL Cluster](https://reader034.vdocuments.us/reader034/viewer/2022042613/554a1596b4c9058c5d8b4e4e/html5/thumbnails/78.jpg)
Thank You