hacking postgresql internals - citus...

62
Treasure Data, Inc. Founder & Software Architect Sadayuki Furuhashi Hacking PostgreSQL Internals to Solve Data Access Problems

Upload: vuongtu

Post on 03-Feb-2018

245 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Treasure Data, Inc. Founder & Software Architect

Sadayuki Furuhashi

Hacking PostgreSQL Internals to Solve Data Access Problems

Page 2: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

A little about me...

> Sadayuki Furuhashi > github/twitter: @frsyuki

> Treasure Data, Inc. > Founder & Software Architect

> Open source hacker

Page 3: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Open source

Fluentd - Unifid log collection infrastracture

Embulk - Plugin-based parallel ETL

MessagePack - Schemaless serialization format

Page 4: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

End-to-end data analytics pipeline

on the cloud.

Page 5: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Motivation of Prestogres

HDFS

Hive

PostgreSQL, etc.

Daily/Hourly BatchInteractive query

Dashboard

> I want to build an open-source ODBC connectivity directly to a big data analytics infrastracture.

Page 6: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

HDFS

Hive

PostgreSQL, etc.

Daily/Hourly BatchInteractive query

CommercialBI Tools

Batch analysis platform Visualization platform

Dashboard

Page 7: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

HDFS

Hive

PostgreSQL, etc.

Daily/Hourly BatchInteractive query

✓ Scalable to PBs of data ✓ But too slow for

interactive queries

CommercialBI Tools

Dashboard

✓ Extra work to manage 2 systems

✓ Can’t query against “live” data directly

Batch analysis platform Visualization platform

Page 8: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

HDFS

Hive Dashboard

Presto

PostgreSQL, etc.

Daily/Hourly Batch

HDFS

HiveDashboard

Daily/Hourly Batch

Interactive query

Direct interactive query!!

Unified data analysis platform

Page 9: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Presto

HDFS

HiveDashboard

Daily/Hourly BatchInteractive query

Cassandra PostgreSQL Kafka, etc.

SQL on any data sets

Unified data analysis platform

Page 10: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Presto

HDFS

HiveDashboard

Daily/Hourly BatchInteractive query

Cassandra PostgreSQL Kafka, etc.

SQL on any data sets

Unified data analysis platform

CommercialBI Tools

✓ IBM Cognos ✓ Tableau ✓ ...

ODBC

Page 11: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Presto

HiveDashboard

Daily/Hourly BatchInteractive query

Cassandra PostgreSQL Kafka, etc.

SQL on any data sets CommercialBI Tools

✓ IBM Cognos ✓ Tableau ✓ ...

Prestogres

Unified data analysis platform

HDFS

Page 12: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Presto

Dashboard

Interactive query

CommercialBI Tools

✓ IBM Cognos ✓ Tableau ✓ ...

Prestogres

Today’s topic

HDFS

Page 13: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Q. Why do you choose Presto over other databases?

Page 14: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

1. Why Presto? - Presto’s architecture

Page 15: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

ClientCoordinator Connector

Plugin

Worker

Worker

Worker

Storage / Metadata

Discovery Service

Page 16: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

ClientCoordinator Connector

Plugin

Worker

Worker

Worker

Storage / Metadata

Discovery Service1. find servers in a cluster

Page 17: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

ClientCoordinator Connector

Plugin

Worker

Worker

Worker

Storage / Metadata

Discovery Service

2. Client sends a query using HTTP

Page 18: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

ClientCoordinator Connector

Plugin

Worker

Worker

Worker

Storage / Metadata

Discovery Service

3. Coordinator builds a query plan

Connector plugin provides metadata (table schema, etc.)

Page 19: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

ClientCoordinator Connector

Plugin

Worker

Worker

Worker

Storage / Metadata

Discovery Service

4. Coordinator sends tasks to workers

Page 20: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

ClientCoordinator Connector

Plugin

Worker

Worker

Worker

Storage / Metadata

Discovery Service

5. Workers read data through connector plugin

Page 21: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

ClientCoordinator Connector

Plugin

Worker

Worker

Worker

Storage / Metadata

Discovery Service

6. Workers run tasks in memory

Page 22: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Coordinator ConnectorPlugin

Worker

Worker

Worker

Storage / Metadata

Discovery Service

7. Client gets the result from a worker

Client

Page 23: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

ClientCoordinator Connector

Plugin

Worker

Worker

Worker

Storage / Metadata

Discovery Service

Page 24: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

What’s Connectors?

> Connectors are plugins of Presto > Connectors provide metadata and data to Presto

> provide table schema to coordinators > provide table rows to workers

> Implementations: > Hive connector > Cassandra connector > JDBC connector (scans from RDBMS) > Kafka connector, etc.

Page 25: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

ClientCoordinator

otherconnectors

...

Worker

Worker

Worker

PostgreSQL

Discovery Service

find servers in a cluster

Hive Connector

HDFS / Metastore

Multiple connectors in a query

JDBC Connector

Other data sources...

Page 26: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Postgres-XL vs. Presto

Postgres-XL Presto

worker worker

worker worker

worker

worker

Data is stored on separated storage systems

worker

coordinator

storage storage

coordinator

Data is stored on workers

worker

Transaction Manager

worker worker

Supports ACIDwith MVCC

Doesn't support transactions

Page 27: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Q. Why do you choose Presto over other databases?

> A. Because Presto is elastic. > Computation performance is isolated from

storage management. • Adding a server improves performance instantly.

(No data re-distribution when we add a server) • Removing server is also done instantly.

> That's good for cloud-based infrastracture. • Scale performance when we need. • JOIN across multiple data sources (RDB, S3, etc.)

without moving big data.

scan scan

join join

aggr

aggr

aggr

Distributed IO ondistributed storage

Page 28: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

All stages are pipe-lined ✓ No wait time ✓ No fault-tolerance

Hadoop MapReduce vs. Presto

MapReduce Presto

map map

reduce reduce

task task

task task

task

task

memory-to-memory data transfer ✓ No disk IO ✓ Data chunk must fit in memory

task

disk

map map

reduce reduce

disk

disk

Write datato disk

Wait betweenstages

Page 29: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Today’s talk

0. Overview of Presto & data analytics platform

1. Why Presto? - Presto's architecture

2. Prestogres design

3. Prestogres implementation

4. Prestogres hacks

5. Presto internals

Page 30: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

2. Prestogres designPostgreSQL protocol gateway

Page 31: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

> BI tools need ODBC or JDBC connectivity. > Tableau, IBM Cognos, QlickView, Chart.IO, ... > JasperSoft, Pentaho, MotionBoard, ...

> ODBC/JDBC is VERY COMPLICATED. • psqlODBC: 58,000 lines • postgresql-jdbc: 62,000 lines • mysql-connctor-odbc: 27,000 lines • mysql-connector-j: 101,000 lines

The problems to solve

> Open-source implementation will takelong time.

Page 32: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

A solution

> Creates a PostgreSQL protocol gateway server > Reuses PostgreSQL’s stable ODBC / JDBC driver

Page 33: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

dashboard on chart.io: https://chartio.com/

Page 34: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Other possible designs were…

a) MySQL protocol + libdrizzle: > Drizzle includes a well-designed library to implement

MySQL protocol server. > Proof-of-concept worked well:

• trd-gateway - MySQL protocol gateway server for "Hive"

> Difficulties: clients assumes the server is MySQL but, • syntax is not ANSI standard: MySQL uses `…`, while Presto uses “…” • function mismatches: DAYOFMONTH(…) vs EXTRACT(day…)

Page 35: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Other possible designs were…

b) PostgreSQL + Foreign Data Wrapper (FDW): > JOIN and aggregation pushdown is not available (yet?)

Page 36: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Difficulties to implement PG protocol

> Emulating system catalogs > pg_class, pg_namespace, pg_proc, …

> Rewriting transactions (BEGIN, COMMIT) > Presto doesn’t support transactions

Page 37: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Prestogres design

pgpool-II + PostgreSQL + PL/Python > pgpool-II is a PostgreSQL protocol middleware for

replication, failover, load-balancing, etc. > pgpool-II already implements useful utility functions

(parsing SQL, rewriting SQL, hacking system catalogs, …) > Basic idea:

• Rewrite queries at pgpool-II and run Presto queries using PL/Python

select count(*)from access

select * frompython_func(‘select count(*) from access’)

rewrite!

Page 38: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

3. Prestogres implementation

Page 39: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

psql

pgpool-IIodbc

jdbc

PostgreSQL Presto

Authentication1.

Rewriting queries Executing queries using PL/Python

2. 3.

Overview

Patched!

Page 40: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

psql

pgpool-IIodbc

jdbc

PostgreSQL Presto

Authentication1.

Rewriting queries Executing queries using PL/Python

2. 3.

Overview

Patched!

Prestogres

Page 41: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

pgpool-IIpsql PostgreSQL Presto

$ psql -U me mydb

StartupPacket { database = “mydb”, user = “me”, … }

Connection

Page 42: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

pgpool-IIpsql PostgreSQL Presto

$ psql -U me mydb

prestogres_hba.conf

host mydb me 0.0.0.0/0 trust presto_server presto.local:8080, presto_catalog hive, pg_database hive

StartupPacket { database = “mydb”, user = “me”, … }

Connection

Page 43: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

pgpool-IIpsql PostgreSQL Presto

StartupPacket { database = “mydb”, user = “me”, … }

$ psql -U me mydb

> CREATE DATABASE hive; > CREATE ROLE me; > CREATE FUNCTION setup_system_catalog; > CREATE FUNCTION start_presto_query;

libpq host=‘localhost’, dbname=‘postgres’, user=‘prestogres’

prestogres_hba.conf

host mydb me 0.0.0.0/0 trust presto_server presto.local:8080, presto_catalog hive, pg_database hive

Connection

Page 44: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

pgpool-IIpsql PostgreSQL Presto

$ psql -U me mydb

prestogres_hba.conf

host mydb me 0.0.0.0/0 trust presto_server presto.local:8080, presto_catalog hive, pg_database hive

StartupPacketStartupPacket { database = “mydb”, user = “me”, … }

Connection

uses the database and user which were created right now!

{ database = “hive”, user = “me”, … }

Page 45: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

system catalog!

pgpool-IIpsql PostgreSQL Presto

$ psql -U me mydb

“Q” SELECT * FROM pg_class;

"Query against a system catalog!”

Meta-query

Page 46: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Presto

SELECT * FROMinformation_schema.columns

system catalog!

pgpool-IIpsql PostgreSQL

$ psql -U me mydb

SELECT setup_system_catalog(‘presto.local:8080’, ‘hive’)“Q” SELECT * FROM pg_class;

"Query against a system catalog!”

Meta-query

PL/Python functiondefined at prestogres.py

Page 47: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

pgpool-IIpsql PostgreSQL Presto

$ psql -U me mydb

SELECT setup_system_catalog(‘presto.local:8080’, ‘hive’)“Q” SELECT * FROM pg_class;

> CREATE TABLE access_logs; > CREATE TABLE users; > CREATE TABLE events; …

Meta-query

SELECT * FROMinformation_schema.columns

"Query against a system catalog!”

Page 48: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

pgpool-IIpsql PostgreSQL Presto

$ psql -U me mydb

“Q” SELECT * FROM pg_class; “Q” SELECT * FROM pg_class;

Meta-query"Query against a system catalog!”

reads the records whichwere generated right now!

Page 49: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Prestopgpool-IIpsql PostgreSQL

$ psql -U me mydb

“Q” select count(*) from access_logs;

regular table!

Presto Query"Query against a regular table!”

Page 50: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Presto

1. start the query on Presto

pgpool-IIpsql PostgreSQL

$ psql -U me mydb

“Q” select count(*) from access_logs; SELECT start_presto_query(… ‘select count(*) from access_logs’)

regular table!

Presto Query"Query against a regular table!”

PL/Python functiondefined at prestogres.py

libpq

Page 51: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

pgpool-IIpsql PostgreSQL Presto

$ psql -U me mydb

“Q” select count(*) from access_logs;

regular table!

Presto Query"Query against a regular table!”

1. start the query on Presto

2. define fetch_results()

CREATE TYPE result_type (c0_ BIGINT);

CREATE FUNCTION fetch_resultsRETURNS SETOF result_type AS $$ return prestogres.fetch_presto_query_results()$$ language plpythonu;

libpq

Page 52: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

pgpool-IIpsql PostgreSQL Presto

$ psql -U me mydb

“Q” select count(*) from access_logs; “Q” SELECT * FROM fetch_results();

Presto Query"Query against a regular table!”

defined by start_presto_querythat returns SETOF result_type

type of query results definedby start_presto_query

3. calls fetch_results()

Page 53: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

> select * from pg_class > In another connection, pgpool-II runs

setup_system_catalog() > Then forwards query: select * from pg_class

> select count(*) from access > In another connection, pgpool-II runs

start_presto_query(‘select count(*) from access’, …) > Then forwards query: select * from fetch_query_result()

> BEGIN > Forwards query: BEGIN

(no rewrite)

Examples

Page 54: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Demo

Page 55: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

4. Prestogres hacks

Page 56: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Multi-statement queries

> BEGIN; select count(*) from access; COMMIT > Parse query in pgpool-II > In anothe connection, call start_presto_query(‘select …’) > Rewrite query partially:

BEGIN; select * from fetch_query_result(); COMMIT > select count(*) from access; select count(*) from access

> not supported :(

Page 57: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Supporting Cursors

> DECLARE CURSOR xyz FOR select …; FETCH > Parse query in pgpool-II > In anothe connection, call start_presto_query(‘select …’) > Rewrite query partially:

DECLARE CURSOR xyz FOR select * from fetch_query_result(); FETCH

Page 58: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Error handling

> select xyz(*) from abc > do $$

RAISE EXCEPTION ‘%’, ’Function xyz is not defined’ USING errcode='42601'$$end language plpgsql

Page 59: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Faked current_database()

DELETE FROM pg_catalog.pg_proc WHERE proname=‘current_database’;

CREATE FUNCTION pg_catalog.current_database() RETURNS name AS $$begin return ‘faked_name’::name;end$$ language plpgsql stable strict;

Page 60: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

5. Future works

Page 61: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Future works

Rewriting CAST syntax

Extended query

CREATE TEMP TABLE

Page 62: Hacking PostgreSQL Internals - Citus Datainfo.citusdata.com/rs/...Hacking_PostgreSQL_Internals_to_Solve_Data... · Hacking PostgreSQL Internals ... without moving big data. scan scan

Thank you!

https://github.com/treasure-data/prestogreslicensed under Apache License.

Treasure Data, Inc. Founder & Software Architect

Sadayuki Furuhashi