Post on 22-May-2021
TRANSCRIPT
Don’t Hold My Data Hostage
Mark Raasveldt, Hannes Mühleisen
A Case For Client Protocol Redesign
1
What is a Client Protocol anyway?
▸ Protocol is how a client communicates with a server
▸ ODBC, JDBC, psql
▸ Every database that supports remote clients has a client protocol
▸ Using this protocol, clients can:
▸ Connect to the database
▸ Query it
▸ Receive the query results
2
What is a Client Protocol anyway?
3
Motivation
▸ Problem: Current protocols were designed for exporting small numbers of rows
▸ Displaying results on screen
▸ OLTP use cases
▸ Exporting aggregates
▸ Exporting large amounts of data using these protocols is slow
▸ Analytical tools (e.g. R/Python)
▸ Data export is a bottleneck when result sets are large!
4
Motivation
▸ Cost of exporting the SF10 lineitem table from TPC-H (7.2GB in CSV format) on localhost
[Bar chart, wall clock time (s), 0–700; each bar split into connection, query execution, and result set serialization + transfer:]
Netcat (baseline): 10.25
MySQL: 101.3
DB2: 170.9
DBMS X: 189.6
PostgreSQL: 202
MonetDB: 221.2
MySQL+C: 391.3
Hive: 629.9
MongoDB: 686.5
5
SELECT * FROM LINEITEM;
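Reading the baseline and the extremes off this chart gives a rough sense of the overhead (the per-system times are read off the chart, so treat them as approximate):

```python
csv_gb = 7.2          # SF10 lineitem exported as CSV
netcat_s = 10.25      # raw transfer over netcat (baseline)
best_dbms_s = 101.3   # fastest DBMS time in the chart
worst_dbms_s = 686.5  # slowest DBMS time in the chart

baseline_mb_s = csv_gb * 1024 / netcat_s
print(f"netcat throughput: {baseline_mb_s:.0f} MB/s")
print(f"fastest DBMS vs netcat: {best_dbms_s / netcat_s:.1f}x slower")
print(f"slowest DBMS vs netcat: {worst_dbms_s / netcat_s:.1f}x slower")
```

Even the fastest client protocol is roughly an order of magnitude slower than simply copying the bytes over the socket.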
Motivation
▸ We are not the first ones to notice this problem
▸ A lot of work on in-database processing, UDFs, etc.
▸ However, that work is database-specific, requires adapting existing workflows, and introduces safety issues
▸ Why is exporting large amounts of data from a database so inefficient?
▸ Can we make it more efficient?
6
State of the Art Protocols
▸ We reverse engineered how different databases transfer the following table “on the wire”
▸ Source code/documentation
▸ Decompilation of JDBC Drivers
▸ Wireshark
increased. It is possible that they send explicit confirmation messages from the client to the server to indicate that the client is ready to receive the next batch of data. These messages are cheap with a low latency, but become very costly when the latency increases.
Contrary to our prediction, we find that the performance of all systems is heavily influenced by a high latency. This is because, while the server and client do not explicitly send confirmation messages to each other, the underlying TCP/IP layer does send acknowledgement messages when data is received [27]. TCP packets are sent once the underlying buffer fills up, resulting in an acknowledgement message. As a result, protocols that send more data trigger more acknowledgements and suffer more from a higher latency.
Throughput. Reducing the throughput of a connection adds a variable cost to sending messages depending on the size of the message. Restricted throughput means sending more bytes over the socket becomes more expensive. The more we restrict the throughput, the more protocols that send a lot of data are penalized.
In Figure 4, we can see the influence that lower throughputs have on the different protocols. When the bandwidth is reduced, protocols that send a lot of data start performing worse than protocols that send a lower amount of data. While the PostgreSQL protocol performs well with a high throughput, it starts performing significantly worse than the other protocols with a lower throughput. Meanwhile, we also observe that when the throughput decreases, compression becomes more effective. When the throughput is low, the actual data transfer is the main bottleneck and the cost of (de)compressing the data becomes less significant.
2.3 Result Set Serialization
In order to better understand the differences in time and transferred bytes between the different protocols, we have investigated their data serialization formats.
Table 2: Simple result set table.
INT32        VARCHAR10
100,000,000  OK
NULL         DPFKG
For each of the protocols, we show a hexadecimal representation of Table 2 encoded with each result set format. The bytes used for the actual data are colored green, while any overhead is colored white. For clarity, leading zeroes are colored gray.
[Figure: two PostgreSQL row messages for Table 2; fields: Message Type, Total Length, Field Count, then per field a Length and the Data:]
44 10000000 0200 04000000 00E1F505 02000000 4F4B
44 0F000000 0200 FFFFFFFF 05000000 4450464B47
Figure 5: PostgreSQL result set wire format
PostgreSQL. Figure 5 shows the result set serialization of the widely used PostgreSQL protocol. In the PostgreSQL result set, every single row is transferred in a separate protocol message [35]. Each row includes a total length, the number of fields, and for each field its length (-1 if the value is NULL) followed by the data. We can see that for this result set, the per-row metadata takes more bytes than the actual data. Furthermore, a lot of the information is repetitive and redundant. For example, the number of fields is expected to be constant for an entire result set. Also, from the result set header that precedes these messages, the number of rows in the result set is known, which makes the message type marker redundant. This large amount of redundant information explains why PostgreSQL's client protocol requires so many bytes to transfer the result set in the experiment shown in Table 1. On the other hand, the simplicity of the protocol results in low serialization and deserialization costs. This is reflected in its quick transfer time if the network connection is not a bottleneck.
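The framing described above is easy to reproduce. Below is a minimal sketch (the helper name `encode_data_row` is ours, not PostgreSQL's) of a DataRow ('D') message as the protocol frames it: a type byte, a big-endian total length that includes itself, a 16-bit field count, and per field a 32-bit length (-1 for NULL) followed by the raw bytes:

```python
import struct

def encode_data_row(fields):
    """Encode one row as a PostgreSQL-style DataRow ('D') message.

    fields: list of bytes, or None for NULL (encoded as length -1, no data).
    """
    body = struct.pack(">h", len(fields))          # 16-bit field count
    for value in fields:
        if value is None:
            body += struct.pack(">i", -1)          # length -1 marks NULL
        else:
            body += struct.pack(">i", len(value)) + value
    # the length field counts itself (4 bytes) plus the body
    return b"D" + struct.pack(">i", len(body) + 4) + body

msg = encode_data_row([b"100000000", b"OK"])
null_msg = encode_data_row([None, b"DPFKG"])
```

For the first row, 15 of the 26 message bytes are framing rather than data, matching the observation that per-row metadata can outweigh the values themselves.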
[Figure: two MySQL text-protocol packets for Table 2; fields: Data Length (3 bytes), Packet Nr., then per field a Length and the Data:]
0D0000 04 09 313030303030303030 02 4F4B
070000 05 FB 05 4450464B47
Figure 6: MySQL text result set wire format
MySQL. Figure 6 shows MySQL/MariaDB's protocol encoding of the sample result set. The protocol uses binary encoding for metadata, and text for actual field data. The number of fields in a row is constant and defined in the result set header. Each row starts with a three-byte data length. Then, a packet sequence number (0-256, wrapping around) is sent. This is followed by length-prefixed field data. Field lengths are encoded as variable-length integers. NULL values are encoded with a special field length, 0xFB. Field data is transferred in ASCII format. The sequence number is redundant here, as the underlying TCP/Unix sockets already guarantee that packets arrive in the same order in which they were sent.
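A sketch of that row format (helper names are ours; only the common cases of the length-encoded integer are shown):

```python
def lenenc_int(n):
    """MySQL-style length-encoded integer (common cases only)."""
    if n < 251:
        return bytes([n])                       # one byte for small values
    if n < 1 << 16:
        return b"\xfc" + n.to_bytes(2, "little")
    if n < 1 << 24:
        return b"\xfd" + n.to_bytes(3, "little")
    return b"\xfe" + n.to_bytes(8, "little")

def encode_text_row(seq, fields):
    """One text-protocol row: fields are ASCII strings, or None for NULL (0xFB)."""
    payload = b""
    for value in fields:
        if value is None:
            payload += b"\xfb"
        else:
            data = value.encode("ascii")
            payload += lenenc_int(len(data)) + data
    # packet header: 3-byte little-endian payload length + sequence number
    return len(payload).to_bytes(3, "little") + bytes([seq & 0xFF]) + payload

row = encode_text_row(2, ["100000000", "OK"])
```

Note that the nine-digit integer costs ten bytes as text; the binary protocols above and below fit it in four.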
[Figure: DBMS X rows for Table 2 — a packet header (C502), then per value a variable-length length prefix and the data (02 4F4B for 'OK', 05 4450464B47 for 'DPFKG', 00 for NULL); the numeric value uses a custom encoding]
Figure 7: DBMS X result set wire format
DBMS X has a very terse protocol. However, it is much more computationally heavy than the protocol used by PostgreSQL. Each row is prefixed by a packet header, followed by the values. Every value is prefixed by its length in bytes. This length, however, is transferred as a variable-length integer. As a result, the length field is only a single byte for small lengths. For NULL values, the length field is 0 and no actual value is transferred. Numeric values are also encoded using a custom format. On a lower layer, DBMS X uses a fixed network message length for batch transfers. This message length is configurable and, according to the documentation, considerably influences performance. We have set it to the largest allowed value, which gave the best performance in our experiments.
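Since DBMS X is anonymized, its exact integer format is not public; the sketch below uses a generic LEB128-style varint to illustrate why small lengths cost only one byte:

```python
def encode_varint(n):
    """LEB128-style variable-length unsigned integer: 7 data bits per byte,
    high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data):
    """Return (value, number of bytes consumed)."""
    result = shift = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, i + 1
        shift += 7
```

Lengths below 128 fit in one byte, so for short fields the framing overhead is a single byte per value; the trade-off is that every length must be decoded bit by bit rather than read as a fixed-width integer.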
7
State of the Art Protocols
▸ Significant per-row overhead
▸ Used by many other systems:
▸ Redshift, HyPer, Greenplum and Vertica
PostgreSQL
[Figure: PostgreSQL result set wire format — message type, total length, field count, then per field a length and the data]
8
State of the Art Protocols
▸ ASCII protocol
▸ Every value has a length field
▸ Supports compression with GZIP
[Figure: MySQL text result set wire format — 3-byte data length, packet nr., then length-prefixed field data (0xFB for NULL)]
MySQL
9
State of the Art Protocols
▸ Columnar protocol
▸ Uses generic Thrift serialisation library
▸ One byte per value in NULL mask
▸ Also used by SparkSQL
Hive
[Figure: Hive/Thrift columnar result set for Table 2 — a start row offset and column count, then per column: Thrift field/list headers (1A, 28, …), a one-byte-per-value NULL mask (e.g. 0001), and the column data (5FAF8480… for the INT32 column; 054450464B47024F4B for the length-prefixed strings 'DPFKG' and 'OK'); each struct ends with a field-stop byte (00)]
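A sketch of the byte-per-value NULL mask the slide refers to (the zero-padding of NULL slots is our assumption for illustration, not taken from the Hive source):

```python
def encode_column(values, width=4):
    """Columnar batch sketch in the spirit of Hive's Thrift encoding:
    a one-byte-per-value NULL mask followed by the fixed-width values.
    In this sketch, NULLs still occupy a full slot, padded with zeroes."""
    mask = bytes(1 if v is None else 0 for v in values)
    data = b"".join(
        (0 if v is None else v).to_bytes(width, "little", signed=True)
        for v in values
    )
    return mask + data

col = encode_column([100_000_000, None])
```

Spending a whole byte per value on the mask is simple but wasteful next to the one-bit-per-value bitmask used in the proposal later in the deck.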
10
Protocol Design Space
▸ Implemented prototype in PostgreSQL and MonetDB
▸ Serialisation Format (ASCII, Custom Binary, Generic)
▸ Custom Binary
▸ Row Major or Column Major
▸ Column-Major (but chunked)
▸ Data Compression Methods
▸ No compression local, stream compression on remote
▸ Null Handling
▸ Close to native formats
11
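The row-major versus chunked column-major choice can be illustrated with a small sketch (tiny chunks here for readability; the actual prototype uses chunks on the order of 1 MB):

```python
def serialize_row_major(rows, width=4):
    """Row-major: values interleaved row by row."""
    return b"".join(v.to_bytes(width, "little") for row in rows for v in row)

def serialize_chunked_column_major(rows, chunk_rows, width=4):
    """Column-major, but in chunks of `chunk_rows` rows, so the client can
    start consuming results before the full set has been materialized."""
    out = b""
    for start in range(0, len(rows), chunk_rows):
        chunk = rows[start:start + chunk_rows]
        out += len(chunk).to_bytes(4, "little")   # row-count prefix per chunk
        for col in zip(*chunk):                   # emit one column at a time
            out += b"".join(v.to_bytes(width, "little") for v in col)
    return out
```

Chunking keeps the memory footprint bounded on both ends while preserving the cache-friendly columnar layout within each chunk.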
Proposed Protocol in MonetDB
[Figure: proposed MonetDB wire format for Table 2 — row count (02000000), INT32 column data (00E1F505 …), then the variable-length column prefixed with its byte length (09000000) followed by NUL-terminated strings (4450464B47004F4B00 = 'DPFKG', 'OK')]
▸ Columnar, 1MB chunks prefixed with row count
▸ Missing values stored as special values in domain
▸ Variable-length columns prefixed with their length
12
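A minimal sketch of this chunk layout (the INT32 NULL sentinel is an assumption for illustration; string NULLs are omitted from the sketch):

```python
INT32_NULL = -2**31  # assumed in-domain sentinel for a missing INT32

def encode_chunk(int_col, str_col):
    """MonetDB-style chunk sketch: row count, then each column in full.
    Missing ints use an in-domain sentinel; strings are NUL-terminated,
    and the variable-length column is prefixed with its byte length."""
    out = len(int_col).to_bytes(4, "little")                 # row count
    out += b"".join(
        (INT32_NULL if v is None else v).to_bytes(4, "little", signed=True)
        for v in int_col
    )
    strdata = b"".join(s.encode() + b"\x00" for s in str_col)
    out += len(strdata).to_bytes(4, "little") + strdata      # length prefix
    return out

chunk = encode_chunk([100_000_000, None], ["OK", "DPFKG"])
```

The length prefix on the variable-length column lets the client allocate and bulk-copy the whole column without scanning for terminators first.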
Proposed Protocol in PostgreSQL
[Figure: proposed PostgreSQL wire format for Table 2 — row count (02000000); a column with missing values gets a column length (09000000) and a one-bit-per-value NULL mask (02), then the column data (00E1F505); the string column data (4450464B47004F4B00) follows]
▸ Missing values (PostgreSQL)
▸ NULL bitmask for each column, 1 bit per value
▸ Only add mask if column has missing values
▸ Columns with mask have a column-length as well
13
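A sketch of the per-column NULL bitmask described above (the bit order within a mask byte and the zero-padding of NULL slots are our assumptions):

```python
def encode_int32_column(values):
    """Proposed-protocol sketch for PostgreSQL: a column is sent as-is when
    it has no NULLs; otherwise a column length and a 1-bit-per-value NULL
    bitmask precede the data, and NULL slots are zero-padded."""
    has_nulls = any(v is None for v in values)
    data = b"".join(
        (0 if v is None else v).to_bytes(4, "little", signed=True)
        for v in values
    )
    if not has_nulls:
        return data                               # no mask, no length prefix
    mask = bytearray((len(values) + 7) // 8)      # 1 bit per value
    for i, v in enumerate(values):
        if v is None:
            mask[i // 8] |= 1 << (i % 8)
    body = bytes(mask) + data
    return len(body).to_bytes(4, "little") + body
```

Making the mask optional means fully populated columns pay no NULL-handling overhead at all, and a column with NULLs pays one bit per value instead of Hive's full byte.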
increased. It is possible that they send explicit confirmationmessages from the client to the server to indicate that theclient is ready to receive the next batch of data. Thesemessages are cheap with a low latency, but become verycostly when the latency increases.
Contrary to our prediction, we find that the performanceof all systems is heavily influenced by a high latency. This isbecause, while the server and client do not explicitly sendconfirmation messages to each other, the underlying TCP/IPlayer does send acknowledgement messages when data isreceived [27]. TCP packets are sent once the underlyingbu↵er fills up, resulting in an acknowledgement message.As a result, protocols that send more data trigger moreacknowledgements and su↵er more from a higher latency.
Throughput. Reducing the throughput of a connectionadds a variable cost to sending messages depending on thesize of the message. Restricted throughput means sendingmore bytes over the socket becomes more expensive. Themore we restrict the throughput, the more protocols thatsend a lot of data are penalized.
In Figure 4, we can see the influence that lower throughputshave on the di↵erent protocols. When the bandwidth isreduced, protocols that send a lot of data start performingworse than protocols that send a lower amount of data.While the PostgreSQL protocol performs well with a highthroughput, it starts performing significantly worse than theother protocols with a lower throughput. Meanwhile, we alsoobserve that when the throughput decreases compressionbecomes more e↵ective. When the throughput is low, theactual data transfer is the main bottleneck and the cost of(de)compressing the data becomes less significant.
2.3 Result Set Serialization

In order to better understand the differences in time and transferred bytes between the different protocols, we have investigated their data serialization formats.
Table 2: Simple result set table.

INT32        VARCHAR10
100,000,000  OK
NULL         DPFKG
For each of the protocols, we show a hexadecimal representation of Table 2 encoded with each result set format. The bytes used for the actual data are colored green, while any overhead is colored white. For clarity, leading zeroes are colored gray.
[Figure 5: PostgreSQL result set wire format. Per row: message type ('D'), total length, field count, then for each field its length and its data.]
PostgreSQL. Figure 5 shows the result set serialization of the widely used PostgreSQL protocol. In the PostgreSQL result set, every single row is transferred in a separate protocol message [35]. Each row includes a total length, the number of fields, and for each field its length (-1 if the value is NULL) followed by the data. We can see that for this result set, the per-row metadata takes more bytes than the actual data. Furthermore, a lot of information is repetitive and redundant. For example, the number of fields is expected to be constant for an entire result set. Also, the number of rows in the result set is known from the result set header that precedes these messages, which makes the message type marker redundant. This large amount of redundant information explains why PostgreSQL's client protocol requires so many bytes to transfer the result set in the experiment shown in Table 1. On the other hand, the simplicity of the protocol results in low serialization and deserialization costs. This is reflected in its quick transfer time when the network connection is not a bottleneck.
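The per-row layout described above can be sketched as follows. This is a hedged illustration rather than the protocol implementation itself: field values are assumed to be pre-rendered byte strings, and the function name is hypothetical.

```python
import struct

def pg_data_row(fields):
    """Sketch of a PostgreSQL DataRow ('D') message for one result row.

    fields: list of bytes values, or None for a NULL (sent as length -1
    with no data). Layout as described in the text: message type, total
    length, field count, then a length-prefixed value per field, all in
    network (big-endian) byte order.
    """
    body = struct.pack("!h", len(fields))          # int16 field count
    for f in fields:
        if f is None:
            body += struct.pack("!i", -1)          # NULL: length -1, no data
        else:
            body += struct.pack("!i", len(f)) + f  # int32 length + data
    # the total length counts itself (4 bytes) plus the body
    return b"D" + struct.pack("!i", 4 + len(body)) + body

row = pg_data_row([b"100000000", b"OK"])
```

For this two-field row, the per-row metadata (type byte, total length, field count, two field lengths) adds up to 15 bytes against only 11 bytes of actual data, which illustrates the overhead discussed above.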
[Figure 6: MySQL text result set wire format. Per row: 3-byte data length, 1-byte packet sequence number, then length-prefixed field data; NULL is the marker byte 0xFB.]
MySQL. Figure 6 shows MySQL/MariaDB's protocol encoding of the sample result set. The protocol uses binary encoding for metadata, and text for actual field data. The number of fields in a row is constant and defined in the result set header. Each row starts with a three-byte data length. Then, a packet sequence number (0-255, wrapping around) is sent. This is followed by length-prefixed field data. Field lengths are encoded as variable-length integers. NULL values are encoded with a special field length, 0xFB. Field data is transferred in ASCII format. The sequence number is redundant here, as the underlying TCP/Unix socket layer already guarantees that packets arrive in the same order in which they were sent.
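The MySQL row encoding can be sketched as follows. This is an illustrative sketch of the scheme as described above, with hypothetical function names; field values are assumed to be pre-rendered ASCII byte strings.

```python
import struct

def lenc_int(n):
    """MySQL-style length-encoded integer: a single byte for values
    below 0xfb, and longer prefixed forms (0xfc/0xfd/0xfe) above."""
    if n < 0xFB:
        return bytes([n])
    if n < 1 << 16:
        return b"\xfc" + struct.pack("<H", n)
    if n < 1 << 24:
        return b"\xfd" + struct.pack("<I", n)[:3]
    return b"\xfe" + struct.pack("<Q", n)

def mysql_text_row(fields, seq):
    """Sketch of a text-protocol row packet: 3-byte payload length,
    1-byte wrapping sequence number, then length-prefixed ASCII
    fields. NULL is the single marker byte 0xfb."""
    payload = b"".join(b"\xfb" if f is None else lenc_int(len(f)) + f
                       for f in fields)
    return struct.pack("<I", len(payload))[:3] + bytes([seq & 0xFF]) + payload

pkt = mysql_text_row([b"100000000", None], seq=2)
```

Note that the big integer 100,000,000 costs 9 bytes as text, versus 4 bytes in a binary INT32 encoding; this text representation is part of why the MySQL protocol transfers more bytes than necessary.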
[Figure 7: DBMS X result set wire format. Per row: packet header, then each value prefixed by its byte length as a variable-length integer.]
DBMS X has a very terse protocol. However, it is much more computationally heavy than the protocol used by PostgreSQL. Each row is prefixed by a packet header, followed by the values. Every value is prefixed by its length in bytes. This length, however, is transferred as a variable-length integer. As a result, the length field is only a single byte for small lengths. For NULL values, the length field is 0 and no actual value is transferred. Numeric values are also encoded using a custom format. On a lower layer, DBMS X uses a fixed network message length for batch transfers. This message length is configurable and, according to the documentation, considerably influences performance. We have set it to the largest allowed value, which gave the best performance in our experiments.
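Since DBMS X is anonymized, its exact variable-length integer encoding is not public; a common base-128 varint serves as an illustrative stand-in for the length-prefixing described above.

```python
def varint(n):
    """Generic base-128 varint (a stand-in, not the actual DBMS X
    encoding): 7 data bits per byte, high bit set on every byte
    except the last. Small lengths fit in a single byte."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(b)
            return bytes(out)

def length_prefixed_value(data):
    """Length-prefixed value as described: NULL is length 0, no data."""
    if data is None:
        return varint(0)
    return varint(len(data)) + data
```

Decoding such varints requires a branch per byte, which hints at why this terse format is more computationally heavy than PostgreSQL's fixed-width length fields.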
Benchmarks
▸ Three different network configurations
▸ Localhost: No network restrictions
▸ LAN: 1000 Mb/s throughput, 0.3ms latency
▸ WAN: 100 Mb/s throughput, 25ms latency
▸ Lineitem: SF10, 60 million rows, 16 columns, 7.2GB in CSV format
▸ 1 hour timeout
14
Benchmarks
Localhost (No network restrictions)
[Scatter plot: result set size (GB) vs. wall clock time (s, log scale) for Netcat, Netcat+Sy, Netcat+GZ, MonetDB++, MonetDB++C, Postgres++, Postgres++C, MySQL, MySQL+C, PostgreSQL, DB2, DBMS X, Hive, and MonetDB.]
15
Benchmarks
LAN (1000 Mb/s throughput, 0.3ms latency)
[Scatter plot: result set size (GB) vs. wall clock time (s, log scale) for Netcat, Netcat+Sy, Netcat+GZ, MonetDB++, MonetDB++C, Postgres++, Postgres++C, MySQL, MySQL+C, PostgreSQL, DB2, DBMS X, Hive, and MonetDB.]
16
Benchmarks
WAN (100 Mb/s throughput, 25ms latency)
[Scatter plot: result set size (GB) vs. wall clock time (s, log scale) for Netcat, Netcat+Sy, Netcat+GZ, MonetDB++, MonetDB++C, Postgres++, Postgres++C, MySQL, MySQL+C, PostgreSQL, and MonetDB.]
17
Conclusions
▸ Exporting data from a database does not have to be so inefficient
▸ State of the art database protocols can be improved for this use case
▸ We show this by implementing prototypes in two databases (MonetDB and PostgreSQL)
▸ Avoid per-row overhead, bulk transfer
▸ Stay close to database native formats
▸ Avoid unnecessary copying and conversion
▸ Lightweight compression on remote
▸ The MonetDB implementation is already released.
▸ Benchmark information: https://goo.gl/usjfyJ
18