Post on 22-May-2021
TRANSCRIPT
Don’t Hold My Data Hostage
Mark Raasveldt, Hannes Mühleisen
A Case For Client Protocol Redesign
1
What is a Client Protocol anyway?
▸ Protocol is how a client communicates with a server
▸ ODBC, JDBC, psql
▸ Every database that supports remote clients has a client protocol
▸ Using this protocol, clients can:
▸ Connect to the database
▸ Query it
▸ Receive the query results
2
What is a Client Protocol anyway?
3
Motivation
▸ Problem: Current protocols were designed for exporting small numbers of rows
▸ Displaying results on screen
▸ OLTP use cases
▸ Exporting aggregates
▸ Exporting large amounts of data using these protocols is slow
▸ Analytical tools (e.g. R/Python)
▸ Data export is a bottleneck when result sets are large!
4
Motivation
▸ Cost of exporting the SF10 lineitem table from TPC-H (7.2GB in CSV format) on localhost
[Bar chart, wall clock time (s), 0–700; each bar split into connection, query execution, and result set serialization + transfer:]
Netcat (baseline): 10.25
MySQL: 101.3
DB2: 170.9
DBMS X: 189.6
PostgreSQL: 202
MonetDB: 221.2
MySQL+C: 391.3
Hive: 629.9
MongoDB: 686.5
5
SELECT * FROM LINEITEM;
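Reading the baseline and the extremes off this chart gives a rough sense of the overhead (the per-system times are read off the chart, so treat them as approximate):

```python
csv_gb = 7.2          # SF10 lineitem exported as CSV
netcat_s = 10.25      # raw transfer over netcat (baseline)
best_dbms_s = 101.3   # fastest DBMS time in the chart
worst_dbms_s = 686.5  # slowest DBMS time in the chart

baseline_mb_s = csv_gb * 1024 / netcat_s
print(f"netcat throughput: {baseline_mb_s:.0f} MB/s")
print(f"fastest DBMS vs netcat: {best_dbms_s / netcat_s:.1f}x slower")
print(f"slowest DBMS vs netcat: {worst_dbms_s / netcat_s:.1f}x slower")
```

Even the fastest client protocol is roughly an order of magnitude slower than simply copying the bytes over the socket.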
Motivation
▸ We are not the first ones to notice this problem
▸ A lot of work on in-database processing, UDFs, etc.
▸ However, that work is database-specific, requires adapting existing workflows, and introduces safety issues
▸ Why is exporting large amounts of data from a database so inefficient?
▸ Can we make it more efficient?
6
State of the Art Protocols
▸ We reverse engineered how different databases transfer the following table “on the wire”
▸ Source code/documentation
▸ Decompilation of JDBC Drivers
▸ Wireshark
increased. It is possible that they send explicit confirmation messages from the client to the server to indicate that the client is ready to receive the next batch of data. These messages are cheap with a low latency, but become very costly when the latency increases.
Contrary to our prediction, we find that the performance of all systems is heavily influenced by a high latency. This is because, while the server and client do not explicitly send confirmation messages to each other, the underlying TCP/IP layer does send acknowledgement messages when data is received [27]. TCP packets are sent once the underlying buffer fills up, resulting in an acknowledgement message. As a result, protocols that send more data trigger more acknowledgements and suffer more from a higher latency.
Throughput. Reducing the throughput of a connection adds a variable cost to sending messages depending on the size of the message. Restricted throughput means sending more bytes over the socket becomes more expensive. The more we restrict the throughput, the more protocols that send a lot of data are penalized.
In Figure 4, we can see the influence that lower throughputs have on the different protocols. When the bandwidth is reduced, protocols that send a lot of data start performing worse than protocols that send a lower amount of data. While the PostgreSQL protocol performs well with a high throughput, it starts performing significantly worse than the other protocols with a lower throughput. Meanwhile, we also observe that when the throughput decreases, compression becomes more effective. When the throughput is low, the actual data transfer is the main bottleneck and the cost of (de)compressing the data becomes less significant.
2.3 Result Set Serialization
In order to better understand the differences in time and transferred bytes between the different protocols, we have investigated their data serialization formats.
Table 2: Simple result set table.
INT32        VARCHAR10
100,000,000  OK
NULL         DPFKG
For each of the protocols, we show a hexadecimal representation of Table 2 encoded with each result set format. The bytes used for the actual data are colored green, while any overhead is colored white. For clarity, leading zeroes are colored gray.
[Figure: two PostgreSQL row messages for Table 2; fields: Message Type, Total Length, Field Count, then per field a Length and the Data:]
44 10000000 0200 04000000 00E1F505 02000000 4F4B
44 0F000000 0200 FFFFFFFF 05000000 4450464B47
Figure 5: PostgreSQL result set wire format
PostgreSQL. Figure 5 shows the result set serialization of the widely used PostgreSQL protocol. In the PostgreSQL result set, every single row is transferred in a separate protocol message [35]. Each row includes a total length, the number of fields, and for each field its length (-1 if the value is NULL) followed by the data. We can see that for this result set, the per-row metadata takes more bytes than the actual data. Furthermore, a lot of the information is repetitive and redundant. For example, the number of fields is expected to be constant for an entire result set. Also, from the result set header that precedes these messages, the number of rows in the result set is known, which makes the message type marker redundant. This large amount of redundant information explains why PostgreSQL's client protocol requires so many bytes to transfer the result set in the experiment shown in Table 1. On the other hand, the simplicity of the protocol results in low serialization and deserialization costs. This is reflected in its quick transfer time if the network connection is not a bottleneck.
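The framing described above is easy to reproduce. Below is a minimal sketch (the helper name `encode_data_row` is ours, not PostgreSQL's) of a DataRow ('D') message as the protocol frames it: a type byte, a big-endian total length that includes itself, a 16-bit field count, and per field a 32-bit length (-1 for NULL) followed by the raw bytes:

```python
import struct

def encode_data_row(fields):
    """Encode one row as a PostgreSQL-style DataRow ('D') message.

    fields: list of bytes, or None for NULL (encoded as length -1, no data).
    """
    body = struct.pack(">h", len(fields))          # 16-bit field count
    for value in fields:
        if value is None:
            body += struct.pack(">i", -1)          # length -1 marks NULL
        else:
            body += struct.pack(">i", len(value)) + value
    # the length field counts itself (4 bytes) plus the body
    return b"D" + struct.pack(">i", len(body) + 4) + body

msg = encode_data_row([b"100000000", b"OK"])
null_msg = encode_data_row([None, b"DPFKG"])
```

For the first row, 15 of the 26 message bytes are framing rather than data, matching the observation that per-row metadata can outweigh the values themselves.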
[Figure: two MySQL text-protocol packets for Table 2; fields: Data Length (3 bytes), Packet Nr., then per field a Length and the Data:]
0D0000 04 09 313030303030303030 02 4F4B
070000 05 FB 05 4450464B47
Figure 6: MySQL text result set wire format
MySQL. Figure 6 shows MySQL/MariaDB's protocol encoding of the sample result set. The protocol uses binary encoding for metadata, and text for actual field data. The number of fields in a row is constant and defined in the result set header. Each row starts with a three-byte data length. Then, a packet sequence number (0-256, wrapping around) is sent. This is followed by length-prefixed field data. Field lengths are encoded as variable-length integers. NULL values are encoded with a special field length, 0xFB. Field data is transferred in ASCII format. The sequence number is redundant here, as the underlying TCP/Unix sockets already guarantee that packets arrive in the same order in which they were sent.
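A sketch of that row format (helper names are ours; only the common cases of the length-encoded integer are shown):

```python
def lenenc_int(n):
    """MySQL-style length-encoded integer (common cases only)."""
    if n < 251:
        return bytes([n])                       # one byte for small values
    if n < 1 << 16:
        return b"\xfc" + n.to_bytes(2, "little")
    if n < 1 << 24:
        return b"\xfd" + n.to_bytes(3, "little")
    return b"\xfe" + n.to_bytes(8, "little")

def encode_text_row(seq, fields):
    """One text-protocol row: fields are ASCII strings, or None for NULL (0xFB)."""
    payload = b""
    for value in fields:
        if value is None:
            payload += b"\xfb"
        else:
            data = value.encode("ascii")
            payload += lenenc_int(len(data)) + data
    # packet header: 3-byte little-endian payload length + sequence number
    return len(payload).to_bytes(3, "little") + bytes([seq & 0xFF]) + payload

row = encode_text_row(2, ["100000000", "OK"])
```

Note that the nine-digit integer costs ten bytes as text; the binary protocols above and below fit it in four.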
[Figure: DBMS X rows for Table 2 — a packet header (C502), then per value a variable-length length prefix and the data (02 4F4B for 'OK', 05 4450464B47 for 'DPFKG', 00 for NULL); the numeric value uses a custom encoding]
Figure 7: DBMS X result set wire format
DBMS X has a very terse protocol. However, it is much more computationally heavy than the protocol used by PostgreSQL. Each row is prefixed by a packet header, followed by the values. Every value is prefixed by its length in bytes. This length, however, is transferred as a variable-length integer. As a result, the length field is only a single byte for small lengths. For NULL values, the length field is 0 and no actual value is transferred. Numeric values are also encoded using a custom format. On a lower layer, DBMS X uses a fixed network message length for batch transfers. This message length is configurable and, according to the documentation, considerably influences performance. We have set it to the largest allowed value, which gave the best performance in our experiments.
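Since DBMS X is anonymized, its exact integer format is not public; the sketch below uses a generic LEB128-style varint to illustrate why small lengths cost only one byte:

```python
def encode_varint(n):
    """LEB128-style variable-length unsigned integer: 7 data bits per byte,
    high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data):
    """Return (value, number of bytes consumed)."""
    result = shift = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, i + 1
        shift += 7
```

Lengths below 128 fit in one byte, so for short fields the framing overhead is a single byte per value; the trade-off is that every length must be decoded bit by bit rather than read as a fixed-width integer.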
7
State of the Art Protocols
▸ Significant per-row overhead
▸ Used by many other systems:
▸ Redshift, HyPer, Greenplum and Vertica
PostgreSQL
[Figure: PostgreSQL result set wire format — message type, total length, field count, then per field a length and the data]
8
State of the Art Protocols
▸ ASCII protocol
▸ Every value has a length field
▸ Supports compression with GZIP
[Figure: MySQL text result set wire format — 3-byte data length, packet nr., then length-prefixed field data (0xFB for NULL)]
MySQL
9
State of the Art Protocols
▸ Columnar protocol
▸ Uses generic Thrift serialisation library
▸ One byte per value in NULL mask
▸ Also used by SparkSQL
Hive
[Figure: Hive/Thrift columnar result set for Table 2 — a start row offset and column count, then per column: Thrift field/list headers (1A, 28, …), a one-byte-per-value NULL mask (e.g. 0001), and the column data (5FAF8480… for the INT32 column; 054450464B47024F4B for the length-prefixed strings 'DPFKG' and 'OK'); each struct ends with a field-stop byte (00)]
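A sketch of the byte-per-value NULL mask the slide refers to (the zero-padding of NULL slots is our assumption for illustration, not taken from the Hive source):

```python
def encode_column(values, width=4):
    """Columnar batch sketch in the spirit of Hive's Thrift encoding:
    a one-byte-per-value NULL mask followed by the fixed-width values.
    In this sketch, NULLs still occupy a full slot, padded with zeroes."""
    mask = bytes(1 if v is None else 0 for v in values)
    data = b"".join(
        (0 if v is None else v).to_bytes(width, "little", signed=True)
        for v in values
    )
    return mask + data

col = encode_column([100_000_000, None])
```

Spending a whole byte per value on the mask is simple but wasteful next to the one-bit-per-value bitmask used in the proposal later in the deck.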
10
Protocol Design Space
▸ Implemented prototype in PostgreSQL and MonetDB
▸ Serialisation Format (ASCII, Custom Binary, Generic)
▸ Custom Binary
▸ Row Major or Column Major
▸ Column-Major (but chunked)
▸ Data Compression Methods
▸ No compression local, stream compression on remote
▸ Null Handling
▸ Close to native formats
11
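The row-major versus chunked column-major choice can be illustrated with a small sketch (tiny chunks here for readability; the actual prototype uses chunks on the order of 1 MB):

```python
def serialize_row_major(rows, width=4):
    """Row-major: values interleaved row by row."""
    return b"".join(v.to_bytes(width, "little") for row in rows for v in row)

def serialize_chunked_column_major(rows, chunk_rows, width=4):
    """Column-major, but in chunks of `chunk_rows` rows, so the client can
    start consuming results before the full set has been materialized."""
    out = b""
    for start in range(0, len(rows), chunk_rows):
        chunk = rows[start:start + chunk_rows]
        out += len(chunk).to_bytes(4, "little")   # row-count prefix per chunk
        for col in zip(*chunk):                   # emit one column at a time
            out += b"".join(v.to_bytes(width, "little") for v in col)
    return out
```

Chunking keeps the memory footprint bounded on both ends while preserving the cache-friendly columnar layout within each chunk.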
Proposed Protocol in MonetDB
[Figure: proposed MonetDB wire format for Table 2 — row count (02000000), INT32 column data (00E1F505 …), then the variable-length column prefixed with its byte length (09000000) followed by NUL-terminated strings (4450464B47004F4B00 = 'DPFKG', 'OK')]
▸ Columnar, 1MB chunks prefixed with row count
▸ Missing values stored as special values in domain
▸ Variable-length columns prefixed with their length
12
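A minimal sketch of this chunk layout (the INT32 NULL sentinel is an assumption for illustration; string NULLs are omitted from the sketch):

```python
INT32_NULL = -2**31  # assumed in-domain sentinel for a missing INT32

def encode_chunk(int_col, str_col):
    """MonetDB-style chunk sketch: row count, then each column in full.
    Missing ints use an in-domain sentinel; strings are NUL-terminated,
    and the variable-length column is prefixed with its byte length."""
    out = len(int_col).to_bytes(4, "little")                 # row count
    out += b"".join(
        (INT32_NULL if v is None else v).to_bytes(4, "little", signed=True)
        for v in int_col
    )
    strdata = b"".join(s.encode() + b"\x00" for s in str_col)
    out += len(strdata).to_bytes(4, "little") + strdata      # length prefix
    return out

chunk = encode_chunk([100_000_000, None], ["OK", "DPFKG"])
```

The length prefix on the variable-length column lets the client allocate and bulk-copy the whole column without scanning for terminators first.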
Proposed Protocol in PostgreSQL
[Figure: proposed PostgreSQL wire format for Table 2 — row count (02000000); a column with missing values gets a column length (09000000) and a one-bit-per-value NULL mask (02), then the column data (00E1F505); the string column data (4450464B47004F4B00) follows]
▸ Missing values (PostgreSQL)
▸ NULL bitmask for each column, 1 bit per value
▸ Only add mask if column has missing values
▸ Columns with mask have a column-length as well
13
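A sketch of the per-column NULL bitmask described above (the bit order within a mask byte and the zero-padding of NULL slots are our assumptions):

```python
def encode_int32_column(values):
    """Proposed-protocol sketch for PostgreSQL: a column is sent as-is when
    it has no NULLs; otherwise a column length and a 1-bit-per-value NULL
    bitmask precede the data, and NULL slots are zero-padded."""
    has_nulls = any(v is None for v in values)
    data = b"".join(
        (0 if v is None else v).to_bytes(4, "little", signed=True)
        for v in values
    )
    if not has_nulls:
        return data                               # no mask, no length prefix
    mask = bytearray((len(values) + 7) // 8)      # 1 bit per value
    for i, v in enumerate(values):
        if v is None:
            mask[i // 8] |= 1 << (i % 8)
    body = bytes(mask) + data
    return len(body).to_bytes(4, "little") + body
```

Making the mask optional means fully populated columns pay no NULL-handling overhead at all, and a column with NULLs pays one bit per value instead of Hive's full byte.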
increased. It is possible that they send explicit confirmationmessages from the client to the server to indicate that theclient is ready to receive the next batch of data. Thesemessages are cheap with a low latency, but become verycostly when the latency increases.
Contrary to our prediction, we find that the performanceof all systems is heavily influenced by a high latency. This isbecause, while the server and client do not explicitly sendconfirmation messages to each other, the underlying TCP/IPlayer does send acknowledgement messages when data isreceived [27]. TCP packets are sent once the underlyingbu↵er fills up, resulting in an acknowledgement message.As a result, protocols that send more data trigger moreacknowledgements and su↵er more from a higher latency.
Throughput. Reducing the throughput of a connectionadds a variable cost to sending messages depending on thesize of the message. Restricted throughput means sendingmore bytes over the socket becomes more expensive. Themore we restrict the throughput, the more protocols thatsend a lot of data are penalized.
In Figure 4, we can see the influence that lower throughputshave on the di↵erent protocols. When the bandwidth isreduced, protocols that send a lot of data start performingworse than protocols that send a lower amount of data.While the PostgreSQL protocol performs well with a highthroughput, it starts performing significantly worse than theother protocols with a lower throughput. Meanwhile, we alsoobserve that when the throughput decreases compressionbecomes more e↵ective. When the throughput is low, theactual data transfer is the main bottleneck and the cost of(de)compressing the data becomes less significant.
2.3 Result Set Serialization

In order to better understand the differences in time and transferred bytes between the different protocols, we have investigated their data serialization formats.
Table 2: Simple result set table.

INT32        VARCHAR10
100,000,000  OK
NULL         DPFKG
For each of the protocols, we show a hexadecimal representation of Table 2 encoded with each result set format. The bytes used for the actual data are colored green, while any overhead is colored white. For clarity, leading zeroes are colored gray.
[Figure 5: PostgreSQL result set wire format. Per row: message type ('D'), total length, field count, then for each field its length and its data.]
PostgreSQL. Figure 5 shows the result set serialization of the widely used PostgreSQL protocol. In the PostgreSQL result set, every single row is transferred in a separate protocol message [35]. Each row includes a total length, the number of fields, and for each field its length (-1 if the value is NULL) followed by the data. We can see that for this result set, the per-row metadata takes more bytes than the actual data. Furthermore, a lot of information is repetitive and redundant. For example, the number of fields is expected to be constant for an entire result set. Also, the number of rows in the result set is known from the result set header that precedes these messages, which makes the message type marker redundant. This large amount of redundant information explains why PostgreSQL's client protocol requires so many bytes to transfer the result set in the experiment shown in Table 1. On the other hand, the simplicity of the protocol results in low serialization and deserialization costs. This is reflected in its quick transfer time when the network connection is not a bottleneck.
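The per-row layout described above can be sketched as follows. This is a hedged illustration rather than the protocol implementation itself: field values are assumed to be pre-rendered byte strings, and the function name is hypothetical.

```python
import struct

def pg_data_row(fields):
    """Sketch of a PostgreSQL DataRow ('D') message for one result row.

    fields: list of bytes values, or None for a NULL (sent as length -1
    with no data). Layout as described in the text: message type, total
    length, field count, then a length-prefixed value per field, all in
    network (big-endian) byte order.
    """
    body = struct.pack("!h", len(fields))          # int16 field count
    for f in fields:
        if f is None:
            body += struct.pack("!i", -1)          # NULL: length -1, no data
        else:
            body += struct.pack("!i", len(f)) + f  # int32 length + data
    # the total length counts itself (4 bytes) plus the body
    return b"D" + struct.pack("!i", 4 + len(body)) + body

row = pg_data_row([b"100000000", b"OK"])
```

For this two-field row, the per-row metadata (type byte, total length, field count, two field lengths) adds up to 15 bytes against only 11 bytes of actual data, which illustrates the overhead discussed above.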
[Figure 6: MySQL text result set wire format. Per row: 3-byte data length, 1-byte packet sequence number, then length-prefixed field data; NULL is the marker byte 0xFB.]
MySQL. Figure 6 shows MySQL/MariaDB's protocol encoding of the sample result set. The protocol uses binary encoding for metadata, and text for actual field data. The number of fields in a row is constant and defined in the result set header. Each row starts with a three-byte data length. Then, a packet sequence number (0-255, wrapping around) is sent. This is followed by length-prefixed field data. Field lengths are encoded as variable-length integers. NULL values are encoded with a special field length, 0xFB. Field data is transferred in ASCII format. The sequence number is redundant here, as the underlying TCP/Unix socket layer already guarantees that packets arrive in the same order in which they were sent.
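The MySQL row encoding can be sketched as follows. This is an illustrative sketch of the scheme as described above, with hypothetical function names; field values are assumed to be pre-rendered ASCII byte strings.

```python
import struct

def lenc_int(n):
    """MySQL-style length-encoded integer: a single byte for values
    below 0xfb, and longer prefixed forms (0xfc/0xfd/0xfe) above."""
    if n < 0xFB:
        return bytes([n])
    if n < 1 << 16:
        return b"\xfc" + struct.pack("<H", n)
    if n < 1 << 24:
        return b"\xfd" + struct.pack("<I", n)[:3]
    return b"\xfe" + struct.pack("<Q", n)

def mysql_text_row(fields, seq):
    """Sketch of a text-protocol row packet: 3-byte payload length,
    1-byte wrapping sequence number, then length-prefixed ASCII
    fields. NULL is the single marker byte 0xfb."""
    payload = b"".join(b"\xfb" if f is None else lenc_int(len(f)) + f
                       for f in fields)
    return struct.pack("<I", len(payload))[:3] + bytes([seq & 0xFF]) + payload

pkt = mysql_text_row([b"100000000", None], seq=2)
```

Note that the big integer 100,000,000 costs 9 bytes as text, versus 4 bytes in a binary INT32 encoding; this text representation is part of why the MySQL protocol transfers more bytes than necessary.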
[Figure 7: DBMS X result set wire format. Per row: packet header, then each value prefixed by its byte length as a variable-length integer.]
DBMS X has a very terse protocol. However, it is much more computationally heavy than the protocol used by PostgreSQL. Each row is prefixed by a packet header, followed by the values. Every value is prefixed by its length in bytes. This length, however, is transferred as a variable-length integer. As a result, the length field is only a single byte for small lengths. For NULL values, the length field is 0 and no actual value is transferred. Numeric values are also encoded using a custom format. On a lower layer, DBMS X uses a fixed network message length for batch transfers. This message length is configurable and, according to the documentation, considerably influences performance. We have set it to the largest allowed value, which gave the best performance in our experiments.
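Since DBMS X is anonymized, its exact variable-length integer encoding is not public; a common base-128 varint serves as an illustrative stand-in for the length-prefixing described above.

```python
def varint(n):
    """Generic base-128 varint (a stand-in, not the actual DBMS X
    encoding): 7 data bits per byte, high bit set on every byte
    except the last. Small lengths fit in a single byte."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(b)
            return bytes(out)

def length_prefixed_value(data):
    """Length-prefixed value as described: NULL is length 0, no data."""
    if data is None:
        return varint(0)
    return varint(len(data)) + data
```

Decoding such varints requires a branch per byte, which hints at why this terse format is more computationally heavy than PostgreSQL's fixed-width length fields.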
Benchmarks
▸ Three different network configurations
▸ Localhost: No network restrictions
▸ LAN: 1000 Mb/s throughput, 0.3ms latency
▸ WAN: 100 Mb/s throughput, 25ms latency
▸ Lineitem: SF10, 60 million rows, 16 columns, 7.2GB in CSV format
▸ 1 hour timeout
14
Benchmarks
Localhost (No network restrictions)
[Scatter plot: result set size (GB) vs. wall clock time (s, log scale) for Netcat, Netcat+Sy, Netcat+GZ, MonetDB++, MonetDB++C, Postgres++, Postgres++C, MySQL, MySQL+C, PostgreSQL, DB2, DBMS X, Hive, and MonetDB.]
15
Benchmarks
LAN (1000 Mb/s throughput, 0.3ms latency)
[Scatter plot: result set size (GB) vs. wall clock time (s, log scale) for Netcat, Netcat+Sy, Netcat+GZ, MonetDB++, MonetDB++C, Postgres++, Postgres++C, MySQL, MySQL+C, PostgreSQL, DB2, DBMS X, Hive, and MonetDB.]
16
Benchmarks
WAN (100 Mb/s throughput, 25ms latency)
[Scatter plot: result set size (GB) vs. wall clock time (s, log scale) for Netcat, Netcat+Sy, Netcat+GZ, MonetDB++, MonetDB++C, Postgres++, Postgres++C, MySQL, MySQL+C, PostgreSQL, and MonetDB.]
17
Conclusions
▸ Exporting data from a database does not have to be so inefficient
▸ State of the art database protocols can be improved for this use case
▸ We show this by implementing prototypes in two databases (MonetDB and PostgreSQL)
▸ Avoid per-row overhead, bulk transfer
▸ Stay close to database native formats
▸ Avoid unnecessary copying and conversion
▸ Lightweight compression on remote
▸ The MonetDB implementation is already released.
▸ Benchmark information: https://goo.gl/usjfyJ
18