sf postgresql user group cstore presentation
DESCRIPTION
The slides used at the San Francisco PostgreSQL User Group meetup (http://www.meetup.com/postgresql-1/events/178687982/). Learn about how we at Citus Data implemented a columnar store for PostgreSQL using foreign data wrappers. Features discussions on architecture and benchmark results.TRANSCRIPT
![Page 1: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/1.jpg)
cstore_fdw – Columnar store for analytic
workloads
Hadi Moshayedi &Ozgun Erdogan
![Page 2: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/2.jpg)
What is CitusDB?
• CitusDB is a scalable analytics database that extends PostgreSQL– Citus shards your data and automatically parallelizes your
queries– Citus isn’t a fork of Postgres. Rather, it hooks onto the
planner and executor for distributed query execution.– Always rebased to newest Postgres version– Natively supports new data types and extensions
![Page 3: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/3.jpg)
A C
D G
worker node #1(extended PostgreSQL)
worker node #2(extended PostgreSQL)
A
worker node #3(extended PostgreSQL)
. . . .
1 shard =1 Postgres table
master node(extended PostgreSQL)
shard and shardplacement metadata
![Page 4: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/4.jpg)
Talk Overview
1. Why customers want columnar stores2. Live demo3. Optimized Row Columnar (ORC) format4. PostgreSQL benefits5. New benchmark numbers
![Page 5: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/5.jpg)
Id Sz Ln Ht … … … … … … … … … … …
1 4 3 4 … … … … … … … … … … …
2 4 11 3 … … … … … … … … … … …
3 1 4 2 … … … … … … … … … … …
4 8 4 12 … … … … … … … … … … …
…
4…
… … … … … … … … … … … … … …
4…
… … … … … … … … … … … … … …
4…
… … … … … … … … … … … … … …
30Mrows
700 columns
![Page 6: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/6.jpg)
Example SQL querySELECT
id, AVG(price), MAX(price)FROM
itemsWHERE
quantity > 100 AND last_stock_date < ‘2013-
10-01’GROUP BY
weight;
![Page 7: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/7.jpg)
Id … price … … quant … … last_stm … … … … … weight
1 … 3.90 … … 31 … … 2013-… … … … … … 0.6
2 … 13 … … 70 … … 2010-… … … … … … 0.8
3 … 4.25 … … 432 … … 2013-… … … … … … 1
4 … 4 … … 45 … … 2013-… … … … … … 6
…
4… … 95 … … 37 … … 2013-… … … … … … 0.6
4… … 59 … … 90 … … 2012-… … … … … … 1.5
Row-oriented store
![Page 8: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/8.jpg)
Id … price … … quant … … last_stm … … … … … weight
1 … 3.90 … … 31 … … 2013-… … … … … … 0.6
2 … 13 … … 70 … … 2010-… … … … … … 0.8
3 … 4.25 … … 432 … … 2013-… … … … … … 1
4 … 4 … … 45 … … 2013-… … … … … … 6
…
4… … 95 … … 37 … … 2013-… … … … … … 0.6
4… … 59 … … 90 … … 2012-… … … … … … 1.5
Row-oriented store
![Page 9: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/9.jpg)
Id … price … … quant … … last_stm … … … … … weight
1 … 3.90 … … 31 … … 2013-… … … … … … 0.6
2 … 13 … … 70 … … 2010-… … … … … … 0.8
3 … 4.25 … … 432 … … 2013-… … … … … … 1
4 … 4 … … 45 … … 2013-… … … … … … 6
…
4… … 95 … … 37 … … 2013-… … … … … … 0.6
4… … 59 … … 90 … … 2012-… … … … … … 1.5
Row-oriented store
![Page 10: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/10.jpg)
Id … price … … quant … … last_stm … … … … … weight
1 … 3.90 … … 31 … … 2013-… … … … … … 0.6
2 … 13 … … 70 … … 2010-… … … … … … 0.8
3 … 4.25 … … 432 … … 2013-… … … … … … 1
4 … 4 … … 45 … … 2013-… … … … … … 6
…
4… … 95 … … 37 … … 2013-… … … … … … 0.6
4… … 59 … … 90 … … 2012-… … … … … … 1.5
Row-oriented store
![Page 11: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/11.jpg)
Cost of row storage
• Read 700 columns instead of 5• >39 GB of unnecessary I/O
Input Type
Estimated Input Rate
Cost to query performance
Memory 10 GB/s 3.9 seconds
SSD 600 MB/s >60 seconds
![Page 12: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/12.jpg)
Example SQL querySELECT
id, AVG(price), MAX(price)FROM
itemsWHERE
quantity > 100 AND last_stock_date < ‘2013-
10-01’GROUP BY
weight;
![Page 13: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/13.jpg)
Id sz price … … quant … … last_stm … … … … … weight
1 4 3.90 … … 31 … … 2013-… … … … … … 0.6
2 3 13 … … 70 … … 2010-… … … … … … 0.8
3 2 4.25 … … 432 … … 2013-… … … … … … 1
4 4 4 … … 45 … … 2013-… … … … … … 6
…
4… 19 95 … … 37 … … 2013-… … … … … … 0.6
4… 2 59 … … 90 … … 2012-… … … … … … 1.5
Column-oriented store
![Page 14: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/14.jpg)
Column-oriented storeId sz price … … quant … … last_stm … … … … … weight
1 4 3.90 … … 31 … … 2013-… … … … … … 0.6
2 3 13 … … 70 … … 2010-… … … … … … 0.8
3 2 4.25 … … 432 … … 2013-… … … … … … 1
4 4 4 … … 45 … … 2013-… … … … … … 6
…
4… 19 95 … … 37 … … 2013-… … … … … … 0.6
4… 2 59 … … 90 … … 2012-… … … … … … 1.5
![Page 15: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/15.jpg)
Column-oriented storeId sz price … … quant … … last_stm … … … … … weight
1 4 3.90 … … 31 … … 2013-… … … … … … 0.6
2 3 13 … … 70 … … 2010-… … … … … … 0.8
3 2 4.25 … … 432 … … 2013-… … … … … … 1
4 4 4 … … 45 … … 2013-… … … … … … 6
…
4… 19 95 … … 37 … … 2013-… … … … … … 0.6
4… 2 59 … … 90 … … 2012-… … … … … … 1.5
![Page 16: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/16.jpg)
Columnar Store Motivation
• Read subset of columns to reduce I/O• Better compression
– Less disk usage– Less disk I/O
![Page 17: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/17.jpg)
State of the Columnar Store
1. Fork a popular database, swap in your storage engine, and never look back
2. Develop an open columnar store format for the Hadoop Distributed Filesystem (HDFS)
3. Use PostgreSQL extension machinery for in-memory stores / external databases
![Page 18: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/18.jpg)
Columnar Store Specs
• Record Columnar File (RCFile)– Facebook, OSU, and Chinese Academy of Sciences– First horizontally-partition, then vertically-partition
• ORC (Optimized RCFile)– Second generation. Developed by Hortonworks and Facebook– Lightweight indexes stored within the file– Different compression methods within the same file
![Page 19: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/19.jpg)
ORC File Layout benefits
1. Columnar layout – reads columns only related to the query
2. Compression – groups column values (10K) together and compresses them
3. Skip indexes – applies predicate filtering to skip over unrelated values
![Page 20: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/20.jpg)
Block 1Block 2Block 3Block 4Block 5Block 6Block 7
150K rows(configurable)
150K rows(configurable) 10K column values
(configurable) per block
![Page 21: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/21.jpg)
Compression
• Current compression method is PG_LZ from PostgreSQL core
• Easy to add new compression methods depending on the CPU / disk trade-off
• cstore_fdw enables using different compression methods at the column block level
![Page 22: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/22.jpg)
Table sizes normalized to 1.0
![Page 23: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/23.jpg)
Skip Indexes
• For each column block (10K), cstore_fdw also records min/max values in a skip index.
• When the user runs a query, we extract all filter clauses from the query.
• For example, the query specifies quantity > 100 And last_stock_date < ‘2013-10-01’.
![Page 24: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/24.jpg)
Skip Indexes
• We then use Postgres’ constraint exclusion mechanism to decide whether to skip over 10K rows.
• For each filter clause, we create and apply a constraint. The awesome thing about using PostgreSQL is that we don’t need to write any code.
• If input data has an inherent time dimension, that helps. Sorting input data also helps with skip indexes.
![Page 25: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/25.jpg)
Drawbacks to ORC
• Support for only eight data types. Each data type further needs to have a separate code path for min/max value collection and constraint exclusion.
• Gathering statistics from the data and table JOINs are an afterthought.
![Page 26: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/26.jpg)
1. Simply use PostgreSQL data types’ datum representation.
2. Avoid deserialization overhead.
3. Support user-defined types as well.
![Page 27: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/27.jpg)
Statistics Collection
• FDWs provide an API to collect random samples from data. Users need to manually run Analyze.
• Postgres then constructs histograms, most common value frequencies, and other stats.
• cstore_fdw estimates query costs for different access paths based on these statistics. *
• Informed resource usage. Better join order and join method selection.
![Page 28: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/28.jpg)
Recent Benchmark Results
• TPC-H is a standard benchmark• Performed in-memory, SSD, and HDD
tests on 10 GB of data• Used m2.2xlarge and m3.2xlarge on
EC2• Compared vanilla PostgreSQL, CStore,
CStore with compression
![Page 29: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/29.jpg)
10GB of uncached data on m2.2xlarge
![Page 30: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/30.jpg)
10GB of uncached data on m3.2xlarge
![Page 31: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/31.jpg)
Total issued disk I/O measures with iotop
![Page 32: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/32.jpg)
10GB of cached data on m2/m3.2xlarge
![Page 33: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/33.jpg)
Future Work
• CStore is an open source project actively in development: github.com/citusdata/cstore_fdw– Improve memory usage– Automatically determining paths for data files– Native Delete / Insert / Update support– Improve read query performance (vectorized execution)– Different compression codecs– Many more; contribute to the discussion on GitHub!
![Page 34: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/34.jpg)
Summary• CStore: Open source columnar store fdw for
Postgres• Data layout is based on ORC
1 Columnar data layout per stripe2 Supports different compression codecs3 Skip indexes enable predicate filtering
• Uses foreign wrapper APIs1 Supports all PostgreSQL data types2 Statistics collection for better query plans3 Load extension. Create Table. Copy
![Page 35: SF PostgreSQL User Group cstore presentation](https://reader035.vdocuments.us/reader035/viewer/2022062616/5499348bb47959f4708b46c6/html5/thumbnails/35.jpg)
cstore_fdw – Columnar Store for Analytic
Workloads
Hadi Moshayedi – [email protected] Erdogan – [email protected]