
Prof. Stefan Keller, IFS / Geometa Lab HSR

(Slides © CC-BY)

PostgreSQL as GPU Database

for Real-Time Analytics

Talk, Swiss PUG, Zürich, 9 November 2017

About Scalability

Scale-up

Vertical

Add more hardware components (homogeneous or heterogeneous)

Expensive(?)

No open source, platform lock-in(?)

Scale-out

Horizontal

Cheap commodity hardware as "nodes"

Flexibly add more nodes

Open source

Need to relax constraints, even ACID (BASE)?

2 Stefan Keller, "PostgreSQL as GPU Database..."

GPU Databases


GPU Databases

More and more GPUs and memory bandwidth…

Use Cases:

Analytical - not transactional

OLTP + OLAP = hybrid transactional/analytical processing (HTAP) => no need to move data to a warehouse

Setting:

Single-node => much simpler to maintain

Discrete GPU (rather than FPGAs or specialty chips)

CPU vs. GPU:

CPU is suited for low latency, complex data and operations

GPU is suited for throughput of homogeneous operations


GPU Database Reference Architecture

Master of Science in Engineering (MSE)


Paper

Paper by Breß, Heimel, Siegmund, Bellatreche and Saake (Universities of Magdeburg, Berlin, Passau, Futuroscope/France):

GPU-accelerated database systems: Survey and open challenges. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XV, Springer Berlin Heidelberg, 2014, pages 1-35. Weblink: http://bit.ly/1rMOuZC (pdf)

Contents:

Design Choices

Evaluation of 8 GDBMS

Reference architecture

Insights for all co-processors


Overview

Exemplary architecture of a system with a graphics card:


Architecture of GPU-aware DBMSs

Design choices/space of GPU-aware DBMSs


PG-Strom / PostgreSQL


PostgreSQL - www.postgresql.org

“The world's most advanced open source database”.

Open source under the PostgreSQL License (similar to BSD/MIT)

PostgreSQL 10 Released October 2017 (since 2002)

Fully ACID compliant object-relational database system

Reputation for reliability, data integrity, and correctness

Broad community

Runs on all major operating systems

Broad support of SQL and data types

Scalable in quantity of data and concurrent users

Extensible: modules (EXTENSION, Network), Foreign Data Wrappers (SQL/MED), language APIs


PG-Strom

PG-Strom - http://strom.kaigai.gr.jp/ - Version 1.0

“Limit breaker of PostgreSQL”

Extension module to accelerate SQL workloads using thousands of cores and high-bandwidth memory. Open source, GPLv2.

Requirements: PostgreSQL 9.5, CUDA

Main use cases

In-database analytics: real-time statistics

Rapid batch processing: ETL/ELT

Main SW architecture design decisions:

Heterogeneous scale-up

On-the-fly native GPU code generation

Asynchronous pipeline execution mode


PG-Strom: SW architecture


PG-Strom: Overview


Source: http://strom.kaigai.gr.jp/

PG-Strom: Overview (continued)


PG-Strom: Features - Data types

Data Types:

Numeric: …; Date/Time: …; Others: bool, money

Text: …

Limits on text and varchar(x)

=> "GPU cannot process compressed or TOAST'ed data"

=> "ALTER TABLE ... SET STORAGE PLAIN" or MAIN

Not supported:

geometry, geography (PostGIS)

See Reference:Data Types for details

Internals:

Custom Scan Provider, see www.postgresql.org/docs/current/static/custom-scan.html


PG-Strom: Features – SQL workloads

Full Table Scan

With scan qualifiers, the GPU evaluates the scan qualifiers and filters out invisible rows…
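The scan step described above can be pictured as a two-phase, data-parallel filter: evaluate the qualifier for every row independently, then compact the survivors. The following is a minimal Python sketch under stated assumptions, not PG-Strom's implementation: the columnar `batch` dict, the function name `gpu_style_scan` and the positional qualifier are hypothetical, and an ordinary loop stands in for the per-row GPU threads.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical columnar batch: column name -> column values (equal lengths).
Batch = Dict[str, Sequence]


def gpu_style_scan(batch: Batch, qual: Callable[..., bool]) -> List[int]:
    """Return the indices of the rows in `batch` that satisfy `qual`.

    `qual` receives one argument per column, in the batch's column order.
    """
    names = list(batch)
    n_rows = len(batch[names[0]]) if names else 0
    # Phase 1 (data-parallel): evaluate the qualifier for every row on its own.
    # On a GPU each row would get its own thread; this loop stands in for that.
    mask = [bool(qual(*(batch[c][i] for c in names))) for i in range(n_rows)]
    # Phase 2 (stream compaction): keep only the indices of matching rows, so
    # the CPU receives the filtered rows rather than the whole batch.
    return [i for i, keep in enumerate(mask) if keep]


if __name__ == "__main__":
    batch = {"id": [1, 2, 3, 4], "price": [5.0, 12.5, 7.0, 20.0]}
    # Roughly the equivalent of WHERE price > 10
    print(gpu_style_scan(batch, lambda id_, price: price > 10))  # prints [1, 3]
```

Because no row's result depends on any other row, phase 1 needs no coordination between threads; only the compaction step has to gather results.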

Tables Join

Parallel versions of the hash-join algorithm and of the simple (non-parameterized) nested-loop algorithm are supported…
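A plain build/probe hash join shows why this algorithm parallelizes well: once the hash table is built, every probe is independent of the others. This is a sequential Python sketch of the textbook algorithm, not PG-Strom's GPU kernel; the function `hash_join` and the tuple row layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def hash_join(inner: Sequence[Tuple], outer: Sequence[Tuple],
              inner_key: int, outer_key: int) -> List[Tuple]:
    """Equi-join two lists of tuples on inner[inner_key] == outer[outer_key]."""
    # Build phase: hash the (typically smaller) inner relation on its join key.
    table: Dict[Hashable, List[Tuple]] = defaultdict(list)
    for row in inner:
        table[row[inner_key]].append(row)
    # Probe phase: each outer row is looked up independently of all the others,
    # which is why this phase maps well onto many parallel GPU threads.
    result: List[Tuple] = []
    for o in outer:
        # .get() avoids inserting empty buckets for keys with no match.
        for i in table.get(o[outer_key], ()):
            result.append(o + i)
    return result


if __name__ == "__main__":
    customers = [(1, "Alice"), (2, "Bob")]
    orders = [(100, 2), (101, 3), (102, 1), (103, 2)]  # (order_id, customer_id)
    for row in hash_join(customers, orders, inner_key=0, outer_key=1):
        print(row)
```

A non-parameterized nested loop would instead compare every outer row with every inner row; it is just as easy to parallelize but does quadratic work, which is why the hash join is usually preferred for equi-joins.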

Group By/Aggregation

The GPU pre-processes aggregate operations to reduce the number of rows the CPU has to process…
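The effect of pre-aggregation can be sketched as a two-phase aggregate: each chunk of rows (standing in for work done on the GPU) is reduced to one partial (sum, count) pair per group, and only these partials are merged on the CPU. The chunking, the function names and the AVG example are assumptions for illustration, not PG-Strom internals.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Row = Tuple[str, float]                 # hypothetical (group_key, value) row
Partial = Dict[str, Tuple[float, int]]  # group_key -> (sum, count)


def partial_aggregate(chunk: Sequence[Row]) -> Partial:
    """Pre-aggregation step (the part offloaded to the GPU in this analogy)."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for key, value in chunk:
        sums[key] += value
        counts[key] += 1
    return {k: (sums[k], counts[k]) for k in sums}


def final_aggregate(partials: Sequence[Partial]) -> Dict[str, float]:
    """CPU step: merge the partial (sum, count) pairs into AVG(value) per group."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for part in partials:
        for k, (s, c) in part.items():
            sums[k] += s
            counts[k] += c
    return {k: sums[k] / counts[k] for k in sums}


def two_phase_avg(rows: Sequence[Row], chunk_size: int) -> Tuple[Dict[str, float], int]:
    """Return AVG per group and the number of partial rows the CPU had to merge."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    partials = [partial_aggregate(c) for c in chunks]
    return final_aggregate(partials), sum(len(p) for p in partials)


if __name__ == "__main__":
    rows = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 4.0), ("a", 5.0), ("a", 7.0)]
    averages, cpu_rows = two_phase_avg(rows, chunk_size=3)
    print(averages, f"({cpu_rows} partial rows instead of {len(rows)} input rows)")
```

AVG is carried as (sum, count) rather than as a partial average because partial averages cannot be merged correctly without their counts. The chunk size is arbitrary here; the point is that the CPU merges one partial row per group and chunk instead of every input row.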

Projection

When an SQL query contains complicated mathematical formulas, the GPU calculates these expressions on the device, and the CPU just references the calculated results.


PG-Strom: Limits

Latency

0.2-0.3 sec to initialize GPU device

Max. concurrent sessions

up to 3-5

Database size:

10 GB = data held in PostgreSQL's shared buffers or in the operating system's disk cache

Tip: use pg_prewarm

See http://strom.kaigai.gr.jp/install.html


PG-Strom: Performance

Estimations:

RDBMS + GPU => factor 3

Columnar In-Memory => factor 10

Pure GPU => factor 100
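The start-up latency on the Limits slide (0.2-0.3 sec) and the factor-3 estimate for RDBMS + GPU can be combined into a rough break-even point. This assumes a simplified model that is not from the talk: offloading costs a fixed initialization time plus the CPU-only runtime divided by the speedup, t_gpu = t_init + t_cpu / s, so offloading only pays off when t_cpu > t_init * s / (s - 1). The function name `break_even_cpu_time` is hypothetical.

```python
def break_even_cpu_time(init_s: float, speedup: float) -> float:
    """CPU-only runtime (seconds) above which GPU offloading pays off.

    Hypothetical model: t_gpu = init_s + t_cpu / speedup, so the GPU wins
    when t_cpu > init_s * speedup / (speedup - 1).
    """
    if init_s < 0:
        raise ValueError("init_s must be non-negative")
    if speedup <= 1:
        # Without a real speedup the fixed start-up cost can never be recovered.
        raise ValueError("speedup must be greater than 1")
    return init_s * speedup / (speedup - 1)


if __name__ == "__main__":
    for init in (0.2, 0.3):
        t = break_even_cpu_time(init, 3)
        print(f"init={init}s, factor 3 -> break-even at {t:.2f}s")
```

With t_init between 0.2 and 0.3 s and s = 3, this gives roughly 0.3 to 0.45 s: under this model, queries that already finish faster than that on the CPU would not gain from the GPU.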

Benchmarks

See next slides

See seminar on 22 January 2018, 14:00-16:00, HSR Rapperswil


PG-Strom: Further development

Version 1.x

More concurrent sessions

Data size: SSD collaboration feature in v2.0

PostGIS?

Where does it stand compared to the reference architecture?


GPU Databases: public presentations in the HSR Database Systems seminar


Seminar

SW:

PG-Strom 1.0 / PostgreSQL 9.5

MapD Open Source Edition

PostgreSQL 10, Tuned

HW:

Commodity server ("pizza box")

IBM Power8 server ("pizza box")

Data, Benchmarks, Docker-Files

See https://wiki.hsr.ch/Datenbanken/wiki.cgi?SeminarDatenbanksystemeHS1718


Seminar

Benchmarks:

Cold start: PG-Strom, MapD, PostgreSQL (= 3x)

Warm start: PG-Strom, MapD, PostgreSQL (= 3x)

Presentations:

4 students

Presentations in German, report in English

Final (public) presentations:

22 January 2018, 14:00-16:00

HSR Rapperswil, Room 8.125

Registration: http://techup.ch/tag/htap


Discussion

Credits

Kohei KaiGai

Stefan Keller

Geometa Lab at Institute for Software

HSR Hochschule für Technik Rapperswil

www.hsr.ch/geometalab

@sfkeller
