monetdb: a column-oriented dbms ryan johnson csc2531

12
MonetDB: A column-oriented DBMS Ryan Johnson CSC2531

Upload: lesley-watts

Post on 16-Dec-2015

221 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: MonetDB: A column-oriented DBMS Ryan Johnson CSC2531

MonetDB: A column-oriented DBMSRyan JohnsonCSC2531

Page 2: MonetDB: A column-oriented DBMS Ryan Johnson CSC2531

The memory wall has arrived• CPU performance

+70%/year• Memory performance

latency: -50%/decadebandwidth: +20%/year (est.)

• Why?– DRAM focus on capacity (+70%/year)– Physical limitations (pin counts, etc.)– Assumption that caches "solve” latency problem

DBMS spends 95% of time waiting for memory

Page 3: MonetDB: A column-oriented DBMS Ryan Johnson CSC2531

The problem: data layouts• Logical layout: 2-D relation

=> Unrealizable in linear address space!• N-ary storage layout, aka “slotted pages”

– Easy row updates, strided access to columns=> Low cache locality for read-intensive workloads

“NSM layouts considered harmful”

. . .

Page 4: MonetDB: A column-oriented DBMS Ryan Johnson CSC2531

Coping with The Wall• Innovation: decompose all data vertically

– Columns stored separately, rejoined at runtime• Binary Association Table (BAT) replaces Relation– List of (recordID, columnValue) pairs– Compression and other tricks ==> 1 byte/entry

BAT + clever algos => cache locality => Winner!

Page 5: MonetDB: A column-oriented DBMS Ryan Johnson CSC2531

Exploring deeper• Performance study (motivation)• Physical data layouts• Cache-optimized algorithms• Evaluating MonetDB performance• Implications and lingering questions

Page 6: MonetDB: A column-oriented DBMS Ryan Johnson CSC2531

NSM: access latency over time

Read one column(record size varies with x)

Latency increases ~10x as accesses/cache line 1 (slope changes at L1/L2 line size)

Page 7: MonetDB: A column-oriented DBMS Ryan Johnson CSC2531

Efficient physical BAT layout• Idea #1: “virtual OID”– Optimizes common case– Dense, monotonic OIDs– All BATs sorted by OID

• Idea #2: compression– Exploits small domains– Boosts cache locality,

effective mem BW

How to handle gaps?

Out-of-band values?

Can’t we compress NSM also?

Joining two BAT on OID has O(n) cost!

Page 8: MonetDB: A column-oriented DBMS Ryan Johnson CSC2531

Cache-friendly hash join• Hash partitioning: one

pass but trashes L1/L2 – #clusters > #cache lines

• Radix-partitioning: limit active #partitions by making more passes

Recall:CPU is cheap compared to

memory access

Page 9: MonetDB: A column-oriented DBMS Ryan Johnson CSC2531

Great, but how well does it work?• Three metrics of interest– L1/L2 misses (= suffer latency of memory access)– TLB misses (even more expensive than cache miss)– Query throughput (higher is better)

• Should be able to explain throughput using other metrics– Given model makes very good predictions=> Memory really is (and remains!) the bottleneck

Page 10: MonetDB: A column-oriented DBMS Ryan Johnson CSC2531

A few graphs

Big win: stability as cardinalities vary

Radix clustering behavior as cardinality varies

Radix-clustered HJ vs. other algorithms

Page 11: MonetDB: A column-oriented DBMS Ryan Johnson CSC2531

Implications and discussion points• Cache-friendly really matters (even w/ I/O)– Traditional DBMS memory-bound

• Vertically decomposed data: superior density – Data brought to cache only if actually needed– Compression gives further density boost

• Questions to consider...– Queries accessing many columns?– What about inserts/updates (touch many BAT)? – What about deletes/inserts (bad for compression)?

Page 12: MonetDB: A column-oriented DBMS Ryan Johnson CSC2531

Implications and discussion points• Cache-friendly really matters (even w/ I/O)– Traditional DBMS memory-bound

• Vertically decomposed data: superior density – Data brought to cache only if actually needed– Compression gives further density boost

• Questions to consider...– Queries accessing many columns?– How to make a good query optimizer?– Performance of transactional workloads?• Update-intensive, concurrency control, ...

– What about inserts (bad for compression)?