6.814/6.830 lecture 8

6.814/6.830 Lecture 8

Column StoresMemory Management

Recap : Column Representation Reduces Scan Time

• Idea: Store each column as a collection of sorted, compressed files– Only scan the columns that are needed

• Simple per-column compression (RLE/Delta/Dictionary/LZ)– Direct operation on compressed data

• Buffer writes in write-optimized store, periodically compress, sort, and write out as new column fragments

3

Write Performance

Tuple MoverAsynchronous Data

Movement

Queries read from both WOS and ROS

Batched

Amortizes seeks

Amortizes recompression

Enables continuous load

Trickle load: Very Fast Inserts

When to Rewrite ROS Objects?

• Store multiple ROS objects, instead of just one• Each of which must be scanned to answer a

query

• Tuple mover writes new objects• Avoids rewriting whole ROS on merge

• Periodically merge ROS objects to limit number of distinct objects that must be scanned (like Big Table)

Tuple Mover

WOS ROS

Older objects

5

When Are Columns Right?

• Warehousing (OLAP)• Read-mostly; batch update• Queries: Scan and aggregate a few

columns

• Vs. Transaction Processing (OLTP)• Write-intensive, mostly single record

ops.

• Column-stores: OLAP optimized

• In practice >10x performance on comparable HW, for many real world analytic applications• True even if w/ Flash or main memory!

Different architectures for different workloads

C-Store Performance

• How much do these optimizations matter?

• Wanted to compare against best you could do with a commercial system

7

Emulating a Column Store

• Two approaches:

1. Vertical partitioning: for n column table, store n two-column tables, with ith table containing a tuple-id, and attribute i

• Sort on tuple-id

• Merge joins for query results

2. Index-only plans

• Create a secondary index on each column

• Never follow pointers to base table

Two Emulation Approaches

9

Bottom Line

C-Store, Compression

C-Store, No Compression

C-Store, Early Materialize

Rows

Rows, Vert. Part.

Rows, All Indexes

4

15

41

26

80

221

Time (s)

SSBM (Star Schema Benchmark -- O’Neil et al ICDE 08) Data warehousing benchmark based on TPC-H Scale 100 (60 M row table), 17 columns Average across 12 queries Row store is a commercial DB, tuned by professional DBA vs

C-Store

Commercial System Does Not Benefit From Vertical Partitioning

10

Problems with Vertical Partitioning

① Tuple headers Total table is 4GB Each column table is ~1.0 GB Factor of 4 overhead from tuple headers and tuple-ids

② Merge joins Answering queries requires joins Row-store doesn’t know that column-tables are

sorted Sort hurts performance

Would need to fix these, plus add direct operation on compressed data, to approach C-Store performance

Problems with Index-Only Plans

Consider the query:

SELECT store_name, SUM(revenue) FROM Facts, Stores WHERE fact.store_id = stores.store_id AND stores.country = “Canada” GROUP BY store_name

• Two WHERE clauses result in a list of tuple IDs that pass all predicates

• Need to go pick up values from store_name and revenue columns

• But indexes map from valuetuple ID!

• Column stores can efficiently go from tuple IDvalue in each column

12

Recommendations for Row-Store Designers

• Might be possible to get C-Store like performance

① Need to store tuple headers elsewhere (not require that they be read from disk w/ tuples)

② Need to provide efficient merge join implementation that understands sorted columns

③ Need to support direct operation on compressed data• Requires “late materialization” design

13

Summary

• C-Store is a “next gen” column-oriented databases

• Key New Ideas:• Late materialization• Compression & direct operation• Fast load via “write optimized store”

• Row-stores do a poor job of emulation• Need better support for compression, late

materialization• Need support for narrow tuples, efficient merge

joins

C-Store: http://db.csail.mit.edu/cstore13

http://db.csail.mit.edu/cstore



Study Break

pgadmin3 demo

14

Database Internals OutlineFront End

Admission ControlConnection Management

(sql)Parser

(parse tree)Rewriter

(parse tree) Planner & Optimizer

(query plan) Executor

Query System

Storage System

Access MethodsLock ManagerBuffer ManagerLog Manager

This time

Last time

Buffer Manager

• Cache of recently accessed pages

• Mediates all access to page data / index methods

• Today: DBMIN Algorithm for Page Management

16

LRU w/ 3 pages

A B C Read Hit/miss?

1 1 m

17

LRU w/ 3 pages


1 1 m

1 2 2 m

18

LRU w/ 3 pages


1 1 m

1 2 2 m

1 2 3 3 m

19

LRU w/ 3 pages


1 1 m

1 2 2 m

1 2 3 3 m

1 4 2 3 4 m

20

LRU w/ 3 pages


1 1 m

1 2 2 m

1 2 3 3 m

1 4 2 3 4 m

1 4 2 1 3 1 m

21

MRU w/ 3 pages


1 1 m

22

MRU w/ 3 pages


1 1 m

1 2 2 m

23

MRU w/ 3 pages


1 1 m

1 2 2 m

1 2 3 3 m

24

MRU w/ 3 pages


1 1 m

1 2 2 m

1 2 3 3 m

1 2 1 4 4 m

25

MRU w/ 3 pages


1 1 m

1 2 2 m

1 2 3 3 m

1 2 1 4 4 m

1 2 4 1 h

26

MRU w/ 3 pages


1 1 m

1 2 2 m

1 2 3 3 m

1 2 1 4 4 m

1 2 4 1 h

1 2 4 2 h

27

MRU w/ 3 pages


1 1 m

1 2 2 m

1 2 3 3 m

1 2 1 4 4 m

1 2 4 1 h

1 2 4 2 h

28

MRU w/ 3 pages


1 1 m

1 2 2 m

1 2 3 3 m

1 2 1 4 4 m

1 2 4 1 h

1 2 4 2 h

1 1 3 4 3 m

29

MRU w/ 3 pages


1 1 m

1 2 2 m

1 2 3 3 m

1 2 1 4 4 m

1 2 4 1 h

1 2 4 2 h

1 1 3 4 3 m

1 3 4 4 h

30

MRU w/ 3 pages


1 1 m

1 2 2 m

1 2 3 3 m

1 2 1 4 4 m

1 2 4 1 h

1 2 4 2 h

1 1 3 4 3 m

1 3 4 4 h

1 3 4 1 h

31

MRU w/ 3 pages


1 1 m

1 2 2 m

1 2 3 3 m

1 2 1 4 4 m

1 2 4 1 h

1 2 4 2 h

1 1 3 4 3 m

1 3 4 4 h

1 3 4 1 h

1 2 3 4 2 m

32

MRU w/ 3 pages


1 1 m

1 2 2 m

1 2 3 3 m

1 2 1 4 4 m

1 2 4 1 h

1 2 4 2 h

1 1 3 4 3 m

1 3 4 4 h

1 3 4 1 h

1 2 3 4 2 m

2 3 4 3 h

33

6.814/6.830 lecture 8

Documents