class 6 more about column-store plans and...

Post on 15-Oct-2020

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

more about column-store plansand compression

prof. Stratos IdreosHTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

class 6

CS165, Fall 2016 Stratos Idreos /30

query compilation

2

an “ancient” yet new topic/research challenge

query->SQL->interpet to a generic query plan->c functions

query->SQL->generate and link tailored c code

CS165, Fall 2016 Stratos Idreos /30

query compilation

2

an “ancient” yet new topic/research challenge

query->SQL->interpet to a generic query plan->c functions

query->SQL->generate and link tailored c code

why this can be better/faster

CS165, Fall 2016 Stratos Idreos /30

query compilation

2

an “ancient” yet new topic/research challenge

query->SQL->interpet to a generic query plan->c functions

query->SQL->generate and link tailored c code

why this can be better/faster

why it was not ideal in the past

CS165, Fall 2016 Stratos Idreos /303

Efficiently Compiling Efficient Query Plans for Modern HardwareThomas Neumann In Proc. of the Very Large Databases Conference (VLDB), 2011

H2O: A Hands-free Adaptive StoreIoannis Alagiannis, Stratos Idreos, and Anastassia Ailamaki In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2014

Generating code for holistic query evaluationKonstantinos Krikellas, Stratis Viglas, Marcelo Cintra: In Proc. of the Inter. Conference on Data Engineering (ICDE), 2010

Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A. Boncz, Thomas Neumann, Alfons Kemper In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2016

CS165, Fall 2016 Stratos Idreos /30

research night coming up

4

during one of the labs next week (will announce in piazza)

manos & stratos

can we have bitmap indexing that works great for both reads and writes?

UpBit: Scalable In-Memory Updatable Bitmap IndexingM. Athanassoulis, Z. Yan, and S. Idreos In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2016

CS165, Fall 2016 Stratos Idreos /305

working over fixed width & dense columns

for (i=0;i<size;i++) if column[i]>v

res[j++]=i

no function calls, no indirections, no auxiliary data, min ifs easy to prefetch next data values

for (i=0;i<size;i++) inter2[j++]=column[inter1[i]]

select

fetch

CS165, Fall 2016 Stratos Idreos /306

select min(C) from R where A<10 & B<20

B<20 minCA<10A B C D IDs B CIDs

A B C D B<20 minCA<10 IDs B CIDs

column-

vector-

CS165, Fall 2016 Stratos Idreos /307

update row7=(A=a,B=b,C=c,D=d)

A B C D A B C D

vs

cost: 1 page cost: N pages, N=# of columns

CS165, Fall 2016 Stratos Idreos /30

A

8

A B C D

B C D

base data pending updates

updatequery

periodically

CS165, Fall 2016 Stratos Idreos /309

a1 a2 a3 a4 a5 a6

b1 b2 b3 b4 b5 b6

c1 c2 c3 c4 c5 c6

tuple 1tuple 2tuple 3tuple 4tuple 5tuple 6

A B C

relational table

inserts, deletes affect whole table

CS165, Fall 2016 Stratos Idreos /3010

A

pending inserts pending deletesupdate= delete followed by insert

what information do we need to remember

CS165, Fall 2016 Stratos Idreos /3011

registers

on chip cache

on board cache

memory

disk

CPU

data

compute

time

speed cpu

mem

compression=

data

&

computation

but

CS165, Fall 2016 Stratos Idreos /3012

value1 value2 value3 value1 value1 value4 value2 value3 value5

8 bytes width

value1 value2 value3 value4 value5

001 010 011 001 001 100 010 011 101 …

8 bytes width

3 bits width

how many bits do we need for the codes (width)

CS165, Fall 2016 Stratos Idreos /3013

A B C D A B C D

which one gives better compression

CS165, Fall 2016 Stratos Idreos /3014

value1 value2 value3 value4 value5

001 010 011 001 001 100 010 011 101 …

8 bytes width

3 bits width think about

billions of entries…

can we do any better

CS165, Fall 2016 Stratos Idreos /3015

A

can we do something like huffman coding any side-effects

A dictionary

(check: Business Analytics in (a) Blink from readings)

CS165, Fall 2016 Stratos Idreos /3016

value1 value2 value3 value4 value5

001 010 011 001 001 100 010 011 101 …

8 bytes width

3 bits width

and how do we process datae.g., select A<10

CS165, Fall 2016 Stratos Idreos /3017

fixed-width is key… ok and how do we store variable length data?

dictionary of strings

fixed width codes that point to dictionary entries why

CS165, Fall 2016 Stratos Idreos /3018

compression is a bonus task for the project

Joins on Encoded and Partitioned Data Jae-Gil Lee, Gopi K. Attaluri, Ronald Barber, Naresh Chainani, Oliver Draese, Frederick Ho,

Stratos Idreos, Min-Soo Kim, Sam Lightstone, Guy M. Lohman, Konstantinos Morfonios, Keshava Murthy, Ippokratis Pandis, Lin Qiao, Vijayshankar Raman, Vincent Kulandai Samy,

Richard Sidle, Knut Stolze, Liping Zhang In Proc. of the Very Large Databases Conference (VLDB), 2014

IBM Blue

CS165, Fall 2016 Stratos Idreos /3019

B<20 minCA<10 IDs B CIDs

late tuple reconstruction/materialization only reconstruct to present results

no need to assemble tuples minimize memory footprint minimize data we are moving up the memory hierarchy but requires new processing engine

CS165, Fall 2016 Stratos Idreos /3020

Assume a column-store database with a table R(A,B,C,D,E). All attributes are integers. Our workload has two classes of queries:

1) select max(B),max(C),max(D),max(E) from R where A>v1 2) select B+C+D+E from R where A>v1

Should we use late or early tuple reconstruction plans? For each query, draw the 2 possible plans and the respective operators, explain which one is best and give the total cost.

early vs late tuple reconstruction

CS165, Fall 2016 Stratos Idreos /3021

sel A IDs B max

sel A IDs B

result

max(B), max(C) max(D), max(E)

C D E

late TR

hybrid

IDs C max IDs D maxresult

CS165, Fall 2016 Stratos Idreos /3021

sel A IDs B max

sel A IDs B

result

max(B), max(C) max(D), max(E)

C D E

late TR

hybrid

IDs C max IDs D maxresult

CS165, Fall 2016 Stratos Idreos /3021

sel A IDs B max

sel A IDs B

result

max(B), max(C) max(D), max(E)

C D E

late TR

hybrid

IDs C max IDs D maxresult

CS165, Fall 2016 Stratos Idreos /3021

sel A IDs B max

sel A IDs B

result

max(B), max(C) max(D), max(E)

C D E

late TR

hybrid

IDs C max IDs D maxresult

CS165, Fall 2016 Stratos Idreos /3021

sel A IDs B max

sel A IDs B

result

max(B), max(C) max(D), max(E)

C D E

late TR

hybrid

IDs C max IDs D maxresult

CS165, Fall 2016 Stratos Idreos /3021

sel A IDs B max

sel A IDs B

result

max(B), max(C) max(D), max(E)

C D E

late TR

hybrid

IDs C max IDs D maxresult

CS165, Fall 2016 Stratos Idreos /3021

sel A IDs B max

sel A IDs B

result

max(B), max(C) max(D), max(E)

C D E

late TR

hybrid

IDs C max IDs D maxresult

CS165, Fall 2016 Stratos Idreos /3021

sel A IDs B max

sel A IDs B

result

max(B), max(C) max(D), max(E)

C D E

late TR

hybrid

IDs C max IDs D maxresult

CS165, Fall 2016 Stratos Idreos /3021

sel A IDs B max

sel A IDs B

result

max(B), max(C) max(D), max(E)

C D E

late TR

hybrid

IDs C max IDs D maxresult

CS165, Fall 2016 Stratos Idreos /3021

sel A IDs B max

sel A IDs B

result

max(B), max(C) max(D), max(E)

C D E

late TR

hybrid

IDs C max IDs D maxresult

CS165, Fall 2016 Stratos Idreos /3021

sel A IDs B max

sel A IDs B

result

max(B), max(C) max(D), max(E)

C D E

late TR

hybrid

IDs C max IDs D maxresult

CS165, Fall 2016 Stratos Idreos /3021

sel A IDs B max

sel A IDs B

result

max(B), max(C) max(D), max(E)

C D E

late TR

hybrid

IDs C max IDs D maxresult

CS165, Fall 2016 Stratos Idreos /3022

sA IDs B

sA IDs B result+(B,C,D,E)C D E

late TR

r1 +(r1,r2) r3IDs C r2 IDs D r4 +(r3,r4) res

hybrid

CS165, Fall 2016 Stratos Idreos /3022

sA IDs B

sA IDs B result+(B,C,D,E)C D E

late TR

r1 +(r1,r2) r3IDs C r2 IDs D r4 +(r3,r4) res

hybrid

CS165, Fall 2016 Stratos Idreos /3022

sA IDs B

sA IDs B result+(B,C,D,E)C D E

late TR

r1 +(r1,r2) r3IDs C r2 IDs D r4 +(r3,r4) res

hybrid

CS165, Fall 2016 Stratos Idreos /3022

sA IDs B

sA IDs B result+(B,C,D,E)C D E

late TR

r1 +(r1,r2) r3IDs C r2 IDs D r4 +(r3,r4) res

hybrid

CS165, Fall 2016 Stratos Idreos /3022

sA IDs B

sA IDs B result+(B,C,D,E)C D E

late TR

r1 +(r1,r2) r3IDs C r2 IDs D r4 +(r3,r4) res

hybrid

CS165, Fall 2016 Stratos Idreos /3022

sA IDs B

sA IDs B result+(B,C,D,E)C D E

late TR

r1 +(r1,r2) r3IDs C r2 IDs D r4 +(r3,r4) res

hybrid

CS165, Fall 2016 Stratos Idreos /3022

sA IDs B

sA IDs B result+(B,C,D,E)C D E

late TR

r1 +(r1,r2) r3IDs C r2 IDs D r4 +(r3,r4) res

hybrid

CS165, Fall 2016 Stratos Idreos /3022

sA IDs B

sA IDs B result+(B,C,D,E)C D E

late TR

r1 +(r1,r2) r3IDs C r2 IDs D r4 +(r3,r4) res

hybrid

CS165, Fall 2016 Stratos Idreos /3022

sA IDs B

sA IDs B result+(B,C,D,E)C D E

late TR

r1 +(r1,r2) r3IDs C r2 IDs D r4 +(r3,r4) res

hybrid

CS165, Fall 2016 Stratos Idreos /3022

sA IDs B

sA IDs B result+(B,C,D,E)C D E

late TR

r1 +(r1,r2) r3IDs C r2 IDs D r4 +(r3,r4) res

hybrid

CS165, Fall 2016 Stratos Idreos /3023

defaultlate tuple reconstruction (rather safe choice)

open topic for optimizationwhen and how to do selective early reconstruction

issues to considertransformation overhead materialization overhead extra passes over the data it may be almost for free (sometimes in hash joins) dynamic code generation to fit data layouts

DSM vs. NSM: CPU performance tradeoffs in block-oriented query processingMarcin Zukowski, Niels Nes, Peter A. Boncz International Workshop on Data Management on New Hardware (DaMoN) 2008

CS165, Fall 2016 Stratos Idreos /3024

essential column-stores featuresvirtual ids late tuple reconstruction (if ever) vectorized execution compression fixed-width columns

0"

5"

10"

15"

20"

25"

30"

35"

40"

45"

Column"Store" Row"Store"

Run$

me'(sec)'

Performance'of'Column3Oriented'Op$miza$ons'

–Late"Materializa:on"

–Compression"

–Join"Op:miza:on"

–Tuple@at@a@:me"

Baseline"

Column-stores vs. row-stores: how different are they really? D. Abadi, S. Madden, and N. Hachem

ACM SIGMOD Conference on Management of Data, 2008

CS165, Fall 2016 Stratos Idreos /3025

disk memoryA B C D

A

ABCrow-store

engineearly tuple

reconstruction/materialization

option1

option2

column-store

engine

CS165, Fall 2016 Stratos Idreos /3026

but why now…weren’t all those design options obvious in the past as well?

moving data from disk

moving data from memory

computation 1) big memories 2) cpu vs memory speed

CS165, Fall 2016 Stratos Idreos /30

main-memory systems

27

optimized for the memory wall

with or without persistent data

CS165, Fall 2016 Stratos Idreos /3028

other system categoriesnoSQL, newSQL, key-value stores, matlab, etc..

column-stores = bad name modern systems

CS165, Fall 2016 Stratos Idreos /3029

research papers

Integrating compression and execution in column-oriented database systemsDaniel J. Abadi, Samuel Madden, Miguel Ferreira In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2006

Updating a cracked databaseStratos Idreos, Martin Kersten, Stefan Manegold In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2007

Positional update handling in column storesSándor Héman, Marcin Zukowski, Niels J. Nes, Lefteris Sidirourgos, Peter A. Boncz In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2010

Column-stores vs. row-stores: how different are they really? D. Abadi, S. Madden, and N. Hachem In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2008

DATA SYSTEMSprof. Stratos Idreos

class 6

more about column-store plans and compression

top related