basic db architectures & layouts - data systems laboratory

75
basic db architectures & layouts prof. Stratos Idreos HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ class 4

Upload: others

Post on 12-Apr-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: basic db architectures & layouts - Data Systems Laboratory

basic db architectures & layoutsprof. Stratos Idreos

HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

class 4

Page 2: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /422

design

logical design

physical design

system design next up: db architectures 101

Page 3: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /423

database kernel

applications

sql

optimizer

parser

execution

storage buffer pool

in/out

thread pool

transactions

disk

memory

cpu

Page 4: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /423

is it “good” to have modules

database kernel

applications

sql

optimizer

parser

execution

storage buffer pool

in/out

thread pool

transactions

disk

memory

cpu

Page 5: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /423

is it “good” to have modules

database kernel

applications

sql

optimizer

parser

execution

storage buffer pool

in/out

thread pool

transactions

disk

memory

cpu SAME THING IS HAPPENING NOW

WITH OPEN SOURCE STACKS

Page 6: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /424

design/implement numerous possible algorithms + data representations

choose the bestdata source, algorithms and path for each query

database kernel

data data data

algo

rithm

s/op

erat

ors

applications

sql

Page 7: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /425

query plan

database kernel

data data data

algo

rithm

s/op

erat

ors

Page 8: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /425

query plan

database kernel

data data data

algo

rithm

s/op

erat

ors

PROJECT MILESTONE 1TODAY FIRST EXAMPLE:

FROM DATA TO PLANS & ANSWERS

Page 9: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /426

select name from student

where GPA>3.0

select GPA>3.0

project name

student(id,name,GPA,address,class,…)

result

Page 10: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /426

select name from student

where GPA>3.0

select GPA>3.0

project name

student(id,name,GPA,address,class,…)

result

scan all the data?

Page 11: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /426

select name from student

where GPA>3.0

select GPA>3.0

project name

student(id,name,GPA,address,class,…)

result

scan all the data?

LOGICAL PLAN

Page 12: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /427

select name from student

where GPA>3.0

scan GPA>3.0

project name

result

studentsstudents

index scan GPA>3.0

project name

result

physical plans

Page 13: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /428

select avg(GPA) from student

where class=2017

select year=2017

project GPA

student(id,name,GPA,address,class,…)

avg GPA

result

Page 14: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /428

select avg(GPA) from student

where class=2017

select year=2017

project GPA

student(id,name,GPA,address,class,…)

avg GPA

result

data layout

algorithms/ operators

interfaces

data flow

Page 15: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /429

professor(id,name,…)

course(id,name, profId,…)

student(id,name,…)

give me all students enrolled in cs165select student.name from student, enrolled, course where course.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId

enrolled(studentId,

courseId,…)

Page 16: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4210

select course.name=“cs165”

join enrolled.courseid=course.id

student enrolled course

join student.id=enrolled.studentid

project student.name

good plan

select student.name from student, enrolled, course where course.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId

Page 17: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4211

select course.name=“cs165”

join enrolled.courseid=course.id

student enrolled course

join student.id=enrolled.studentid

project student.name

select student.name from student, enrolled, course where course.name=“cs165” and enrolled.courseId=course.id and student.id=enrolled.studentId

pushing selects down

Page 18: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4212

select min(A) from R where B<10 and C<80

internal languagelogical plan

optimizer rules/cost model/statistics

physical plan execution

Page 19: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4212

select min(A) from R where B<10 and C<80

internal languagelogical plan

optimizer rules/cost model/statistics

physical plan execution

internal language

project interface/level of abstraction

Page 20: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4213

optimizer

execution

storage

tuning

db kernel

can DBAs make wrong decisions?

can optimizers make

wrong decisions?

Page 21: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4214

memory hierarchy

data layouts

column-stores basics

Page 22: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4215

system where db runs

applications

sql

caches

cpu - cpu - cpu - cpu

memory

disk - disk - disk - disk

cpu registers

smal

ler/f

aste

r

+ flash

+ non volatile memory

memory hierarchy

Page 23: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4216

Jim Gray, IBM, Tandem, DEC, Microsoft ACM Turing award ACM SIGMOD Edgar F. Codd Innovations Award

disk100Kx Pluto

2 years

memory100x New York1.5 hours

on board cache10x this building

10 min

on chip cache2x this room

1 min

registers my head~0

Page 24: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4217

registers

on chip cache

on board cache

memory

disk

CPU

memory wall

chea

per

fast

er

SRAM

DRAM

~1ns

~10ns

~100ns

cache miss: looking for something which is not in the cache

memory miss: looking for something which is not in memory

time

speed cpu

mem

Page 25: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4218

design of storage/access methods/algorithms should minimize: data & instruction misses

access only what you need

Page 26: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4219

page-based access

need to only read x… but have to read all of page 1

page1 page2 page3

data value x

registers

on chip cache

on board cache

memory

disk

CPU

data

mov

e

Page 27: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4220

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

5 10 6 4 12

(size=120 bytes)

2 8 9 7 6 7 11 3 9 6

Page 28: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4220

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

scan

5 10 6 4 12(size=120 bytes)

2 8 9 7 6 7 11 3 9 6

Page 29: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4220

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

scan

5 10 6 4 12(size=120 bytes)

2 8 9 7 6

4

7 11 3 9 6

Page 30: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4220

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

scan

40 bytes

5 10 6 4 12(size=120 bytes)

2 8 9 7 6

4

7 11 3 9 6

Page 31: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4220

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

scan scan

40 bytes

5 10 6 4 12(size=120 bytes) 2 8 9 7 6 4

7 11 3 9 6

Page 32: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4220

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

scan scan

40 bytes

5 10 6 4 12(size=120 bytes) 2 8 9 7 6 4 2

7 11 3 9 6

Page 33: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4220

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

scan scan

5 10 6 4 12(size=120 bytes) 2 8 9 7 6 4 2

7 11 3 9 6

80 bytes

Page 34: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4220

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

(size=120 bytes) 2 8 9 7 6 4 27 11 3 9 6

scan

80 bytes

Page 35: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4220

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

(size=120 bytes) 2 8 9 7 6 4 27 11 3 9 6

scan

3

80 bytes

Page 36: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4220

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

(size=120 bytes) 2 8 9 7 6 4 27 11 3 9 6

scan

3

120 bytes

Page 37: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4221

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

5 10 6 4 12

(size=120 bytes)

2 8 9 7 6 7 11 3 9 6

an oracle gives us the positions

Page 38: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4221

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

oracle

5 10 6 4 12(size=120 bytes)

2 8 9 7 6 7 11 3 9 6

an oracle gives us the positions

Page 39: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4221

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

oracle

5 10 6 4 12(size=120 bytes)

2 8 9 7 6

4

7 11 3 9 6

an oracle gives us the positions

Page 40: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4221

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

oracle

40 bytes

5 10 6 4 12(size=120 bytes)

2 8 9 7 6

4

7 11 3 9 6

an oracle gives us the positions

Page 41: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4221

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

oracle oracle

40 bytes

5 10 6 4 12(size=120 bytes) 2 8 9 7 6 4

7 11 3 9 6

an oracle gives us the positions

Page 42: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4221

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

oracle oracle

40 bytes

5 10 6 4 12(size=120 bytes) 2 8 9 7 6 4 2

7 11 3 9 6

an oracle gives us the positions

Page 43: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4221

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

oracle oracle

5 10 6 4 12(size=120 bytes) 2 8 9 7 6 4 2

7 11 3 9 6

80 bytesan oracle gives us the positions

Page 44: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4221

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

(size=120 bytes) 2 8 9 7 6 4 27 11 3 9 6

oracle

80 bytesan oracle gives us the positions

Page 45: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4221

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

(size=120 bytes) 2 8 9 7 6 4 27 11 3 9 6

oracle

3

80 bytesan oracle gives us the positions

Page 46: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4221

…5 10 6 4 12 2 8 9 7 6 7 11 3 9 6

memory level N

memory level N-1

query x<5

page size: 5x8 bytes

(size=120 bytes) 2 8 9 7 6 4 27 11 3 9 6

oracle

3

120 bytesan oracle gives us the positions

Page 47: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4222

when does it make sense to have an oracle

scan=120bytes vs oracle=120bytes(and there is no such thing as an Oracle so Oracle is not for free…)

Page 48: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4223

sequential access: read one block; consume it completely; discard it; read next

what is next?

in parallel/prefetching

hardware/software can better predict/buffer sequential pages to be read

1 2 3 4

Page 49: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4224

random access: read one block; consume it partially; discard it; 1) read “random” next 2) might have to read it again in the future

Page 50: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4225

level N-1

level Nbuffer pool remember hot blocks

why not use OS caching

Page 51: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4226

device block size

os block size

dbms block size

os and db will typically refer to pageshardware vendors might also refer to sectors

Page 52: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4227

employee(id:int, name:varchar(50), office:char(5), telephone:char(10), city:varchar(30), salary:int)

(1, name1, office1, tel1, city1, salary1) (2, name2, office2, tel2, city2, salary2) (3, name3, office3, tel3, city3, salary3) (4, name4, office4, tel4, city4, salary4) (5, name5, office5, tel5, city5, salary5) (6, name6, office6, tel6, city6, salary6) (7, name7, office7, tel7, city7, salary7) (8, name8, office8, tel8, city8, salary8) (9, name9, office9, tel9, city9, salary9)

data storage blocks < pages < files

file

remember: the way we store data defines the best possible way we can access it

Page 53: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4228

employee(id:int, name:varchar(50), office:char(5), telephone:char(10), city:varchar(30), salary:int)

(1, name1, office1, tel1, city1, salary1) (2, name2, office2, tel2, city2, salary2) (3, name3, office3, tel3, city3, salary3) (4, name4, office4, tel4, city4, salary4) (5, name5, office5, tel5, city5, salary5) (6, name6, office6, tel6, city6, salary6) (7, name7, office7, tel7, city7, salary7) (8, name8, office8, tel8, city8, salary8) (9, name9, office9, tel9, city9, salary9)

header

row1row2

row3…

slotted pages

page

Page 54: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4229

free_offset, N, offset1-length1, offset2-lenght2,…

free space

slotted page

scan null

update var length

Page 55: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4230

things to “worry” abouthow much data we transfer through the memory hierarchy how many computations we do

level Nlevel N-1…cpu

Page 56: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4231

row-storeABCD

stored continuously

each page contains all fields for each entry (i.e., all attributes values)

select A,B,C,Dselect A

file

Page 57: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4232

row-store column-storeABCD A B C D

stored continuously

each page contains fields of a single attribute

select A,B,C,Dselect A

Page 58: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4233

history/timeline

~1960s

rows

1970: column storage ideas start

appearing

rows rows rows rows

1985: first rather complete column-store model

~2000: open source complete system

rows

2005-now: more ideas and industry adoption of column-

store designs

rows

monetdb

c-store,vertica,vectorwise and then

ibm,microsoft,oracle, and more

Page 59: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4234

column-store with materialized IDs

ID A ID B ID Cheader

R(A,B,C)

row1row2

row3

good idea

Page 60: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4235

a1 a2 a3 a4 a5 a6

b1 b2 b3 b4 b5 b6

c1 c2 c3 c4 c5 c6

virtual ids/ positional alignment

positional lookups/joinsA(i) = A + i * width(A)

tuple 1tuple 2tuple 3tuple 4tuple 5tuple 6

A B C

fixed-width + dense

columns do not need to have the

same width

Page 61: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4236

disk memoryA B C D

A

ABCrow-store

engine

ok so now we can selectively read columns but how do we process them?

early tuple reconstruction/materialization

column-store

engine

option1

option2

Page 62: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4237

registers

on chip cache

on board cache

memory

disk

CPU

memory wall

chea

per

fast

er

SRAM

DRAM

~1ns

~10ns

~100ns

it is not just memory and disk

we want to move as few data items as possible all the way up to the CPU

Page 63: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4238

select min(C) from R where A<10 & B<20

A B C Ddisk memory

write the query plan and the code/logic of each operator

do not forget about intermediate results describe data layouts at each step

(milestone1 of project)

no precise final answer is OKunderstanding what matters is key

concepts & designs will be repeated >>1

Page 64: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4239

select min(C) from R where A<10 & B<20

A B C Ddisk memory

late reconstruction/materialization

Page 65: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4239

select min(C) from R where A<10 & B<20

A<10A B C Ddisk memory

late reconstruction/materialization

Page 66: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4239

select min(C) from R where A<10 & B<20

A<10A B C D1: int *input=A2: for (i=0;i<tuples;i++,input++) 3: if *input<104: *output=i5: output++

A<10

disk memory

late reconstruction/materialization

Page 67: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4239

select min(C) from R where A<10 & B<20

A<10A B C D IDsdisk memory

late reconstruction/materialization

Page 68: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4239

select min(C) from R where A<10 & B<20

A<10A B C D IDs Bdisk memory

late reconstruction/materialization

Page 69: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4239

select min(C) from R where A<10 & B<20

B<20A<10A B C D IDs Bdisk memory

late reconstruction/materialization

Page 70: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4239

select min(C) from R where A<10 & B<20

B<20A<10A B C D IDs B CIDsdisk memory

late reconstruction/materialization

Page 71: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4239

select min(C) from R where A<10 & B<20

B<20 minCA<10A B C D IDs B CIDsdisk memory

late reconstruction/materialization

Page 72: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4239

select min(C) from R where A<10 & B<20

B<20 minCA<10A B C D IDs B CIDsdisk memory

late reconstruction/materialization

always sequential access patterns memory contains only what is needed at any point in time

Page 73: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4240

column-stores vs row-stores

it all starts with how we store the data

still basic concepts are the same

moving data is a major cost component

it is not just about disk…

the whole memory hierarchy matters

Notes to remember

Page 74: basic db architectures & layouts - Data Systems Laboratory

CS165, Fall 2017 Stratos Idreos /4241

Read: Architecture of a Database System (Sections 1,2,3,4)by J. Hellerstein, M. Stonebraker and J. Hamilton

Read: The Design and Implementation of Modern Column-store Database Systemsby D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden

keep up with reading!

Page 75: basic db architectures & layouts - Data Systems Laboratory

DATA SYSTEMSprof. Stratos Idreos

class 4

basic db architectures & layouts