database management systems · advantages of dbmss •abstract data representation •relational...

Database Management Systems

Marco Serafini

COMPSCI 590S1. Lecture 7

2

MapReduce vs. DBMSs

33

Advantages of DBMSs• Abstract data representation

• Relational model• Data storage is delegated to the DBMS

• Functional query language (SQL)• Queries specified as simple relational operators• Actual query execution delegated to the DBMS…• … including parallelism, distribution, pipelining etc.

• Support for indexing

44

Disadvantages of DBMSs• SQL is a limited interface for complex analytics

• E.g. image analysis, creating maps• Need to define a schema for data a priori• High cost of loading data and indexing

• Can be amortized only if same data and schema reused• Too complex for “one shot” analytics

55

Advantages of MapReduce• Support for arbitrary UDFs• Support for a variety of arbitrary data formats• Simple API• Scalability

66

Disadvantages• Many of the optimizations of DBMS must be reimplemented by the application, for example

• Indices• Query execution plans (logical + physical)• Column-based architecture• Data format specifications (ProtoBuf)• Compression

7

Sorting

88

External Sorting• Problem

• Large table with two columns sorted on column one• Table does not fit in memory• Sort by column two

• Q: How to sort?• “Out-of-core” or “external” algorithm

99

External Sorting1. Split file in chunks2. Load each chunk in memory3. Sort and write back4. Recursive merge sort

• log(n) passes

10

Indexing

1111

Index• Index maps a value (of a column) to a row• Primary index: Maps to primary key of table (unique)• Secondary index: Maps to other columns (duplicates)

Index

Table

1212

Which Data Structure for Index?• A hash function?• A tree?• Q: Which one is better? When?

1313

Hash Index

• Advantages• Fast reads and writes

• Disadvantages• No order• Collisions

Tuple

Tuple Tuple

Tuple

Tuple

Buckets Entries

14

Binary Tree• Q: Problem?• Balance

• Balanced binary trees• E.g. R/B trees

• Locality• Space consumption

13

7

3 10

12

15

15

B+ Trees• Designed for out-of-core use

• Large nodes• Sorted (for scans)

13 30 48 61

2 5 10 12 13 17 24 28

< 13 [13,30)

…

pointers to file positions(or actual data)

pointers to file positions(or actual data)

Internal node

Leaves

1616

Clustered vs. Non-Clustered Index• Clustered index

• Reordering index elements changes physical layout• E.g. a B-Tree where leaf nodes are actual data pages• Primary index often clustered, one clustered index per table• A table without clustered index: heap (don’t be confused!)

• Non-Clustered index• Contains pointers to actual physical location• Can have multiple non-clustered indices for the same table

17

Flash/SSD• Most of these algorithms expressed considering disks• What about Flash/SSD?

• Faster but…• Coarse read/write granularity: pages (~8 kB or more)• Wear leveling

• A block can be overwritten only a finite number of times• Controller must spread writes evenly

18

Query Execution

19

Query Execution• SQL: Domain Specific Language

• Restricted language for specific task• In this case: relational algebra• Physical data independence: changes in physical storage are transparent to applications

• One SQL statement, many possible executions• Logical plan: reorder the operators• Physical plan: different ways to store the data

2020

ExampleSELECT e.cidFROM Enrollment e, Student sWHERE s.name= ’John Smith’

and e.sid = s.sid

TablesStudent(sid, name)Enrollment(sid, cid)

2121

Logical Plan• Plan in terms of relational operators

!".$%&

'(.)*+",-…-

⋈(.(%&,".(%&

Student Enrollment

!".$%&

'(.)*+",-…-

⋈(.(%&,".(%&

Student Enrollment

22

Physical Plan• Local plan annotated with physical operators

!".$%&

'(.)*+",-…-

⋈(.(%&,".(%&

Student Enrollment(File scan)

(Pipeline select and write temporary table T)

(Block nested loop join)

(File scan)

(Pipeline project) !".$%&

'(.)*+",-…-

⋈(.(%&,".(%&

Student Enrollment(Index lookup on name)

(Use index)

(Index nested loop join)

(Index lookup on sid)

(Pipeline project)

23

Log Structured Merge Trees

2424

How Good are B+ trees?• Q: Are they good for reading? Why?• Q: Are they good for writing? Why?

25

Log Structured Merge Trees• Increasingly popular data structure for DBMSs

• RocksDB (Facebook)• LevelDB (Google)

• Goals• Fast data ingestion• Leverage large memory for caching

• Problems• Write and read amplification

2626

LSMT Data Structures• Memtable

• Binary tree or skiplist à sorted• Receives writes and serves reads

• Log files (runs)• L0: dump of memtable• Li: merge of multiple Li-1 runs

2727

Operations• Writes go to memtable• Reads

• Search memtables and read caches (if available)• Search log files in reverse chronological order• Bloom filters – indices in log files

• Periodically dump memtable to L0• Periodically merge from Li-1 to Li

database management systems · advantages of dbmss •abstract data representation •relational...

Documents