database management systems · advantages of dbmss •abstract data representation •relational...
TRANSCRIPT
Database Management Systems
Marco Serafini
COMPSCI 590S1. Lecture 7
2
MapReduce vs. DBMSs
33
Advantages of DBMSs• Abstract data representation
• Relational model• Data storage is delegated to the DBMS
• Functional query language (SQL)• Queries specified as simple relational operators• Actual query execution delegated to the DBMS…• … including parallelism, distribution, pipelining etc.
• Support for indexing
44
Disadvantages of DBMSs• SQL is a limited interface for complex analytics
• E.g. image analysis, creating maps• Need to define a schema for data a priori• High cost of loading data and indexing
• Can be amortized only if same data and schema reused• Too complex for “one shot” analytics
55
Advantages of MapReduce• Support for arbitrary UDFs• Support for a variety of arbitrary data formats• Simple API• Scalability
66
Disadvantages• Many of the optimizations of DBMS must be reimplemented by the application, for example
• Indices• Query execution plans (logical + physical)• Column-based architecture• Data format specifications (ProtoBuf)• Compression
7
Sorting
88
External Sorting• Problem
• Large table with two columns sorted on column one• Table does not fit in memory• Sort by column two
• Q: How to sort?• “Out-of-core” or “external” algorithm
99
External Sorting1. Split file in chunks2. Load each chunk in memory3. Sort and write back4. Recursive merge sort
• log(n) passes
10
Indexing
1111
Index• Index maps a value (of a column) to a row• Primary index: Maps to primary key of table (unique)• Secondary index: Maps to other columns (duplicates)
Index
Table
1212
Which Data Structure for Index?• A hash function?• A tree?• Q: Which one is better? When?
1313
Hash Index
• Advantages• Fast reads and writes
• Disadvantages• No order• Collisions
Tuple
Tuple Tuple
Tuple
Tuple
Buckets Entries
14
Binary Tree• Q: Problem?• Balance
• Balanced binary trees• E.g. R/B trees
• Locality• Space consumption
13
7
3 10
12
15
15
B+ Trees• Designed for out-of-core use
• Large nodes• Sorted (for scans)
13 30 48 61
2 5 10 12 13 17 24 28
< 13 [13,30)
…
pointers to file positions(or actual data)
pointers to file positions(or actual data)
Internal node
Leaves
1616
Clustered vs. Non-Clustered Index• Clustered index
• Reordering index elements changes physical layout• E.g. a B-Tree where leaf nodes are actual data pages• Primary index often clustered, one clustered index per table• A table without clustered index: heap (don’t be confused!)
• Non-Clustered index• Contains pointers to actual physical location• Can have multiple non-clustered indices for the same table
17
Flash/SSD• Most of these algorithms expressed considering disks• What about Flash/SSD?
• Faster but…• Coarse read/write granularity: pages (~8 kB or more)• Wear leveling
• A block can be overwritten only a finite number of times• Controller must spread writes evenly
18
Query Execution
19
Query Execution• SQL: Domain Specific Language
• Restricted language for specific task• In this case: relational algebra• Physical data independence: changes in physical storage are transparent to applications
• One SQL statement, many possible executions• Logical plan: reorder the operators• Physical plan: different ways to store the data
2020
ExampleSELECT e.cidFROM Enrollment e, Student sWHERE s.name= ’John Smith’
and e.sid = s.sid
TablesStudent(sid, name)Enrollment(sid, cid)
2121
Logical Plan• Plan in terms of relational operators
!".$%&
'(.)*+",-…-
⋈(.(%&,".(%&
Student Enrollment
!".$%&
'(.)*+",-…-
⋈(.(%&,".(%&
Student Enrollment
22
Physical Plan• Local plan annotated with physical operators
!".$%&
'(.)*+",-…-
⋈(.(%&,".(%&
Student Enrollment(File scan)
(Pipeline select and write temporary table T)
(Block nested loop join)
(File scan)
(Pipeline project) !".$%&
'(.)*+",-…-
⋈(.(%&,".(%&
Student Enrollment(Index lookup on name)
(Use index)
(Index nested loop join)
(Index lookup on sid)
(Pipeline project)
23
Log Structured Merge Trees
2424
How Good are B+ trees?• Q: Are they good for reading? Why?• Q: Are they good for writing? Why?
25
Log Structured Merge Trees• Increasingly popular data structure for DBMSs
• RocksDB (Facebook)• LevelDB (Google)
• Goals• Fast data ingestion• Leverage large memory for caching
• Problems• Write and read amplification
2626
LSMT Data Structures• Memtable
• Binary tree or skiplist à sorted• Receives writes and serves reads
• Log files (runs)• L0: dump of memtable• Li: merge of multiple Li-1 runs
2727
Operations• Writes go to memtable• Reads
• Search memtables and read caches (if available)• Search log files in reverse chronological order• Bloom filters – indices in log files
• Periodically dump memtable to L0• Periodically merge from Li-1 to Li