
Database Architecture 2 & Storage

Instructor: Matei Zaharia
cs245.stanford.edu

Outline

Relational DBMS architecture

Alternative architectures & tradeoffs

Storage hardware


Summary from Last Time

System R mostly matched the architecture of a modern RDBMS:
» SQL
» Many storage & access methods
» Cost-based optimizer
» Lock manager
» Recovery
» View-based access control


Differentiating by Workload

Two big classes of commercial RDBMS today

Transactional DBMS: focus on concurrent, small, low-latency transactions (e.g. MySQL, Postgres, Oracle, DB2) → real-time apps

Analytical DBMS: focus on large, parallel but mostly read-only analytics (e.g. Teradata, Redshift, Vertica) → “data warehouses”


How To Design Components for Transactional vs Analytical DBMS?

Component      Transactional DBMS                   Analytical DBMS
Data storage   B-trees, row-oriented storage        Column-oriented storage
Locking        Fine-grained, very optimized         Coarse-grained (few writes)
Recovery       Log data writes, minimize latency    Log queries
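To make the row-vs-column distinction concrete, here is a minimal sketch (not from the lecture; plain Python lists stand in for pages) of how the same records could be laid out by a row store versus a column store:

# Hypothetical records: (id, name, balance)
records = [(1, "alice", 100.0), (2, "bob", 250.0), (3, "carol", 75.0)]

# Row-oriented layout: each record's fields are stored contiguously.
# Good for transactional workloads that read or write whole records.
row_store = [field for record in records for field in record]
# -> [1, 'alice', 100.0, 2, 'bob', 250.0, 3, 'carol', 75.0]

# Column-oriented layout: each attribute is stored contiguously.
# Good for analytical scans that touch a few columns of many rows.
col_store = {
    "id":      [r[0] for r in records],
    "name":    [r[1] for r in records],
    "balance": [r[2] for r in records],
}

# An analytical query like SUM(balance) only has to scan one column:
total_balance = sum(col_store["balance"])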


Outline

Relational DBMS architecture

Alternative architectures & tradeoffs

Storage hardware


How Can We Change the DBMS Architecture?


Decouple Query Processing from Storage Management

Example: “data lake” architecture (Hadoop, S3, etc.)

[Figure: large-scale file systems or blob stores (e.g. GFS), file formats & metadata, and processing engines (e.g. MapReduce)]

Decouple Query Processing from Storage Management

Pros:
» Can scale compute independently of storage (e.g. in datacenter or public cloud)
» Let different orgs develop different engines
» Your data is “open” by default to new tech

Cons:
» Harder to guarantee isolation, reliability, etc.
» Harder to co-optimize compute and storage
» Can’t optimize across many compute engines
» Harder to manage if too many engines!


Change the Data Model

Key-value stores: data is just key-value pairs; the system doesn’t worry about record internals

Message queues: data is only accessed in a specific FIFO order; limited operations

ML frameworks: data is tensors, models, etc


Change the Compute Model

Stream processing: apps run continuously, and the system can manage upgrades, scale-up, recovery, etc.

Eventual consistency: handle it at app level


Different Hardware Setting

Distributed databases: need to distribute your lock manager, storage manager, etc, or find system designs that eliminate them

Public cloud: “serverless” databases that can scale compute independently of storage (e.g. AWS Aurora, Google BigQuery)


Example: AWS Aurora Serverless


Outline

Relational DBMS architecture

Alternative architectures & tradeoffs

Storage hardware


Typical Server

[Figure: one or more CPUs and DRAM, with an I/O controller connecting to storage devices and a network card]

Storage Performance Metrics

» Latency (s)
» Throughput (bytes/s)
» Storage capacity (bytes, bytes/$)

Storage Latency

Approximate relative latencies (registers = 1), with a travel-time analogy where one unit is one minute:

Registers              1       My Head       1 min
L1 Cache               2       This Room
L2 Cache               10      This Campus   10 min
Memory                 150     Sacramento    2 hr
Disk                   10^6    Pluto         2 Years
Tape / Optical Robot   10^9    Andromeda     2,000 Years
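A quick arithmetic check of the analogy (a sketch; it just converts the relative latencies above into travel times at one minute per unit):

# Convert each relative latency into a human-scale travel time,
# treating one latency unit as one minute.
latencies = {
    "Registers": 1,
    "L1 Cache": 2,
    "L2 Cache": 10,
    "Memory": 150,
    "Disk": 10**6,
    "Tape / Optical Robot": 10**9,
}

MIN_PER_HOUR = 60
MIN_PER_YEAR = 60 * 24 * 365

for level, units in latencies.items():
    if units < MIN_PER_HOUR:
        human = f"{units} min"
    elif units < MIN_PER_YEAR:
        human = f"{units / MIN_PER_HOUR:.1f} hr"
    else:
        human = f"{units / MIN_PER_YEAR:,.0f} years"
    print(f"{level:22s} {units:>13,}   ~ {human}")

# Disk (10**6) works out to ~1.9 years and tape (10**9) to ~1,900 years,
# which is roughly the Pluto / Andromeda scale of the analogy.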

Max Attainable Throughput

Varies significantly by device:
» 100 GB/s for RAM
» 2 GB/s for NVMe SSD
» 130 MB/s for hard disk

Assumes large reads (≫1 block)!


Storage Cost

$1000 at NewEgg today buys:
» 0.25 TB of RAM
» 9 TB of NVMe SSD
» 50 TB of magnetic disk


Hardware Trends over Time

Capacity/$ grows exponentially at a fast rate (e.g. doubling every 2 years)

Throughput grows at a slower rate (e.g. 5% per year), but new interconnects help

Latency does not improve much over time
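To see how quickly these rates diverge, a small sketch compounding the example growth rates above over a decade (illustrative only, not measured data):

# Compound the example growth rates from this slide over 10 years.
years = 10
capacity_growth = 2 ** (years / 2)    # capacity/$ doubles every 2 years
throughput_growth = 1.05 ** years     # throughput grows ~5% per year

print(f"Capacity/$ after {years} years:  ~{capacity_growth:.0f}x")    # ~32x
print(f"Throughput after {years} years:  ~{throughput_growth:.2f}x")  # ~1.63x
# The widening gap is one reason data layout and access patterns
# matter more and more (see the Summary slide).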


Most Common Permanent Storage: Hard Disks

Terms: Platter, Head, Actuator, Cylinder, Track, Sector (physical), Block (logical), Gap

Top View

[Figure: top view of a disk platter]

Disk Access Time

[Figure: a program asking “I want block X”, which must be read from disk into a block in memory]

Time = Seek Time + Rotational Delay + Transfer Time + Other

Seek Time

[Figure: seek time vs. cylinders traveled, rising from about x for a short seek to 3-5x for a seek across all N cylinders]

Typical Seek Time

Ranges from:
» 4 ms for high-end drives
» 15 ms for mobile devices

In contrast, SSD access time ranges from:
» 0.02 ms: NVMe
» 0.16 ms: SATA

Rotational Delay

[Figure: platter with the head here and the block I want elsewhere on the track]

Average Rotational Delay

R = 1/2 revolution

Typical HDD figures:

HDD spindle [rpm]    Average rotational latency [ms]
4,200                7.14
5,400                5.56
7,200                4.17
10,000               3.00
15,000               2.00

Source: Wikipedia, "Hard disk drive performance characteristics"

R = 0 for SSDs
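The table values follow directly from the spindle speed (half a revolution on average); a short sketch reproduces them:

# Average rotational latency = time for half a revolution.
for rpm in (4200, 5400, 7200, 10000, 15000):
    ms_per_revolution = 60_000 / rpm
    print(f"{rpm:>6} rpm -> {ms_per_revolution / 2:.2f} ms")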

Transfer Rate

Transfer rate T is around 50-130 MB/s

Transfer time: size / T for contiguous read

Block size: usually 512-4096 bytes

So Far: Random Block Access

What about reading the “next” block?


If We Do Things Right (Double Buffer, etc)

Time to get = block size / T + negligible

Potential slowdowns:
» Skip gap
» Next track
» Discontinuous block placement
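For instance, here is a minimal double-buffering sketch (process() is a hypothetical placeholder, and plain Python file I/O stands in for a real buffer manager): one thread prefetches the next block while the current one is being processed, so in the ideal case each block costs only its transfer time:

import queue
import threading

def process(block):
    """Hypothetical placeholder for whatever work is done per block."""
    pass

def double_buffered_scan(path, block_size=4096):
    """Overlap reading the next block with processing the current one."""
    buf = queue.Queue(maxsize=1)   # holds at most one prefetched block

    def prefetch():
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                buf.put(block)      # waits if the consumer is behind
                if not block:       # b"" signals end of file
                    return

    threading.Thread(target=prefetch, daemon=True).start()
    while True:
        block = buf.get()
        if not block:
            break
        process(block)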


Sequential access generally much faster than random access

… unless we want to verify! Then we need to add a (full) rotation + block size / T
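Putting the pieces together, a back-of-the-envelope sketch comparing random and sequential reads of the same data, using example figures from the slides above (4 ms seek, 7,200 rpm, 130 MB/s, 4 KB blocks; real drives vary):

# Example hard-disk parameters (roughly the figures quoted above).
SEEK_MS       = 4.0                  # high-end drive seek time
ROTATION_MS   = 60_000 / 7200 / 2    # avg rotational delay at 7,200 rpm (~4.17 ms)
TRANSFER_MBPS = 130.0                # sustained transfer rate
BLOCK_BYTES   = 4096

def random_read_s(n_blocks):
    """Every block pays seek + rotational delay + transfer."""
    transfer_ms = BLOCK_BYTES / (TRANSFER_MBPS * 1e6) * 1000
    return n_blocks * (SEEK_MS + ROTATION_MS + transfer_ms) / 1000

def sequential_read_s(n_blocks):
    """One seek + one rotational delay, then pure transfer."""
    transfer_ms = n_blocks * BLOCK_BYTES / (TRANSFER_MBPS * 1e6) * 1000
    return (SEEK_MS + ROTATION_MS + transfer_ms) / 1000

n = 25_000   # ~100 MB in 4 KB blocks
print(f"random:     {random_read_s(n):6.1f} s")      # ~205 s
print(f"sequential: {sequential_read_s(n):6.1f} s")  # ~0.8 s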


Cost To Modify a Block?

Cost of writing: similar to reading

To modify a block:
(a) Read block
(b) Modify in memory
(c) Write block
[(d) Verify?]

Performance of DRAM

The same basic issues with “lookup time” vs throughput apply to DRAM

Min read from DRAM is a cache line (64 bytes)

Even 64-byte random reads may not be as fast as sequential ones, due to prefetching, page-table translation, memory controllers, etc.


Place co-accessed data together!

Example

Suppose we’re accessing 8-byte records in a DRAM with 64-byte cache line sizes

How much slower is random vs sequential?


In the random case, we are reading 64 bytes for every 8 bytes we need, so we expect to max out the throughput at least 8x sooner.
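A rough way to observe this on a real machine (a sketch using NumPy, which is an assumption on my part and not part of the assignment; exact ratios vary by hardware):

import time
import numpy as np

N = 10_000_000
values = np.arange(N, dtype=np.int64)       # 8-byte "records"
random_order = np.random.permutation(N)     # random access pattern

start = time.perf_counter()
sequential_sum = values.sum()               # sequential scan
sequential_s = time.perf_counter() - start

start = time.perf_counter()
random_sum = values[random_order].sum()     # gather in random order
random_s = time.perf_counter() - start

print(f"sequential: {sequential_s:.3f} s, random: {random_s:.3f} s, "
      f"ratio ~{random_s / sequential_s:.1f}x")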

Storage Hierarchy

Typically want to cache frequently accessed data at a high level of the storage hierarchy to improve performance

[Figure: storage hierarchy with the CPU at the top, then CPU cache (KBs-MBs), DRAM (GBs), and disk (TBs)]

Sizing Storage Tiers

How much high-tier storage should we have?

Can determine based on workload & cost


The 5 Minute Rule for Trading Memory Accesses for Disc Accesses
Jim Gray & Franco Putzolu, May 1985

The Five Minute Rule

Say a page is accessed every X seconds.

Assume a disk costs D dollars and can do I operations/sec; the cost of keeping this page on disk is

Cdisk = (D / I) / X = D / (I X)

Assume 1 MB of RAM costs M dollars and holds P pages; then the cost of keeping the page in DRAM is

Cmem = M / P

This tells us that the page is worth caching when Cmem < Cdisk, i.e.

X < D P / (I M)

Source: The Five-minute Rule Thirty Years Later and its Impact on the Storage Hierarchy
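As a rough worked example (the numbers below are hypothetical, chosen only to illustrate the formula, and are not from the lecture):

def break_even_seconds(D, I, M, P):
    """X below which a page is worth caching: X < D*P / (I*M)."""
    return D * P / (I * M)

# Hypothetical figures: a $100 hard disk doing ~100 random IOPS,
# DRAM at ~$5/GB (about $0.005 per MB), and 4 KB pages (256 per MB).
x = break_even_seconds(D=100, I=100, M=0.005, P=256)
print(f"Break-even interval: {x:.0f} s (~{x / 3600:.1f} hours)")
# With numbers like these the break-even interval is hours rather than
# minutes, in line with the trend discussed in the "Thirty Years Later"
# follow-up cited above.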

Disk Arrays

Many flavors of “RAID”: striping, mirroring, etc. to increase performance and reliability

[Figure: an array of disks presented as logically one disk]

Common RAID Levels

» Striping across 2 disks: adds performance but not reliability
» Mirroring across 2 disks: adds reliability but not performance (except for reads)
» Striping + 1 parity disk: adds performance and reliability at lower storage cost

Image source: Wikipedia
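To make the parity idea concrete, a minimal sketch (not tied to any particular RAID implementation) of how XOR parity lets the array rebuild a failed disk's block from the survivors:

# Three data blocks striped across three disks, plus one parity block.
d0 = bytes([1, 2, 3, 4])
d1 = bytes([5, 6, 7, 8])
d2 = bytes([9, 10, 11, 12])

def xor_blocks(*blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

parity = xor_blocks(d0, d1, d2)           # stored on the parity disk

# If the disk holding d1 fails, rebuild its block from the survivors + parity.
rebuilt_d1 = xor_blocks(d0, d2, parity)
assert rebuilt_d1 == d1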

Coping with Disk Failures

Detection
» E.g. checksum

Correction
» Requires redundancy


At What Level Do We Cope?

Single Disk
» E.g., error-correcting codes on read

Disk Array
» [Figure: logical blocks mapped to redundant physical blocks]

Operating System
» E.g., network-replicated storage
» [Figure: a logical block stored as Copy A and Copy B]

Database System
» E.g., a log plus the current DB and last week’s DB

Summary

Storage devices offer various tradeoffs in terms of latency, throughput and cost

In all cases, data layout and access pattern matter because random ≪ sequential access

Most systems will combine multiple devices


Assignment 1

Explores the effect of data layout for a simple in-memory database:
» Fixed set of supported queries
» Implement a row store, column store, indexed store, and your own custom store!


Now posted on website!
