cs 537 lecture 12 disks and storage

11
1 11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 1 CS 537 Lecture 12 Disks and Storage Michael Swift 11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 2 Secondary storage Secondary storage typically: is anything that is outside of “primary memory” does not permit direct execution of instructions or data retrieval via machine load/store instructions Characteristics: it’s large: 80GB-1TB it’s cheap: 0.30-300¢/GB it’s persistent: data survives power loss it’s slow: 100us-10 ms to access (compared to 100ns for ram) why is this slow?? 11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 3 Another trip down memory lane … IBM 2314 About the size of 6 refrigerators 8 x 29MB (M!) 11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 4 Disk trends Disk capacity, 1975-1989 doubled every 3+ years 25% improvement each year factor of 10 every decade exponential, but far less rapid than processor performance Disk capacity since 1990 doubling every 12 months 100% improvement each year factor of 1000 every decade 10x as fast as processor performance!

Upload: others

Post on 15-May-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS 537 Lecture 12 Disks and Storage

1

11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 1

CS 537 Lecture 12

Disks and Storage Michael Swift

11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 2

Secondary storage

•  Secondary storage typically: –  is anything that is outside of “primary memory” –  does not permit direct execution of instructions or data

retrieval via machine load/store instructions

•  Characteristics: –  it’s large: 80GB-1TB –  it’s cheap: 0.30-300¢/GB –  it’s persistent: data survives power loss –  it’s slow: 100us-10 ms to access (compared to 100ns for

ram) •  why is this slow??

11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 3

Another trip down memory lane …

IBM 2314 About the size of

6 refrigerators 8 x 29MB (M!)

11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 4

Disk trends

•  Disk capacity, 1975-1989 –  doubled every 3+ years –  25% improvement each year –  factor of 10 every decade –  exponential, but far less rapid than processor performance

•  Disk capacity since 1990 –  doubling every 12 months –  100% improvement each year –  factor of 1000 every decade –  10x as fast as processor performance!

Page 2: CS 537 Lecture 12 Disks and Storage

2

Rotating Disks vs. Solid-State Disks (AKA Flash)

•  Many uses of disks are moving to solid-state devices, usually Flash –  10x more expensive in volume –  100x faster –  10x less power

•  Usually pretends to be a disk to the OS and file system 11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 6

Disks and the OS

•  Disks are messy, messy devices –  errors, bad blocks, missed seeks, performance cliffs, etc.

•  Job of OS is to hide this mess from higher-level software –  low-level device drivers (initiate a disk read, etc.) –  higher-level abstractions (files, databases, etc.)

•  OS may provide different levels of disk access to different clients –  physical disk block (surface, cylinder, sector) –  disk logical block (disk block #) –  file logical (filename, block or record or byte #)

11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 7

I/O System

user process�file system�I/O system�

device driver�

user process� user process�

OS�

device controller�

disk�

Types of IO devices l  Two categories

¡ A block device stores information in fixed-size blocks, each one with its own address l e.g., disks

¡ A character device delivers or accepts a stream of characters, and individual characters are not addressable l e.g., keyboards, printers, network cards

l  Device driver provides interface for these two types of devices ¡ Other OS components see block devices and character

devices, but not the details of the devices. ¡ How to effectively utilize the device is the responsibility of the

device driver

Page 3: CS 537 Lecture 12 Disks and Storage

3

Disk Characteristics

•  Disk platters: coated with magnetic materials for recording

Disk platters

Disk Characteristics

•  Disk arm: moves a comb of disk heads –  Only one disk head is active for reading/writing

Disk platters

Disk arm

Disk heads

Hard Disk Trivia…

•  Aerodynamically designed to fly –  As close to the surface as possible –  No room for air molecules –  Like flying a 747 at Mach 41 inch off the ground (in 1982!)

•  Therefore, hard drives are filled with special inert gas •  If head touches the surface

–  Head crash –  Scrapes off magnetic information

Disk Characteristics

•  Each disk platter is divided into concentric tracks

Disk platters

Disk arm

Disk heads

Track

Page 4: CS 537 Lecture 12 Disks and Storage

4

Disk Characteristics

•  A track is further divided into sectors. A sector is the smallest unit of disk storage

Disk platters

Disk arm

Disk heads

Track

Sector

Disk Characteristics

•  A cylinder consists of all tracks with a given disk arm position

Disk platters

Disk arm

Disk heads

Track

Sector

Cylinder

Disk Characteristics

•  Cylinders are further divided into zones

Disk platters

Disk arm

Disk heads

Track

Sector

Cylinder

Zones

Disk Characteristics

•  Zone-bit recording: zones near the edge of a disk store more information (higher bandwidth)

Disk platters

Disk arm

Disk heads

Track

Sector

Cylinder

Zones

Page 5: CS 537 Lecture 12 Disks and Storage

5

More About Hard Drives Than You Ever Want to Know

•  Track skew: starting position of each track is slightly skewed (not all in a straight line) –  Minimize rotational delay when sequentially

transferring bytes across tracks

•  Thermo-calibrations: periodically performed to account for changes of disk radius due to temperature changes

•  Typically 100 to 1,000 bits are inserted between sectors to account for minor inaccuracies

11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 18

Example disk characteristics (1998)

•  IBM Ultrastar 36XP drive –  form factor: 3.5” –  capacity: 36.4 GB –  rotation rate: 7,200 RPM (120 RPS) –  platters: 10 –  surfaces: 20 –  sector size: 512-732 bytes –  cylinders: 11,494 –  cache: 4MB –  transfer rate: 17.9 MB/s (inner) – 28.9 MB/s (outer) –  full seek: 14.5 ms –  head switch: 0.3 ms

Disk Access Time

•  Seek time: the time to position disk heads (~8 msec on average)

•  Rotational latency: the time to rotate the target sector to underneath the head –  Assume 7,200 rotations per minute (RPM) –  7,200 / 60 = 120 rotations per second –  1/120 = ~8 msec per rotation –  Average rotational delay is ~4 msec

Disk Access Time

•  Transfer time: the time to transfer bytes –  Assumptions:

•  58 Mbytes/sec •  4-Kbyte disk blocks

–  Time to transfer a block takes 0.07 msec

•  Disk access time –  Seek time + rotational delay + transfer time

Page 6: CS 537 Lecture 12 Disks and Storage

6

Disk Performance Metrics

•  Latency –  Seek time + rotational delay

•  Bandwidth –  Bytes transferred / disk access time

Examples of Disk Access Times

•  If disk blocks are randomly accessed –  Average disk access time = ~12 msec –  Assume 4-Kbyte blocks –  4 Kbyte / 12 msec = ~340 Kbyte/sec

•  If disk blocks of the same cylinder are randomly accessed without disk seeks –  Average disk access time = ~4 msec –  4 Kbyte / 4 msec = ~ 1 Mbyte/sec

Examples of Disk Access Times

•  If disk blocks are accessed sequentially –  Without seeks and rotational delays –  Bandwidth: 58 Mbytes/sec

•  Key to good disk performance –  Minimize seek time and rotational latency

Disk Tradeoffs

•  Disk adds 1000 bits overhead per sector •  Larger sector size à better bandwidth •  Wasteful if only 1 byte out of 1 Mbyte is needed

Sector size Space utilization Transfer rate 1 byte 8 bits/1008 bits (0.8%) 80 bytes/sec (1 byte / 12 msec) 4 Kbytes 4096 bytes/4221 bytes (97%) 340 Kbytes/sec (4 Kbytes / 12 msec) 1 Mbyte (~100%) 58 Mbytes/sec (peak bandwidth)

Page 7: CS 537 Lecture 12 Disks and Storage

7

11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 25

Disk Calculations •  Example disk:

–  #surfaces: 4!–  #tracks/surface: 64K!–  #sectors/track: 1K (assumption??)!–  #bytes/sector: 512!–  RPM: 7200 = 120 tracks/sec!–  Seek cost: 1.3ms - 16ms!

•  Questions –  How many disk heads? How many cylinders? –  How many sectors/cylinder? Capacity? –  What is the maximum transfer rate (bandwidth)? –  Average positioning time for random request? –  Time and bandwidth for random request of size:

•  4KB? •  128 KB? •  1 MB?

11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 26

Interacting with disks •  In the old days…

–  OS would have to specify cylinder #, sector #, surface #, transfer size

•  i.e., OS needs to know all of the disk parameters

•  Modern disks are even more complicated –  not all sectors are the same size, sectors are remapped, … –  disk provides a higher-level interface, e.g., SCSI

•  exports data as a logical array of blocks [0 … N] •  maps logical blocks to cylinder/surface/sector •  OS only needs to name logical block #, disk maps this to

cylinder/surface/sector •  on-board cache •  as a result, physical parameters are hidden from OS

–  both good and bad

11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 27

Disk Controller •  Responsible for interface between OS and disk drive

–  Common interfaces: ATA/IDE vs. SCSI •  ATA/IDE used for personal storage: slow rotation, seek, high capacity •  SCSI for enterprise-class storage: faster rotation and seek •  QUESTION: which will be larger diameter? Which will have more

platters? •  Basic operations

–  Read block –  Write block

•  OS does not know of internal complexity of disk –  Disk exports array of Logical Block Numbers (LBNs) –  Disks map internal sectors to LBNs

•  Implicit contract: –  Large sequential accesses to contiguous LBNs achieve much

better performance than small transfers or random accesses

11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 28

Disk Abstraction •  How should disk map internal sectors to LBNs? •  Goal: Sequential accesses (or contiguous LBNs) should

achieve best performance •  Approaches:

–  Traditional ordering

–  Serpentine ordering

Page 8: CS 537 Lecture 12 Disks and Storage

8

11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 29

Reliability •  Disks fail more often....

–  When continuously powered-on –  With heavy workloads –  Under high temperatures

•  How do disks fail? –  Whole disk can stop working (e.g., motor dies) –  Transient problem (cable disconnected) –  Individual sectors can fail (e.g., head crash or scratch)

•  Data can be corrupted or block not readable/writable •  Disks can internally fix some sector problems

–  ECC (error correction code): Detect/correct bit flips –  Retry sector reads and writes: Try 20-30 different offset and timing

combinations for heads –  Remap sectors: Do not use bad sectors in future

•  How does this impact performance contract??

11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 30

Buffering •  Disks contain internal memory (2MB-16MB) used as cache •  Read-ahead: “Track buffer”

–  Read contents of entire track into memory during rotational delay •  Write caching with volatile memory

–  Immediate reporting: Claim written to disk when not –  Data could be lost on power failure

•  Use only for user data, not file system meta-data

•  Command queueing –  Have multiple outstanding requests to the disk –  Disk can reorder (schedule) requests for better performance

Disk Arm Scheduling Policies •  Goal: Minimize positioning time

–  Performed by both OS and disk itself; Why?

•  First come, first serve (FCFS): requests are served in the order of arrival + Fair among requesters - Poor for accesses to random disk blocks

•  Shortest seek time first (SSTF): picks the request that is closest to the current disk arm position + Good at reducing seeks - May result in starvation

First Come, First Serve

•  Request queue: 3, 6, 1, 0, 7 •  Head start position: 2 •  Total seek distance: 1 + 3 + 5 + 1 + 7 = 17

Tracks

0

5 4 3 2 1

6 7

Time

Page 9: CS 537 Lecture 12 Disks and Storage

9

Shortest Seek Distance First

•  Request queue: 3, 6, 1, 0, 7 •  Head start position: 2 •  Total seek distance: 1 + 2 + 1 + 6 + 1 = 10

Tracks

0

5 4 3 2 1

6 7

Time

Disk Arm Scheduling Policies

•  SCAN: takes the closest request in the direction of travel (an example of elevator algorithm) + no starvation - a new request can wait for almost two full scans of the disk

SCAN

•  Request queue: 3, 6, 1, 0, 7 •  Head start position: 2 •  Total seek distance: 1 + 1 + 3 + 3 + 1 = 9

Time

Tracks

0

5 4 3 2 1

6 7

Disk Arm Scheduling Policies

•  Circular SCAN (C-SCAN): disk arm always serves requests by scanning in one direction. –  Once the arm finishes scanning for one direction –  Returns to the 0th track for the next round of scanning

Page 10: CS 537 Lecture 12 Disks and Storage

10

C-SCAN

•  Request queue: 3, 6, 1, 0, 7 •  Head start position: 2 •  Total seek distance: 1 + 3 + 1 + 7 + 1 = 13

Time

Tracks

0

5 4 3 2 1

6 7

Look and C-Look

•  Similar to SCAN and C-SCAN –  the arm goes only as far as the final request in each

direction, then turns around

•  Look for a request before continuing to move in a given direction.

Flash disks: solid state storage

•  A flash block is a grid of cells –  Single Level Cell (SLC) = 1 bit per cell (faster, more reliable) –  Multi Level Cell (MLC) = 2 bits per cell (slower, less reliable)

"""

1 1 0 1 0 0 1 1 1 1 1 1

Background

Page 11: CS 537 Lecture 12 Disks and Storage

11

Write-in-Place vs. Logging •  Rotating disks

–  Constant map from LBA to on-disk location

•  SSDs –  Writes always to new locations –  Superseded blocks cleaned later

Striping

•  LBAs striped across flash packages –  Single request can span multiple chips –  Natural load balancing

Controller

0 8 16 24 32 40

1 9 17 25 33 41

2 10 18 26 34 42

3 11 19 27 35 43

4 12 20 28 36 44

5 13 21 29 37 45

6 14 22 30 38 46

7 15 23 31 39 47

SSD performance

11/1/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 43

Device Random Read µs

Seq Read µs

Random Write µs

Seq. Write µs

$/GB

DRAM 0.05 0.05 0.05 0.05 $15 Flash 100 85 2,000 200-500 $3 Disk 5,000 500 5,000 500 $0.3