NEU csg389 Information Storage Technologies, Fall 2006 Disk Drive Operations Storage Access Methods Lecture 2 September 19, 2006


Page 1: Disk Drive Operations Storage Access Methods

NEU csg389 Information Storage Technologies, Fall 2006

Disk Drive Operations Storage Access Methods

Lecture 2, September 19, 2006

Page 2: Disk Drive Operations Storage Access Methods

2NEU csg389 Fall 2006, Jiri Schindler

Plan for today

• Questions from last time
• On-disk data organization
  • what they’re all about

• Storage access methods
  • Block-based access
  • File-based access
  • Object-based access

Page 3: Disk Drive Operations Storage Access Methods

NEU csg389 Information Storage Technologies, Fall 2006

Disk Drive Firmware Algorithms

Page 4: Disk Drive Operations Storage Access Methods

4NEU csg389 Fall 2006, Jiri Schindler

Outline

• Mapping LBNs to physical sectors
  • zones
  • defect management
  • track and cylinder skew

• Bus and buffer management
  • optimizing storage subsystem resources

• Advanced buffer space usage
  • prefetching and caching
  • read/write-on-arrival

Page 5: Disk Drive Operations Storage Access Methods

5NEU csg389 Fall 2006, Jiri Schindler

How functionality is implemented

• Some of it is in ASIC logic
  • error detection and correction
  • signal/servo processing
  • motor/seek control
  • cache hits (often)

• Some of it is in firmware running on control processor
  • request processing
  • request queueing and scheduling
  • LBN-to-PBN mapping

• Key considerations: cost and performance and cost
  • optimize common cases
  • keep things simple and space-conscious

Page 6: Disk Drive Operations Storage Access Methods

6NEU csg389 Fall 2006, Jiri Schindler

Recall the storage device interface

• Linear address space of equal-sized blocks
  • each identified by logical block number (LBN)

[figure: linear array of blocks numbered by LBN (…, 5, 6, 7, …, 12, …, 23, …)]

• Common block size: 512 bytes
• Number of blocks: device capacity / block size

Page 7: Disk Drive Operations Storage Access Methods

7NEU csg389 Fall 2006, Jiri Schindler

Recall the physical disk storage reality

[figure: disk geometry showing a cylinder, a track, and a sector]

Page 8: Disk Drive Operations Storage Access Methods

8NEU csg389 Fall 2006, Jiri Schindler

Physical sectors of a single-surface disk

Page 9: Disk Drive Operations Storage Access Methods

9NEU csg389 Fall 2006, Jiri Schindler

LBN-to-physical for a single-surface disk

[figure: LBNs 0–46 arranged sector by sector around the concentric tracks of a single-surface disk]
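
With a fixed number of sectors per track, the LBN-to-physical computation is just integer division and remainder. A minimal sketch (the sectors-per-track value is assumed for illustration, not taken from the figure):

```python
SECTORS_PER_TRACK = 12  # assumed value for illustration

def lbn_to_physical(lbn):
    """Map an LBN to (track, sector) on a single-surface disk."""
    track = lbn // SECTORS_PER_TRACK
    sector = lbn % SECTORS_PER_TRACK
    return track, sector

# e.g., LBN 23 lands on track 1, sector 11
print(lbn_to_physical(23))
```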

Page 10: Disk Drive Operations Storage Access Methods

10NEU csg389 Fall 2006, Jiri Schindler

Extending mapping to a multi-surface disk

Page 11: Disk Drive Operations Storage Access Methods

11NEU csg389 Fall 2006, Jiri Schindler

Some real numbers for modern disks

• # of platters: 1-4
  • 2-8 surfaces for data

• # of tracks per surface: 10s of 1000s
  • same thing as # of cylinders

• # sectors per track: 500-900
  • so, 250-450KB

• # of bytes per sector: usually 512
  • can be chosen by OS for some disks
  • disk manufacturers want to make it bigger

Page 12: Disk Drive Operations Storage Access Methods

12NEU csg389 Fall 2006, Jiri Schindler

Clarification of Cylinder Numbering

Page 13: Disk Drive Operations Storage Access Methods

13NEU csg389 Fall 2006, Jiri Schindler

First Complication: Zones

• Outer tracks are longer than inner ones
  • so, they can hold more data
  • benefits: increased capacity and higher bandwidth

• Issues
  • increased bookkeeping for LBN-to-physical mapping
  • more complex signal processing logic
    – because of variable bit rate timing

• Compromise: zones
  • all tracks in each zone hold same number of sectors

Page 14: Disk Drive Operations Storage Access Methods

14NEU csg389 Fall 2006, Jiri Schindler

Constant number of sectors per track

Page 15: Disk Drive Operations Storage Access Methods

15NEU csg389 Fall 2006, Jiri Schindler

Multiple “zones”

Page 16: Disk Drive Operations Storage Access Methods

16NEU csg389 Fall 2006, Jiri Schindler

A real zone breakdown

• IBM Ultrastar 18ES (1998)

Zone   Start cylinder   End cylinder   SPT
  0          0               377       390
  1        378              1263       374
  2       1264              2247       364
  3       2248              3466       351
  4       3466              4504       338
  5       4505              5526       325
  6       5527              7044       312
  7       7045              8761       286
  8       8762              9815       273
  9       9816             10682       260
 10      10683             11473       247
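
As a hedged sketch of how a zone table like this enters the mapping, the loop below walks the zones to turn an LBN into a (cylinder, sector) pair. It deliberately ignores multiple surfaces, spares, and skew, and only the first three zones are listed:

```python
ZONES = [  # (first cylinder, last cylinder, sectors per track) -- first three zones only
    (0, 377, 390),
    (378, 1263, 374),
    (1264, 2247, 364),
]

def lbn_to_cyl_sector(lbn):
    base = 0                                  # first LBN of the current zone
    for first, last, spt in ZONES:
        zone_blocks = (last - first + 1) * spt
        if lbn < base + zone_blocks:
            offset = lbn - base
            return first + offset // spt, offset % spt
        base += zone_blocks
    raise ValueError("LBN beyond the zones listed above")

print(lbn_to_cyl_sector(200_000))             # falls in zone 1
```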

Page 17: Disk Drive Operations Storage Access Methods

17NEU csg389 Fall 2006, Jiri Schindler

Second Complication: Defects

• Portions of the media can become unusable
  • both before installation and during use
    – former is MUCH more common than latter

• Need to set aside physical space as spares
  • simplicity dictates having no holes in LBN space
  • many different organizations of spare space
    – e.g., sectors per track, cylinder, group of cylinders, zone

• Two schemes for using spare space to handle defects
  • remapping
    – leave everything else alone and just remap the disturbed LBNs
  • slipping
    – change mapping to skip over defective regions

Page 18: Disk Drive Operations Storage Access Methods

18NEU csg389 Fall 2006, Jiri Schindler

One spare sector per track

[figure: the same single-surface layout with the last physical sector of each track reserved as a spare; LBNs 0–43 are mapped around the spare sectors]

Page 19: Disk Drive Operations Storage Access Methods

19NEU csg389 Fall 2006, Jiri Schindler

Remapping from defective sector to spare

[figure: a defective sector’s LBN is remapped to the spare sector on its track; all other LBNs keep their original locations]

Page 20: Disk Drive Operations Storage Access Methods

20NEU csg389 Fall 2006, Jiri Schindler

LBN mapping slipped past defective sector

[figure: the LBNs at and after the defective sector are slipped forward by one, so the mapping skips the bad sector and stays sequential]

Page 21: Disk Drive Operations Storage Access Methods

21NEU csg389 Fall 2006, Jiri Schindler

Some Real Defect Management Schemes

• High level facts
  • percentage of space: < 1%
  • always slip if possible
    – much more efficient for streaming data

• One real scheme: Seagate Cheetah 4LP
  • 108 spare sectors every 12 cylinders
    – located on the last track of the 12-cylinder group
    – used only for remapped sectors grown during usage
  • many spare sectors on innermost cylinders
    – used to provide backstop for all slipped sectors

Page 22: Disk Drive Operations Storage Access Methods

22NEU csg389 Fall 2006, Jiri Schindler

Computing physical location from LBN

• First, check list of remapped LBNs
  • usually identifies exact physical location of replacement

• If no match, do the steps from before
  • but, also account for slipped sectors that affect desired LBN

• About 10 different management schemes
• For any given scheme, the computations can be fairly straightforward.
  However, it is quite complex to discuss them all at once concretely
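
A toy sketch of those two steps, assuming a small remap table and a sorted list of slipped physical sector indexes (both structures and values are made up for illustration):

```python
import bisect

REMAPPED = {4711: 1_000_000}       # LBN -> physical index of its spare sector
SLIPPED = [100, 2048, 70_000]      # defective physical sectors skipped by slipping

def lbn_to_physical_sector(lbn):
    # step 1: a remapped LBN goes straight to its spare location
    if lbn in REMAPPED:
        return REMAPPED[lbn]
    # step 2: otherwise, slide forward past every slipped sector at or
    #         before the candidate physical location
    phys = lbn                      # defect-free mapping (identity in this sketch)
    while True:
        candidate = lbn + bisect.bisect_right(SLIPPED, phys)
        if candidate == phys:
            return phys
        phys = candidate

print(lbn_to_physical_sector(2047))   # slips past sectors 100 and 2048 -> 2049
```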

Page 23: Disk Drive Operations Storage Access Methods

23NEU csg389 Fall 2006, Jiri Schindler

When defects “grow” during operation

• First, try ECC
  • it can recover from many problems

• Next, try to read the sector again
  • often, failure to read the sector is transient
  • cost is a full rotation added to access time

• Last resort, report failure and remap sector
  • this means that the stored data has been lost
  • until next write to this LBN, reads get error response
    – new data allows the location change to take effect

Page 24: Disk Drive Operations Storage Access Methods

24NEU csg389 Fall 2006, Jiri Schindler

Error Recovery Algorithm for READs

[flowchart: READ SECTOR → ECC check; on ECC error, RETRY SECTOR READ until the retry limit is reached (typically tries 3 times), then INCREASE ERROR RECOVERY LEVEL and retry; if the ECC-symbol or retry threshold is reached, REPLACE BLOCK or signal the host to perform replacement; if all error recovery levels are exhausted, SIGNAL DATA ERROR TO HOST; on success, DELIVER DATA TO HOST]
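
A hedged sketch of the same flow in code; the retry limit comes from the slide, while the number of recovery levels and the helper routines are placeholders, not a real drive interface:

```python
RETRY_LIMIT = 3             # "typically tries 3 times"
RECOVERY_LEVELS = 4         # assumed number of increasingly aggressive settings

def read_raw_sector(lbn, level):   # placeholder: read media at a recovery level
    raise NotImplementedError

def ecc_ok(data):                  # placeholder: ECC check/correction
    raise NotImplementedError

def replace_block(lbn):            # placeholder: remap or tell host to replace
    raise NotImplementedError

def read_with_recovery(lbn):
    for level in range(RECOVERY_LEVELS):
        for attempt in range(RETRY_LIMIT):
            data = read_raw_sector(lbn, level)
            if ecc_ok(data):
                if level or attempt:        # only recovered after retries
                    replace_block(lbn)
                return data                 # deliver data to host
        # retry limit reached: increase the error recovery level and try again
    raise IOError(f"unrecoverable data error on LBN {lbn}")  # signal host
```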

Page 25: Disk Drive Operations Storage Access Methods

25NEU csg389 Fall 2006, Jiri Schindler

Third Complication: Skew

• Switching from one track to another takes time
  • sequential transfers would suffer full rotation

• Solution: skew
  • offset physical location of first sector to avoid extra rotation
    – selection of skew value made from switch time statistics

• Track skew
  • for when switching to next surface within a cylinder

• Cylinder skew
  • for when switching from last surface of one cylinder to first surface of next cylinder

Page 26: Disk Drive Operations Storage Access Methods

26NEU csg389 Fall 2006, Jiri Schindler

What happens to requests that span tracks?

[figure: three snapshots of a two-track layout without skew – “Request spans 2 tracks”, “After reading first part”, “After track switch”]

Page 27: Disk Drive Operations Storage Access Methods

27NEU csg389 Fall 2006, Jiri Schindler

What happens to requests that span tracks?

[figure: the same three snapshots – “Request spans 2 tracks”, “After reading first part”, “After track switch”]

Sector 12 rotates past during track switch, so full rotation needed

Page 28: Disk Drive Operations Storage Access Methods

28NEU csg389 Fall 2006, Jiri Schindler

Same request with track skew of one sector

[figure: the same request with the second track’s numbering offset by one sector – “Request spans 2 tracks”, “After reading first part”, “After track switch”]

Track skew prevents unnecessary rotation

Page 29: Disk Drive Operations Storage Access Methods

29NEU csg389 Fall 2006, Jiri Schindler

Examples of Track and Cylinder Skews

Zone   IBM Ultrastar 18ES                  Quantum Atlas 10k
       SPT   Track skew   Cylinder skew    SPT   Track skew   Cylinder skew
  1    390       58           102          334       64           101
  2    374       56            97          324       62            98
  3    364       55            95          306       56            93

Page 30: Disk Drive Operations Storage Access Methods

30NEU csg389 Fall 2006, Jiri Schindler

Computing Physical Location from LBN

Figure out cylno, surfaceno, and sectno
  • using algorithms indicated previously

Compute total skew for first mapped physical sector on this track
  • totalskew = (cylno * cylskew) + ((surfaceno + cylno * (surfaces - 1)) * trackskew)

Compute rotational offset on given track
  • offset = (totalskew + sectno) % sectspertrack
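
The three steps translate directly into code; this sketch assumes a single zone (constant sectors per track) and no defects:

```python
def physical_location(lbn, spt, surfaces, trackskew, cylskew):
    # step 1: cylinder, surface, and sector index ignoring skew
    sectno    = lbn % spt
    trackno   = lbn // spt
    cylno     = trackno // surfaces
    surfaceno = trackno % surfaces
    # step 2: total skew of the first mapped sector on this track
    track_switches = surfaceno + cylno * (surfaces - 1)
    totalskew = cylno * cylskew + track_switches * trackskew
    # step 3: rotational offset of the requested sector on its track
    offset = (totalskew + sectno) % spt
    return cylno, surfaceno, offset

print(physical_location(10_000, spt=390, surfaces=5, trackskew=58, cylskew=102))
```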

Page 31: Disk Drive Operations Storage Access Methods

NEU csg389 Information Storage Technologies, Fall 2006

Basic On-disk Caching

Page 32: Disk Drive Operations Storage Access Methods

32NEU csg389 Fall 2006, Jiri Schindler

On-disk RAM

• RAM on disk drive controllers
  • firmware
  • speed matching buffer
  • prefetching buffer
  • cache

• Canonical disk drive buffers
  • several fixed-size “segments”
  • latest thing: variable-size segments
  • down the road: OS style management

[figure: buffer RAM divided into several cache ($) segments, each holding a run of sectors]

Page 33: Disk Drive Operations Storage Access Methods

33NEU csg389 Fall 2006, Jiri Schindler

Prefetching and Caching

• Prefetching
  • sequential prefetch essentially free until next request arrives
    – and until track boundary
  • Note: physically sequential sectors are prefetched
    – usefulness depends on access patterns
  • Example algorithms (see the sketch after this list)
    – prefetch until buffer is full or next request arrives
    – MIN and MAX values for prefetching
    – if track n-1 and n have been READ, prefetch track n+1

• Caching
  • data in buffer segments retained as cache
  • most of the benefit comes from prefetching
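
A toy sketch of the example policies above (illustrative logic, not firmware): keep prefetching the physically next sectors until the segment fills, the maximum is hit, or a new request arrives once the assumed minimum has been met.

```python
PREFETCH_MIN = 8       # assumed: always grab at least this many extra sectors
PREFETCH_MAX = 64      # assumed: never grab more than this many

def sectors_to_prefetch(last_lbn, segment_free, request_pending):
    """Return the LBNs to prefetch after a sequential read finishes."""
    lbns = []
    limit = min(PREFETCH_MAX, segment_free)
    while len(lbns) < limit:
        if len(lbns) >= PREFETCH_MIN and request_pending():
            break                     # new work arrived; stop once MIN is met
        lbns.append(last_lbn + len(lbns) + 1)
    return lbns

# e.g., prefetch after LBN 499 with plenty of buffer and no queued request
print(sectors_to_prefetch(499, segment_free=32, request_pending=lambda: False))
```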

Page 34: Disk Drive Operations Storage Access Methods

34NEU csg389 Fall 2006, Jiri Schindler

Disk Drive – Complete System?

[figure: disk drive internals – bus interface, buffer/cache, control processor, and media on an internal bus, drawn like a small computer with its own CPU, memory, cache, and I/O devices]

Page 35: Disk Drive Operations Storage Access Methods

35NEU csg389 Fall 2006, Jiri Schindler

Not really, recall this…

[figure: rest of system – device driver(s) issue requests and receive completions over the system bus, through a bus adapter and I/O bus, to an I/O controller in front of independent disks (the storage subsystem)]

Page 36: Disk Drive Operations Storage Access Methods

NEU csg389 Information Storage Technologies, Fall 2006

File Systems

Page 37: Disk Drive Operations Storage Access Methods

37NEU csg389 Fall 2006, Jiri Schindler

Key FS design issues

• Application interface and system software
• Data organization and naming
• On-disk data placement
• Cache management
• Metadata integrity and crash recovery
• Access control

Page 38: Disk Drive Operations Storage Access Methods

38NEU csg389 Fall 2006, Jiri Schindler

Starting at the top: what applications see

• At the highest level (in most systems)
  • contents of a file: sequence of bytes
  • most basic operations: open, close, read, write

• open starts a “session” and returns a “handle”
  • in POSIX (e.g., Linux and various UNIXes)
    – handle is process-specific integer called “file descriptor”
    – session remembers current offset into file
    – for local files, session also postpones full file deletion

• handle is provided with each subsequent operation
• read or write access bytes at some offset in file
  – could be explicitly provided or remembered by session

• close ends session and destroys the handle
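
A small usage sketch of that session model through Python’s os wrappers around the POSIX calls:

```python
import os

fd = os.open("example.txt", os.O_RDWR | os.O_CREAT, 0o644)  # open starts a session
os.write(fd, b"hello storage class\n")     # write advances the remembered offset
os.lseek(fd, 0, os.SEEK_SET)               # reposition the session's offset
data = os.read(fd, 5)                      # read bytes at the current offset
os.close(fd)                               # close ends the session
print(data)                                # b'hello'
```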

Page 39: Disk Drive Operations Storage Access Methods

39NEU csg389 Fall 2006, Jiri Schindler

Sidebar: shared and private sessions

[figure: left – file descriptors fd1 and fd2 each point to their own open file object, each with its own offset into the same file; right – fd1 and fd2 share a single open file object and offset]

• Two opens of the same file yield independent sessions
• A session can be shared across handles or processes

Page 40: Disk Drive Operations Storage Access Methods

40NEU csg389 Fall 2006, Jiri Schindler

Where information resides

[figure: user level above the system-call boundary; file descriptors and the file system inside the kernel]

Page 41: Disk Drive Operations Storage Access Methods

41NEU csg389 Fall 2006, Jiri Schindler

Some associated structures in kernel

[figure: per-process descriptor arrays fds[0]…fds[n] for Process A and Process B point into a system-wide table of file structures; each file structure holds f_flag, f_count, f_offset, and f_inode; two descriptors are “sharing” a file structure (f_count = 2) while having different file descriptor numbers]

Page 42: Disk Drive Operations Storage Access Methods

42NEU csg389 Fall 2006, Jiri Schindler

Moving data between kernel and application

• Common approach: copy it
  • as in read(fd, buffer, size) or write(fd, buffer, size)
  • simplifies things in two ways
    – application knows it can use the memory immediately
      • and that the corresponding data is in that memory
    – kernel has no hidden coordination with application
      • e.g., later changes to buffer do not silently change file

• Sometimes better approach: hand it off
  • as in char *buffer = read(fd, size)
    – notice that buffer containing data is returned
      • this allows page swapping (via VM) rather than copying
  • downsides
    – sometimes not much of a performance improvement
    – makes file caching more difficult
    – can be confusing for application writers

Page 43: Disk Drive Operations Storage Access Methods

43NEU csg389 Fall 2006, Jiri Schindler

The uio structure for scatter/gather I/O

[figure: struct uio with uio_offset, uio_iovcnt = 3, and a pointer to an iovec[] array; each iovec holds a base and length describing one buffer in the process address space, and together the buffers cover a contiguous region of the file (logical view)]
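
Python exposes the same scatter/gather idea as os.readv/os.writev (POSIX only); in this sketch the three buffers play the role of the iovec array in the figure:

```python
import os

fd = os.open("example.dat", os.O_RDWR | os.O_CREAT, 0o644)
os.writev(fd, [b"first ", b"second ", b"third\n"])    # gather write from 3 buffers
os.lseek(fd, 0, os.SEEK_SET)
bufs = [bytearray(6), bytearray(7), bytearray(6)]     # scatter targets
nread = os.readv(fd, bufs)
os.close(fd)
print(nread, bufs)
```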

Page 44: Disk Drive Operations Storage Access Methods

44NEU csg389 Fall 2006, Jiri Schindler

Remember that the FS data lives on disk

[figure: same layering as before – user level, system calls, file descriptors, file system in the kernel – now with the disk below the file system]

Page 45: Disk Drive Operations Storage Access Methods

45NEU csg389 Fall 2006, Jiri Schindler

From file offsets to LBNs

[figure: file offsets map to FS blocks 98–121, which map to runs of LBNs (1658, 1670, …, 2004); FS block size = 8 LBNs (4KB)]

• File offsets
  • 0 to num_blocks_in_file
    – offset to a file given in block number

• File System blocks
  • 0 to num_blocks_in_filesystem
  • Single block may span multiple disk LBNs
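
The arithmetic in the figure, as a sketch that assumes 4KB file-system blocks over 512-byte LBNs and a file system starting at LBN 0 (the block map values are made up):

```python
LBN_SIZE = 512
FS_BLOCK_SIZE = 4096
LBNS_PER_BLOCK = FS_BLOCK_SIZE // LBN_SIZE     # 8 LBNs per FS block

def file_offset_to_lbn(offset, block_map):
    """Map a byte offset within a file to the LBN holding that byte."""
    file_block = offset // FS_BLOCK_SIZE        # which block of the file
    fs_block = block_map[file_block]            # which FS block on the device
    return fs_block * LBNS_PER_BLOCK + (offset % FS_BLOCK_SIZE) // LBN_SIZE

print(file_offset_to_lbn(5000, block_map={0: 98, 1: 120, 2: 121}))   # -> 961
```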

Page 46: Disk Drive Operations Storage Access Methods

46NEU csg389 Fall 2006, Jiri Schindler

Mapping file offsets to disk LBNs

• Issue in question
  • must know which LBNs hold which file’s data

• Trivial mapping: just remember start location
  • then keep entire file in contiguous LBNs
    – what happens when it grows?
  • alternately, include a “next pointer” in each “block”
    – how does one find location of a particular offset?

• Most common approach: block lists
  • an array with one LBN per block in the file
  • Note: file block size can exceed one logical block
    – file system treats groups of logical blocks as a unit

Page 47: Disk Drive Operations Storage Access Methods

47NEU csg389 Fall 2006, Jiri Schindler

A common approach to recording a block list

[figure: an inode’s block list – 12 direct block pointers to data blocks (at LBNs such as 576, 344, 968), an indirect block whose entries point to data blocks 13…N, and a double-indirect block pointing to further indirect blocks for data blocks N+1…Q+1]
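
A hedged sketch of the lookup the figure implies, with 12 direct pointers, one single-indirect and one double-indirect block; the pointer counts and the read_block helper are assumptions for illustration, not any particular file system’s layout:

```python
NDIRECT = 12
PTRS_PER_BLOCK = 1024      # e.g., 4KB blocks holding 4-byte block pointers

def block_for(inode, file_block, read_block):
    """Return the on-disk block number that holds the given file block."""
    if file_block < NDIRECT:
        return inode["direct"][file_block]
    file_block -= NDIRECT
    if file_block < PTRS_PER_BLOCK:
        return read_block(inode["indirect"])[file_block]
    file_block -= PTRS_PER_BLOCK
    outer = read_block(inode["double_indirect"])
    inner = read_block(outer[file_block // PTRS_PER_BLOCK])
    return inner[file_block % PTRS_PER_BLOCK]
```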

Page 48: Disk Drive Operations Storage Access Methods

48NEU csg389 Fall 2006, Jiri Schindler

Inodes

• FS stores other per-file information as well
  • length of file
  • owner
  • access permissions
  • last modification time
  • …

• Usually kept together with the block list

Page 49: Disk Drive Operations Storage Access Methods

49NEU csg389 Fall 2006, Jiri Schindler

Supporting multiple file system types

[figure: system calls and file descriptors sit above a vnode layer; vnode operators route calls to specific file system implementations (LFS, FFS, NFS), which sit above the disk]

Page 50: Disk Drive Operations Storage Access Methods

50NEU csg389 Fall 2006, Jiri Schindler

Vnode layer: inside kernel

• Want to have multiple file systems at once
  • and possibly of differing types

• Solution: virtual file system layer
  • adding level of indirection always seems to help…

• Everything in kernel interacts with FS via a virtualized layer of functions
  • these function calls are routed to the appropriate FS-specific implementations
    – once the correct FS type has been identified

Page 51: Disk Drive Operations Storage Access Methods

51NEU csg389 Fall 2006, Jiri Schindler

Open file object points to a vnode

[figure: the user area of the currently running process holds uf_ofile[], whose entries point to open file objects (f_vnode, offset); each open file object points to a vnode structure (v_count, v_ops, v_vfsp, v_data) for an active file; v_ops is an array of pointers to file-system-specific functions implementing the virtual FS interface, and v_data points to a file-system-dependent structure such as an inode (an in-memory copy, of course)]

Page 52: Disk Drive Operations Storage Access Methods

52NEU csg389 Fall 2006, Jiri Schindler

Can also support non-disk FS

[figure: beneath the vnode layer, a 4.2BSD file system and a PC file system sit above disk and floppy, while an NFS client reaches an NFS server over the network]

Page 53: Disk Drive Operations Storage Access Methods

53NEU csg389 Fall 2006, Jiri Schindler

Key FS design issues

• Application interface and system software
• Data organization and naming
• On-disk data placement
• Cache management
• Metadata integrity and crash recovery
• Access control

Page 54: Disk Drive Operations Storage Access Methods

54NEU csg389 Fall 2006, Jiri Schindler

What makes this so important?

• One of the biggest problems, looking ahead
  • with TBs of data, how does one organize things
  • how to ensure we can find what we want later?

• Not nearly as easy as it seems
  • try to find some old piece of paper sometime
    – e.g., your exam #2 from Calculus 3
  • think ahead to when you’re a lot busier…

Page 55: Disk Drive Operations Storage Access Methods

55NEU csg389 Fall 2006, Jiri Schindler

Common approach: directory hierarchy

• Hierarchies are good to deal with complexity
  • … and data organization is a complex problem

• It works well for moderate-sized data sets
  • easy to identify coarse breakdowns
  • when it gets too big, split it and refine namespace

• Traversing the directory hierarchy
  • the ‘.’ and ‘..’ entries

[figure: a small file system tree – the root directory / containing directories dira, dirb, and dirc, with a file “wow” below them]

Page 56: Disk Drive Operations Storage Access Methods

56NEU csg389 Fall 2006, Jiri Schindler

What’s in a directory

• Directories translate file names to inode IDs
  • just special file with entries formatted in some way
  • often sets of entries put in sector-sized chunks

Directory entry format:
  Inode number (4 bytes) | Record type (2 bytes) | Length of name (2 bytes) | File Name, max. 255 characters, NULL-terminated (variable length)

A directory block with three entries:
  # FILE 5 foo.c   # DIR 3 bar   # DIR 6 mumble
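
A parsing sketch for that entry layout (byte order, type codes, and the in-memory representation are assumptions for illustration):

```python
import struct

def parse_directory_block(block):
    """Return a list of (inode number, record type, name) entries."""
    entries, off = [], 0
    while off + 8 <= len(block):
        ino, rtype, namelen = struct.unpack_from("<IHH", block, off)
        if ino == 0:
            break                                    # unused space at the end
        name = block[off + 8 : off + 8 + namelen].decode()
        entries.append((ino, rtype, name))
        off += 8 + namelen + 1                       # +1 for the NUL terminator
    return entries
```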

Page 57: Disk Drive Operations Storage Access Methods

57NEU csg389 Fall 2006, Jiri Schindler

Managing namespace: mount/unmount

• One can have many FSs on many devices
  • … but only one namespace

• So, one must combine the FSs into one namespace
  • starts with a “root file system”
    – the one that has to be there when the system boots
  • “mount” operation attaches one FS into the namespace
    – at a specific point in the overall namespace
  • “unmount” detaches a previously-attached file system

Page 58: Disk Drive Operations Storage Access Methods

58NEU csg389 Fall 2006, Jiri Schindler

Mounting an FS

[figure: before mounting, the root FS namespace contains /tomd/junk, and a separate FS contains directories dira, dirb, dirc and the file “wow”; after “# mount FS /tomd”, that FS’s directories appear under /tomd in the combined namespace]

Page 59: Disk Drive Operations Storage Access Methods

59NEU csg389 Fall 2006, Jiri Schindler

How to find the root directory?

• Need enough information to find key structures
  • allocation structures
  • inode for root directory
  • any other defining information

• Common approach
  • use predetermined locations within file system
    – known locations of (copies of) superblocks

• Alternate approach
  • some external record

Page 60: Disk Drive Operations Storage Access Methods

60NEU csg389 Fall 2006, Jiri Schindler

Sidebar: multiple FSs on one disk

• How is this possible?
  • divide capacity into multiple “partitions”

• How are the partitions remembered?
  • commonly, via a “partition map” at the 2nd LBN
  • each partition map entry specifies
    – start LBN for partition
    – length of partition (in logical blocks)

• Usually device drivers handle partition map
  • file system requests are relative to their partition
  • device driver shifts these requests relative to partition start
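
A sketch of the shift the device driver performs (the partition map values are made up):

```python
PARTITION_MAP = [            # (start LBN, length in LBNs) per partition
    (64, 2_000_000),
    (2_000_064, 6_000_000),
]

def to_device_lbn(partition, relative_lbn, count=1):
    """Translate a partition-relative request into a whole-device LBN."""
    start, length = PARTITION_MAP[partition]
    if relative_lbn + count > length:
        raise ValueError("request runs past the end of the partition")
    return start + relative_lbn

print(to_device_lbn(1, 100))    # -> 2000164
```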

Page 61: Disk Drive Operations Storage Access Methods

61NEU csg389 Fall 2006, Jiri Schindler

Difficulty with directory hierarchies

• Can be very difficult to scale to large sizes
  • eventually, the refinements become too fine
  • and they tend to be less distinct

• Problem: what happens when number of entries in directory grows too large?
  • think about having to read through all of those entries
  • possible solution: partition into subdirectories again

• Problem: what happens when data objects could fit into any of several subdirectories?
  • think about having to find something specific
  • possible solution: multiple names for such files

Page 62: Disk Drive Operations Storage Access Methods

NEU csg389 Information Storage Technologies, Fall 2006

On-disk Data Placement

Page 63: Disk Drive Operations Storage Access Methods

63NEU csg389 Fall 2006, Jiri Schindler

Key FS design issues

• Application interface and system software
• Data organization and naming
• On-disk data placement
• Cache management
• Metadata integrity and crash recovery
• Access control

Page 64: Disk Drive Operations Storage Access Methods

64NEU csg389 Fall 2006, Jiri Schindler

Fact – seek time depends on distance

Goal – requests in sequence physically near one another

Quantum Atlas III (1998):       seek 7.8 ms   single track 0.80 ms   full stroke seek 15 ms
Seagate Cheetah 15K.3 (2002):   seek 3.6 ms   single track 0.20 ms   full stroke seek 6.5 ms

Page 65: Disk Drive Operations Storage Access Methods

65NEU csg389 Fall 2006, Jiri Schindler

Fact – positioning time dominates transfer

[figure: service time (milliseconds, 0–40) vs. request size (1–256 KB) for average-seek and one-cylinder accesses]

Goal – fewer, larger requests to amortize positioning costs

Seagate Cheetah 15K.3 (time to read the entire disk, sequentially or randomly with varying request size)
  model: ST373453
  capacity: 73.4 GB
  read disk (seq): < 20 min
  read disk (2KB): > 30 hours
  read disk (128KB): < 1 hour

Page 66: Disk Drive Operations Storage Access Methods

66NEU csg389 Fall 2006, Jiri Schindler

Breakdown of disk head time

[figure: breakdown of disk head usage (0–100%) among seek, rotational latency, and data transfer as request size grows from 1 KB to 4096 KB]

Page 67: Disk Drive Operations Storage Access Methods

67NEU csg389 Fall 2006, Jiri Schindler

File System Allocation

• Two issues
  • Keep track of which space is available
  • Pick unused blocks for new data

• Simplest solution – free list
  • maintain a linked list of free blocks
    – using space in unused blocks to store the pointers
  • grab block from this list when new block is needed
    – usually, the list is used as a stack

• While simple, this approach rarely yields good performance

Page 68: Disk Drive Operations Storage Access Methods

68NEU csg389 Fall 2006, Jiri Schindler

File System Allocation (cont.)

• Most common approach – a bitmap
  • large array of bits, with one bit per allocation unit
    – one value says “free” and the other says “in use”
  • Scan the array when a new block is needed
    – we don’t have to just take first “free” block in the array
    – we can look in particular regions
    – we can look for particular patterns

• Even better way (in some cases) – list of free extents
  • maintain a sorted list of “free” extents of space
    – each extent holds a contiguous range of free space
  • pull space from a part of a specific free extent
    – can start at a specific point
    – can look for a point with significant room for growth
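
A toy sketch of bitmap allocation: one bit per block, scanning forward from a goal block so new data lands near related data (the allocation policy is greatly simplified):

```python
class Bitmap:
    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)   # 0 = free, 1 = in use

    def is_free(self, b):
        return not (self.bits[b // 8] >> (b % 8)) & 1

    def allocate_near(self, goal):
        """Allocate the first free block at or after goal, wrapping around."""
        order = list(range(goal, self.nblocks)) + list(range(goal))
        for b in order:
            if self.is_free(b):
                self.bits[b // 8] |= 1 << (b % 8)   # mark it in use
                return b
        raise RuntimeError("file system full")

bm = Bitmap(1024)
print(bm.allocate_near(100), bm.allocate_near(100))   # -> 100 101
```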

Page 69: Disk Drive Operations Storage Access Methods

69NEU csg389 Fall 2006, Jiri Schindler

File System Allocation – Summary

• FS performance (largely) dictated by disk performance
  • and optimization starts with allocation algorithms
  • as always, there are exceptions to this rule

• Two technology drivers yield two goals
  • Closeness (locality)
    – reduce seeks by putting related things close to each other
    – generally, benefits can be in the 2x range
  • Amortization (large transfers)
    – amortize each positioning delay by accessing lots of data
    – generally, benefits can reach into the 10x range

Page 70: Disk Drive Operations Storage Access Methods

70NEU csg389 Fall 2006, Jiri Schindler

Spatial proximity can yield…

[figure: a directory data block (#20) whose entries <A, 3> and <B, 5> point to nearby inodes #3 and #5, which in turn point to nearby data blocks #42 and #44 – keeping directory entries, inodes, and data close together yields fast directory access and fast file access]

Page 71: Disk Drive Operations Storage Access Methods

71NEU csg389 Fall 2006, Jiri Schindler

Fast File System (1984)

• Source of many still-popular optimizations
• For locality – cylinder groups
  • called allocation groups in many modern file systems
  • allocate inode in cylgroup with directory
  • allocate first data blocks in cylgroup with inode

[figure: default usage of LBN space divided into allocation groups; each group holds a superblock copy, bitmap, inodes, and data blocks – organization of an allocation group]

Page 72: Disk Drive Operations Storage Access Methods

72NEU csg389 Fall 2006, Jiri Schindler

Other ways of enhancing locality

• Disk request scheduling
  • for example, consider all dirty blocks in file cache

• Write anywhere
  • specifically, writing near the disk head [Wang99]
    – assumes space is free and the head’s location is known
  • cool idea that nobody currently uses

• Same thing for reads
  • assumes multiple replicas on the disk
  • difficult to keep the metadata consistent

Page 73: Disk Drive Operations Storage Access Methods

73NEU csg389 Fall 2006, Jiri Schindler

FFS schemes

• To get large transfers
  • larger block size
    – more data per disk read or write
    – use with fragments for small-file space efficiency
  • allocate next block after previous one, if possible
    – do this by starting search at block # just after previous
  • fetch more when sequential access detected
    – so, multiple blocks per seek + rotational latency

Page 74: Disk Drive Operations Storage Access Methods

74NEU csg389 Fall 2006, Jiri Schindler

Other ways of getting large transfers

• Re-allocation
  • to re-establish sequential allocation when it was not feasible at the time of original allocation
  • Can you give an example?

• Pre-allocation
  • to avoid a failure to allocate sequentially later

• Extents (and extent-like)
  • as a replacement for block lists
  • as a replacement for bitmaps
  • things to consider
    – When does this help?
    – When does it hurt performance?

Page 75: Disk Drive Operations Storage Access Methods

75NEU csg389 Fall 2006, Jiri Schindler

Block-based vs. Extent-based Allocation

[figure: the same file laid out on storage two ways – block-based allocation (one pointer per block) vs. extent-based allocation (a few start + length extents)]

Page 76: Disk Drive Operations Storage Access Methods

76NEU csg389 Fall 2006, Jiri Schindler

Sidebar: BSD FFS constants

• What is their purpose?
• Historical perspective
  • details being pushed down

Parameter    Meaning
MAXBPG       max blocks per file in a cylinder group
MAXCONTIG    max contiguous blocks before rotdelay gap
MINFREE      min percentage of free space
NSECT        sectors per track
ROTDELAY     rotational delay between contiguous blocks
RPS          revs per second
TRACKS       tracks per cylinder
TRACKSKEW    track skew in sectors

Page 77: Disk Drive Operations Storage Access Methods

NEU csg389 Information Storage Technologies, Fall 2006

Object-based Access

Page 78: Disk Drive Operations Storage Access Methods

78NEU csg389 Fall 2006, Jiri Schindler

OIDs

• Generation of unique ID
• Flat name space (no hierarchy)

• How to remember where things are?
  • Divide and conquer
  • Employ external applications/DATABASES

• Will discuss in the context of Centera

Page 79: Disk Drive Operations Storage Access Methods

79NEU csg389 Fall 2006, Jiri Schindler

What’s next…

• Lecture: 9/26
  • Database structures
  • DB Workloads