Chapter 11.2 File System Implementation – Part 2

Silberschatz, Galvin and Gagne ©2005, Operating System Concepts

Chapter 11: File System Implementation

Chapter 11.1

File-System Structure

File-System Implementation

Directory Implementation

Chapter 11.2

Allocation Methods

Chapter 11.3

Free-Space Management

Recovery

Log-Structured File System


11.4 Allocation Methods

An allocation method refers to how the disk blocks that store file data (records) are allocated and arranged.

There are three primary approaches:

Contiguous allocation

Linked allocation

Indexed allocation


Contiguous Allocation of Disk Space

Each file occupies a set of contiguous blocks on the disk.

Blocks have a linear ordering, so disk head movement (a disk seek) is only to the next sector on the track or to the next track within the cylinder, etc.

The number of disk seeks is therefore minimal since all blocks are kept together.

The directory entry typically holds only the address of the first block and the number of blocks. This is all that is needed.

File access is very straightforward.

For sequential access, the file system keeps track of the last block referenced and can readily read the next block (see FCB format).

For random access to block i of a file that starts at block b, we can go directly to block b + i.
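The b + i arithmetic can be sketched in a few lines. This is only an illustration, not any real file system's code; the `ContiguousFile` class and its field names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ContiguousFile:
    """Hypothetical directory entry: first block and length in blocks."""
    start: int   # address of the first block (b)
    length: int  # number of blocks in the file

    def block_address(self, i: int) -> int:
        """Return the disk address of logical block i (0-based)."""
        if not 0 <= i < self.length:
            raise IndexError(f"block {i} is outside the file")
        return self.start + i


# A file that starts at block 19 and is six blocks long.
mail = ContiguousFile(start=19, length=6)
print(mail.block_address(0))  # 19
print(mail.block_address(5))  # 24
```

No disk reads are needed to find the address: the directory entry alone is enough, which is why both sequential and direct access are cheap here.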

Biggest problem: file growth. Is totally new space required, or is there some other mechanism? (Discussed ahead.) Extents may help, but this is still a significant problem…

Let’s look and see what a file might look like…


Contiguous Allocation of Disk Space - Visual

Can easily see starting block number and number of blocks for each file.

Note that ‘count’ starts at block 0 on the disk.

‘Mail’ starts at block 19 for six blocks.

All allocations are contiguous!

Note: there are holes!

This is simplistic, however.


Contiguous Allocation of Disk Space

Finding Space – allocation schemes:

Both first fit and best fit work pretty well, with first fit generally a bit faster. (We will see how the system keeps track of available blocks ahead…)

Worst fit is undesirable in terms of time and storage utilization.

All contiguous allocation schemes have external fragmentation issues. This can be a major or a minor problem in managing an overall disk resource.
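The three placement strategies can be compared with a small sketch. The free-hole list, the `Hole` tuple layout, and the function names are assumptions for illustration; a real system would get its holes from its free-space manager.

```python
from typing import List, Optional, Tuple

Hole = Tuple[int, int]  # (start block, length in blocks)


def first_fit(holes: List[Hole], need: int) -> Optional[Hole]:
    """Return the first hole that is large enough, or None."""
    for hole in holes:
        if hole[1] >= need:
            return hole
    return None


def best_fit(holes: List[Hole], need: int) -> Optional[Hole]:
    """Return the smallest hole that is large enough, or None."""
    candidates = [h for h in holes if h[1] >= need]
    return min(candidates, key=lambda h: h[1]) if candidates else None


def worst_fit(holes: List[Hole], need: int) -> Optional[Hole]:
    """Return the largest hole, provided it is large enough, or None."""
    candidates = [h for h in holes if h[1] >= need]
    return max(candidates, key=lambda h: h[1]) if candidates else None


# Hypothetical free holes on a disk, and a request for 3 blocks.
holes = [(2, 4), (10, 8), (25, 3), (40, 5)]
print(first_fit(holes, 3))  # (2, 4)
print(best_fit(holes, 3))   # (25, 3)
print(worst_fit(holes, 3))  # (10, 8)
```

First fit can stop at the first match, while best fit and worst fit must examine every hole (unless the list is kept sorted by size), which is one reason first fit tends to be quicker.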

Downside: Generally, all installations schedule downtime during low system usage, when the disk can be compacted and external fragments brought together.

This can be done off-line, which is generally best. Users get a ‘warning’ of impending ‘non-availability’, e.g. at 3 am: save your files, the system will not be available for a while.

Disks can be ‘reorganized’ and garbage collected… We have ‘periodic maintenance’, ‘system saves’, and compaction… More later…


Extent-Based Systems

How much space is needed for the file? Oftentimes we do not know!

Lots of times, files cannot be extended ‘in place.’ So, what to do?

Can take system offline, allocate more space; move the data, and then restart the system

Very costly in run time.

We often overestimate required space – can be very wasteful, especially if all the ‘required’ newly requested space is really not used / needed.

Can find a larger contiguous space, copy the file into the new space, and release the old space.

But this involves down time, possibly rerunning a process, and other management considerations.

Some systems use extent-based file systems, which allocate disk blocks in extents.

An extent is a contiguous set of disk blocks.

A file consists of a basic allocation plus one or more extents.

IBM uses a SPACE parameter: A process requests an original allocation of say 10 tracks and 2 possible extents of one track each. Ten are allocated and two are held in reserve and used if needed.

Extents are ‘linked in’ as needed.
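To see how a file made of a basic allocation plus extents is addressed, here is a minimal sketch. The list-of-tuples representation and the `locate` function are assumptions for illustration; they are not how IBM's SPACE parameter or any particular system stores extents.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (first block, number of blocks)


def locate(extents: List[Extent], i: int) -> int:
    """Map logical block i (0-based) to a disk block by walking the extents."""
    if i < 0:
        raise IndexError("negative block number")
    remaining = i
    for start, length in extents:
        if remaining < length:
            return start + remaining  # contiguous within this extent
        remaining -= length           # skip this extent entirely
    raise IndexError(f"block {i} is past the end of the file")


# Hypothetical file: a 10-block basic allocation at block 100,
# plus one 2-block extent linked in later at block 300.
file_extents = [(100, 10), (300, 2)]
print(locate(file_extents, 9))   # 109
print(locate(file_extents, 10))  # 300
```

Within an extent the lookup is the same b + i arithmetic as contiguous allocation; only crossing from one extent to the next requires consulting the extent list.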


Linked Allocation of Disk Space

Linked allocation solves the problems of the contiguous allocation scheme.

Each file is a linked list of disk blocks; the blocks may be scattered anywhere on the disk.

The directory points to the first block, and each block points to the next block. (Of course, the links take some of the space in each block.)

For a new file: create a new entry in the directory; no final size is needed.

The pointer is initially set to null, and each write request requires the space management system to find a free block and link it in.

No external fragmentation, and the file can grow. The disk need not be compacted with this kind of allocation.

Major disadvantage: effective only for sequential access, not random access. We must follow the pointers from the start until we reach the desired block. Not efficient if we need a direct-access capability.

Also, the pointers do take up some space, if one adds them up!
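The cost of reaching a given block in a linked file can be illustrated with a toy model. The dictionary standing in for the disk, the block numbers, and the `read_block` function are all assumptions made for this sketch.

```python
from typing import Dict, Optional, Tuple

# Simulated disk: block number -> (data, pointer to next block or None).
Disk = Dict[int, Tuple[str, Optional[int]]]


def read_block(disk: Disk, first: int, i: int) -> Tuple[str, int]:
    """Follow the chain from `first` to logical block i (0-based).

    Returns the block's data and the number of block reads it took.
    """
    if i < 0:
        raise IndexError("negative block number")
    current: Optional[int] = first
    reads = 0
    while current is not None:
        data, next_block = disk[current]  # one disk read per hop
        reads += 1
        if reads == i + 1:
            return data, reads
        current = next_block
    raise IndexError(f"block {i} is past the end of the file")


# Hypothetical file scattered over blocks 9 -> 16 -> 1 -> 10 -> 25.
disk = {9: ("A", 16), 16: ("B", 1), 1: ("C", 10), 10: ("D", 25), 25: ("E", None)}
print(read_block(disk, 9, 0))  # ('A', 1)
print(read_block(disk, 9, 3))  # ('D', 4)
```

Reaching logical block i costs i + 1 reads, because each pointer lives inside the block before it. That is the direct-access penalty described above.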


Linked Allocation of Disk Space - Clusters

Often, clusters of blocks are allocated rather than single blocks.

If so, the pointers occupy much less space, and efficiency is improved because the blocks of a cluster are located contiguously.

But, of course, this increases internal fragmentation, since a partially filled cluster wastes more space than a partially filled block.

Clusters are nevertheless used in most systems.

There are inherent dangers in linked allocation, chiefly dropping (losing or damaging) a pointer:

Could link into a protected area

Could link into some other file

Could simply lose your data!!!

Potential solution 1 (often used): have a doubly-linked list.

Potential solution 2: store the file name and relative block number in each block. But this requires more space, and these links add up!

So there are issues with linked allocation. Let’s see what linked allocation looks like…


Linked Allocation - Visual

Note: Starting location only is stored in the directory.

All else is linked!

Why do you think that, in addition to the starting link, the last link is also stored in the directory?


Linked Allocation with File Allocation Table

Many systems use a FAT (File Allocation Table), which is a data structure stored on disk at the beginning of each volume.

The directory has one entry per file, and this entry holds the block number of the file's first block, which is also the file's entry point into the FAT.

(The FAT is indexed by block number)

Each FAT entry contains the address of the ‘next’ block in the file.

The table entry for the final block of the file holds a special end-of-file mark. (See next slide)

Remember: linked allocation by itself effectively permits only sequential access!

Unused blocks in the FAT have a 0 table value.

When more space is needed for the linked file, the file management system finds an available block (value 0 in the FAT), replaces the previous last block’s EOF value with the new block’s number, and marks the new block’s entry as EOF. (Simply a singly-linked list…)

Downside: This scheme may result in a lot of disk head movement, since the head must move to the FAT at the start of the volume and then to the block itself, which definitely slows things down.

Solution: Cache the FAT for sure.

Advantage: random access is greatly improved, because the location of any block can be found by following the chain in the FAT (particularly if the FAT is cached) rather than by reading the data blocks themselves.
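The FAT operations described on this slide can be sketched with a plain list indexed by block number. The value 0 for a free block comes from the text; the `EOF` value of -1, the table size, and the function names are assumptions for illustration, and the sketch assumes a well-formed table.

```python
from typing import List

FREE = 0   # unused block, as in the text
EOF = -1   # end-of-file mark (this particular value is an assumption)


def chain(fat: List[int], first: int) -> List[int]:
    """Return the file's blocks in order by following FAT entries."""
    blocks = []
    block = first
    while block != EOF:
        blocks.append(block)
        block = fat[block]  # table lookup, no data block is read
    return blocks


def append_block(fat: List[int], first: int) -> int:
    """Find a free block, link it after the file's last block, return it."""
    last = chain(fat, first)[-1]
    # Block 0 is skipped: since 0 means "free", no entry can point to block 0.
    for new in range(1, len(fat)):
        if fat[new] == FREE:
            fat[last] = new  # the old EOF entry now points to the new block
            fat[new] = EOF   # the new block becomes the end of the file
            return new
    raise RuntimeError("no free blocks")


# Hypothetical 8-block volume holding one file: 2 -> 5 -> 3 -> EOF.
fat = [FREE] * 8
fat[2], fat[5], fat[3] = 5, 3, EOF
print(chain(fat, 2))         # [2, 5, 3]
print(append_block(fat, 2))  # 1
print(chain(fat, 2))         # [2, 5, 3, 1]
```

Finding logical block i still means following i links, but every hop is a lookup in the (ideally cached) table rather than a separate disk read.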


File-Allocation Table - Visual

Indexed by block number.


Indexed Allocation of Disk Space

In linked allocation:

we don’t have the external fragmentation problem, and

we don’t have the size declaration problem, but

we also do not have direct access capability without the FAT, because the pointers to the blocks are within the blocks themselves and hence must be retrieved one block at a time.

Indexed Allocation brings all pointers (links) together into the index block.

Each file has its own index built as an array of block addresses.

To access block i of the file, we use the index:

we look up the i-th entry of the index block, and

that entry (if present) gives the disk location of the block.
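A minimal sketch of a per-file index block follows. The `IndexedFile` class, its `capacity` parameter, and the block numbers are assumptions for illustration only.

```python
from typing import List


class IndexedFile:
    """Hypothetical file whose block pointers live in one index block."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity    # pointers that fit in one index block
        self.index: List[int] = []  # entry i = disk address of logical block i

    def add_block(self, disk_block: int) -> None:
        """Record a newly allocated block; any free block will do."""
        if len(self.index) >= self.capacity:
            raise OverflowError("index block is full")
        self.index.append(disk_block)

    def block_address(self, i: int) -> int:
        """Direct access: one array lookup, no chain to follow."""
        if not 0 <= i < len(self.index):
            raise IndexError(f"block {i} is not allocated")
        return self.index[i]


f = IndexedFile(capacity=8)
for b in (9, 16, 1, 10, 25):  # scattered, non-contiguous blocks
    f.add_block(b)
print(f.block_address(3))  # 10
```

The lookup cost does not depend on i, which is what gives indexed allocation its direct-access capability; the `capacity` limit is the reason the size of the index block matters.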


Indexed Allocation of Disk Space

Indexed allocation supports direct access with no external fragmentation. Any free block will suffice when a block needs to be added to the file.

Pointer overhead is generally greater than in linked allocation, because a whole separate index block must be allocated for each file, even if only a few of its pointers are used.

This index itself will occupy at least one block of disk storage. (Of course, it can be cached during use – and generally is.)

So how large should the index block be?

Want it to be small, since every indexed file will have one, but we want a sufficient number of entries to support large file access.

If one index block is too small for a large file, we might need to link several index blocks.

Several implementations of this, as we shall see.


Example of Indexed Allocation - Visual

Shows records in block 19 as well as unused space…


Structure of the Index Block

Linked scheme: the index is usually one block long, but we can link several index blocks (that is, several ‘indices’) together for particularly large files.

Multilevel index: The first-level index block holds only pointers to second-level index blocks, which in turn point to the data blocks.

IBM uses this organization for its indexed sequential files, which it calls Key Sequenced Data Sets (KSDS).

It calls the outermost block the index set, followed by the sequence set followed by the data themselves organized into what they call control areas and control intervals…

Note: a two-level index would allow a file size of up to 4GB (with 4K blocks).

Combined Scheme: (used by Unix) keeps the first set of pointers of the index block in the file’s inode

This scheme involves a number of direct and indirect blocks and we will not spend time on this one.
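The 4 GB figure quoted for a two-level index can be checked with a short calculation. The 4-byte pointer size is an assumption that the note above does not state; the function name is also just for illustration.

```python
def max_file_size(block_size: int, pointer_size: int, levels: int) -> int:
    """Largest file (in bytes) addressable through a multilevel index."""
    pointers_per_block = block_size // pointer_size
    data_blocks = pointers_per_block ** levels  # fan-out at every level
    return data_blocks * block_size


GIB = 2 ** 30
# 4 KB blocks with (assumed) 4-byte pointers: 1024 pointers per block.
print(max_file_size(4096, 4, 1))         # 4194304 bytes (4 MB) with one level
print(max_file_size(4096, 4, 2) // GIB)  # 4 (GB) with two levels
```

Each extra level multiplies the reachable file size by the number of pointers per block, at the cost of one more index read per access when the index is not cached.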


Indexed Allocation – Mapping (Cont.)

[Figure: two-level mapping from the outer index, through the index table, to the file.]

General mappings with multiple indices. Some systems have ‘coarse’ indices followed by ‘fine’ indices, etc.


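[Figure: KSDS organization. The index component consists of the index set above the sequence set; the data component is organized into control areas, each made up of control intervals.]

[Figure: KSDS example. Index set blocks I1 and I2 sit above sequence set blocks S1, S2 and S3, which sit above control intervals D1 to D4, grouped into control areas with free space left in each. Index set entries 9/S1 and 62/S2 point to sequence sets; sequence set entries 3/D1, 9/D2, 36/D3 and 62/D4 point to control intervals; the stored keys are 1, 3, 5, 9, 35, 36, 42, 43, 62. Key values extremely exaggerated!]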


Performance

The choice of an allocation method is largely dependent upon how the data needs to be accessed.

Contiguous Allocation – requires only one access to get to the data block.

Keep initial address in memory and calculate disk addresses from there.

Linked Allocation – keep the address of the next block in memory and can read it directly.

Major disadvantage – no random access, and access to a specific block might well require multiple reads to get ‘to’ that record.

Some systems support direct-access files by using contiguous allocation, and sequential-access files by using linked allocation.

The type of access must be declared when the file is created.

Sequential files will be linked.

Direct access files will be contiguous and can support both direct access and sequential access, such as indexed sequential file organizations.


Performance - 2

Indexed Allocation – If the index is in memory, accesses are quick.

Retaining the index in memory does require space, but it is often held in cache.

If space is available, then this is good. If space is not available, then reading the index and then the data requires two I/Os, and this is not desirable.

For multiple index blocks, more reads might be needed.
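The read counts discussed for the three methods can be put side by side in a deliberately simplified model. It counts only the disk reads needed for a cold direct access to logical block i (0-based), ignores seek distance and data caching, and uses function names invented for this sketch.

```python
def reads_contiguous(i: int) -> int:
    """Address is computed as b + i, so one read fetches the block."""
    return 1


def reads_linked(i: int) -> int:
    """Every earlier block must be read to find the next pointer."""
    return i + 1


def reads_indexed(i: int, index_cached: bool, index_levels: int = 1) -> int:
    """One read per uncached index level, plus one for the data block."""
    return 1 if index_cached else index_levels + 1


i = 7
print(reads_contiguous(i))                     # 1
print(reads_linked(i))                         # 8
print(reads_indexed(i, index_cached=True))     # 1
print(reads_indexed(i, index_cached=False))    # 2
print(reads_indexed(i, False, index_levels=2)) # 3
```

Under this model only linked allocation grows with i, while indexed allocation pays a fixed cost that depends on how many index levels must be fetched from disk.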

Performance using indexed allocation depends on the index structure, the size of the file, and the position of the block desired.

Caching the index file(s) is significantly helpful if space is available.

There are a number of other approaches to optimization. Your book notes that, given the disparity between CPU speed and disk speed, it is not unreasonable to add thousands of extra instructions to the operating system to save just a few disk-head movements.

“Furthermore, this disparity is increasing over time, to the point where hundreds of thousands of instructions reasonably could be used to optimize head movements.” Discuss.

End of Chapter 11.2
