
File Management

Operating System Components

[Figure: the operating system, made up of the Process & Resource Manager, Memory Manager, Device Manager, and File Manager, sits on top of the computer hardware: processor(s), main memory, and devices.]

Why Programmers Need Files

• Persistent storage
• Shared device
• Structured information
• Can be read by any application
• Accessibility
• Protocol

[Figure: an HTML editor and a web browser both reach foo.html (<head>…</head><body>…</body>) through the file manager.]

Fig 13-2: The External View of the File Manager

[Figure: application programs call the file manager, which sits beside the device manager, memory manager, and process manager, above the hardware. The figure shows one such stack for UNIX and one for Windows.]

UNIX: open(), read(), write(), close(), lseek(), mount()
Windows: CreateFile(), ReadFile(), WriteFile(), CloseHandle(), SetFilePointer()

Introduction

• What is a file?

• Where is a file located physically?

• What are the steps to access a file?

File Management

• File is a named, ordered collection of information
• The file manager administers the collection by:
  – Storing the information on a device
  – Mapping the block storage to a logical view
  – Allocating/deallocating storage
  – Providing file directories
• What abstraction should be presented to programmer?

File system context

Levels in a file system

Levels of data abstraction

Logical structures in a file

Information Structure

[Figure: layers from top to bottom: Applications; Records; Structured Record Files; Record-Stream Translation; Byte Stream Files; Stream-Block Translation; Storage device.]

Byte Stream File Interface

• Implements the block-stream interface
• Info on file held in file descriptor
  – described later
• Typical operations on file
  – fileID = open(fileName)
  – close(fileID)
  – read(fileID, buffer, length)
  – write(fileID, buffer, length)
  – seek(fileID, filePosition)
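The typical operations listed above map closely onto the POSIX calls open(), write(), lseek(), read(), and close(). The following is a minimal sketch under that assumption; the helper names put_bytes and get_bytes_at are hypothetical, and a short write is simply treated as a failure.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes to path, replacing any previous contents.
 * Returns 0 on success, -1 on failure (a short write counts as failure). */
int put_bytes(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, len);
    int rc = close(fd);
    return (n == (ssize_t)len && rc == 0) ? 0 : -1;
}

/* Read up to len bytes starting at byte position pos in the stream.
 * Returns the number of bytes read, or -1 on failure. */
ssize_t get_bytes_at(const char *path, off_t pos, void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = -1;
    if (lseek(fd, pos, SEEK_SET) != (off_t)-1)   /* seek(fileID, filePosition) */
        n = read(fd, buf, len);                  /* read(fileID, buffer, length) */
    close(fd);
    return n;
}
```

A caller would first write a file with put_bytes and then fetch any byte range with get_bytes_at; the file manager hides which storage blocks hold those bytes.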

Low Level Files

[Figure: Stream-Block Translation maps the byte stream b0 b1 b2 … bi … onto the storage device.]

Application side:

    fid = open("fileName", …);
    …
    read(fid, buf, buflen);
    …
    close(fid);

File manager side:

    int open(…) {…}
    int close(…) {…}
    int read(…) {…}
    int write(…) {…}
    int seek(…) {…}

Storage device response to commands

File meta-data

• Files contain data plus information about the data, that is, meta-data
  – Meta-data is kept in a file descriptor

File Descriptor Information

• External name
• Current state
• Sharable
• Owner
• User
• Locks
• Protection settings
• Length
• Time of creation
• Time of last modification
• Time of last access
• Reference count
• Storage device details


File Descriptor in Unix

• File descriptor in UNIX is called an inode (index node), containing the following entries
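One way to look at part of an inode from a program is the POSIX stat() call. The sketch below prints a handful of the fields that stat() exposes; it is not a listing of the complete on-disk inode, and the function name print_inode_info is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Print a few inode fields for path.
 * Returns 0 on success, -1 if stat() fails (for example, no such file). */
int print_inode_info(const char *path)
{
    assert(path != NULL);
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    printf("inode number : %llu\n", (unsigned long long)st.st_ino);
    printf("mode (octal) : %o\n", (unsigned)st.st_mode);
    printf("owner uid    : %u\n", (unsigned)st.st_uid);
    printf("link count   : %lu\n", (unsigned long)st.st_nlink);
    printf("length       : %lld bytes\n", (long long)st.st_size);
    printf("last modified: %lld (seconds since the epoch)\n",
           (long long)st.st_mtime);
    return 0;
}
```

Several items from the generic descriptor list (owner, protection settings, length, access times, reference count) show up here as st_uid, st_mode, st_size, st_mtime, and st_nlink.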

Structured Files

• A file is a stream of bytes
• Usually want to access in a structured manner
  – May have no structure imposed (UNIX)
    • Must be provided by application
  – May have a structure imposed (VMS)
    • Need to maintain additional information
      – Type of file
      – Access methods
      – Other information


Block Record Translation

Records

Record-Block Translation

Record-Oriented Sequential Files

• A structured sequential file is a named sequence of logical records, indexed by nonnegative integers
• Records may be of fixed size, or variable size
  – This is determined by file manager
• Operations on logical records
  – fileID = open(fileName)
  – close(fileID)
  – getRecord(fileID, record)
  – putRecord(fileID, record)
  – seek(fileID, position)

Logical Record

[Figure: a sequence of logical records, each an H-byte header followed by a k-byte logical record, followed by the next header.]

• Header contains record descriptor information (occupies H bytes)
• Logical record takes up k bytes – fixed size
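With fixed-size records, seeking to record i is plain arithmetic. The sketch below assumes records are packed back to back in the byte stream, each as an H-byte header followed by a k-byte logical record; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of the header of record i. */
size_t record_header_offset(size_t H, size_t k, size_t i)
{
    return i * (H + k);
}

/* Byte offset of the k-byte logical record (payload) of record i. */
size_t record_data_offset(size_t H, size_t k, size_t i)
{
    return record_header_offset(H, k, i) + H;
}

/* Number of complete records in a file of file_len bytes. */
size_t record_count(size_t H, size_t k, size_t file_len)
{
    assert(H + k > 0);
    return file_len / (H + k);
}
```

For example, with H = 4 and k = 60, record 3 starts at byte 192 and its payload at byte 196.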

Logical Record

[Figure: logical records, each an H-byte header plus a k-byte logical record, laid out across physical storage blocks, with a fragment marked.]

Electronic Mail Example

    struct message {            /* The mail message */
        address to;
        address from;
        line    subject;
        address cc;
        string  body;
    };

    /* Read one message, field by field, in the order putRecord() writes them. */
    struct message *getRecord(void) {
        struct message *msg;
        msg = allocate(sizeof(struct message));
        msg->to = getAddress(...);
        msg->from = getAddress(...);
        msg->cc = getAddress(...);
        msg->subject = getLine();
        msg->body = getString();
        return(msg);
    }

    /* Write one message as a sequence of typed fields. */
    void putRecord(struct message *msg) {
        putAddress(msg->to);
        putAddress(msg->from);
        putAddress(msg->cc);
        putLine(msg->subject);
        putString(msg->body);
    }

• Fixed size records can be a problem
  – Applications requiring large record sizes would require that the programmer break the records into smaller pieces
  – Applications only requiring small record sizes would waste space
• A solution is for the file system to be enhanced to include a function to define the record size for a file – encoded in header

Indexed Sequential File

• Suppose we want to directly access records
• Add an index to the file
  – fileID = open(fileName)
  – close(fileID)
  – getRecord(fileID, index)
  – index = putRecord(fileID, record)
  – deleteRecord(fileID, index)

Indexed Sequential File (cont)

[Figure: an application structure keyed by Account # (012345, 123456, 294376, ..., 529366, ..., 965987); an index maps entries to records in the file at positions index = i, index = k, and index = j.]
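A minimal sketch of how an application might map an account number to a record index, assuming it keeps its own table sorted by key and searches it with binary search. The struct, the function name, and the record positions in the usage note are hypothetical; only the account numbers come from the figure.

```c
#include <assert.h>
#include <stddef.h>

struct index_entry {
    long key;      /* for example, an account number */
    long record;   /* index returned by putRecord() for that record */
};

/* Binary search over a table sorted by key.
 * Returns the record index, or -1 if the key is absent. */
long index_lookup(const struct index_entry *index, size_t n, long key)
{
    assert(index != NULL || n == 0);
    size_t lo = 0, hi = n;                 /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (index[mid].key == key)
            return index[mid].record;
        if (index[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

The value returned by index_lookup is what the application would pass to getRecord(fileID, index). Account number 012345 would be stored as the decimal value 12345, since a leading zero marks an octal literal in C.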

More Abstract Files

• Inverted files
  – System index for each datum in the file
  – Records accessed based on appearance in table rather than their logical location
    • Company accounts may be accessed by customer name, but customer may have several accounts
    • Set up external index table by name with pointers to the main table
• Multimedia storage
  – Records contain radically different types
  – Access methods must be general

Database Management Systems

• A database is a very highly structured set of information
  – Stored across different files
  – Optimized to minimize access time
• DBMS implementation
  – Some DBMSs use the normal files provided by the OS for generic use
  – Some use their own storage device block

File systems

• File system
  – A data structure on a disk that holds files
    • actually a file system is in a disk partition
    • a technical term different from a “file system” as the part of the OS that implements files

• File systems in different OSs have different internal structures

A file system layout

Implementing Low Level Files

• Process needs to be able to read from and write to storage devices
• Simplest system is byte stream file system
  – (will consider record-oriented systems later)
• Storage device may be accessed 2 ways
  – Sequentially – like a tape drive
  – Randomly – like a magnetic disk

Low-level File System Architecture

[Figure: the byte stream b0 b1 b2 b3 … bn-1 is mapped onto Block 0 … of either a sequential device or a randomly accessed device.]

Low Level Files Management

• Secondary storage device contains:
  – Volume directory (sometimes a root directory for a file system)
  – External file descriptor for each file
  – The file contents
• Manages blocks
  – Assigns blocks to files (descriptor keeps track)
  – Keeps track of available blocks
• Maps to/from byte stream

File Manager Data Structures

[Figure: External File Descriptor, Open File Descriptor, and Process-File Session, with three numbered steps.]

1. Copy info from external to the open file descriptor
2. Keep the state of the process-file session
3. Return a reference to the data structure

An open Operation

• Locate the on-device (external) file descriptor
• Extract info needed to read/write file
• Authenticate that process can access the file
• Create an internal file descriptor in primary memory
• Create an entry in a “per process” open file status table
• Allocate resources, e.g., buffers, to support file usage

A close Operation

• Complete all pending operations

• Release I/O buffers

• Release locks process holds on file

• Update external file descriptor

• Deallocate file status table entry

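Opening a UNIX File

    fid = open("fileA", flags);
    …
    read(fid, buffer, len);

[Figure: the open file table (0 stdin, 1 stdout, 2 stderr, 3 ...) leads to a file structure, which leads to the inode held as the internal file descriptor; the on-device file descriptor is its copy on the storage device.]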

Block Management

• The job of selecting & assigning storage blocks to the file
• For a fixed block size of k bytes
  – File of length m bytes requires N = ⌈m/k⌉ blocks
  – Byte bi is stored in block ⌊i/k⌋
• The logical file is divided into logical blocks
• Each logical block is mapped to a physical disk block
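The block arithmetic can be sketched in a few lines, assuming k is the block size in bytes and m the file length in bytes. The block count has to round up, because a partly filled last block still occupies a whole block. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of k-byte blocks needed for a file of m bytes (rounds up). */
size_t blocks_needed(size_t m, size_t k)
{
    assert(k > 0);
    return m / k + (m % k != 0);
}

/* Logical block that holds byte i of the file. */
size_t block_of_byte(size_t i, size_t k)
{
    assert(k > 0);
    return i / k;
}

/* Position of byte i inside its block. */
size_t offset_in_block(size_t i, size_t k)
{
    assert(k > 0);
    return i % k;
}
```

With 4096-byte blocks, a 10000-byte file needs 3 blocks, and byte 5000 sits at offset 904 of logical block 1.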

Locating file data

• The file descriptor contains data on how to perform this mapping
  – there are many methods for performing this mapping
• Three basic strategies:
  – Contiguous allocation
  – Linked lists
  – Indexed allocation

Dividing a file into blocks

Disk Organization

[Figure: the disk starts with the boot sector and the volume directory. Blocks Blk 0 … Blk k-1 are on track 0, cylinder 0; blocks Blk k … Blk 2k-1 are on track 0, cylinder 1; further blocks follow on track 1, cylinder 0 through track N-1, cylinder 0, up to track N-1, cylinder M-1.]

Contiguous Allocation

• Maps the N blocks into N contiguous blocks on the secondary storage device
  – Simple to implement
  – Random access
• Does not provide for dynamic file sizes
  – If you want to extend a file, hope there is an empty block following, or recopy the entire file to a larger group of unallocated contiguous blocks

File descriptor:
  Head position: 237
  …
  First block: 785
  Number of blocks: 25

A contiguous file

Keeping a file in pieces

• We need a block pointer for each logical block, an array of block pointers
  – block mapping indexes into this array
  – Each file is a linked list of disk blocks
• But where do we keep this array?
  – usually it is not kept as contiguous array
  – the array of disk pointers is like a second related file (that is 1/1024 as big)

Block pointers in the file descriptor

Block pointers in contiguous disk blocks

Linked Lists

• Each block contains a header with
  – Number of bytes in the block
  – Pointer to next block
• Blocks need not be contiguous
• Files can expand and contract
• Seeks can be slow

[Figure: the file descriptor (Head: 417, …, First block) points to Block 0. Each of Block 0, Block 1, …, Block N-1 holds a Length field, Byte 0 … Byte 4095, and a pointer to the next block; the last pointer is NULL.]

Linked Lists – cont.

Indexed Allocation

• Extract headers and put them in an index
• Simplify seeks
• May link indices together (for large files)

[Figure: the file descriptor (Head: 417, …, Index block) points to an index block holding a Length entry and a pointer for each of Block 0, Block 1, …, Block N-1; each block holds Byte 0 … Byte 4095.]

Block pointers in an index block

Block pointers in an index block – cont.

Chained index blocks

Two-level index blocks

Two-level index blocks – cont.

[Figure: primary index, secondary index tables, data blocks.]
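With a two-level index, locating a logical block takes one division and one remainder. The sketch below assumes every index block holds the same number of pointers (for example 1024, i.e. 4-byte pointers in a 4096-byte block, which matches the "1/1024 as big" remark earlier). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct two_level_slot {
    size_t primary;     /* entry in the primary index: selects a secondary table */
    size_t secondary;   /* entry in that secondary table: selects the data block */
};

/* Which primary and secondary entries lead to a given logical block. */
struct two_level_slot two_level_locate(size_t logical_block,
                                       size_t entries_per_block)
{
    assert(entries_per_block > 0);
    struct two_level_slot s;
    s.primary = logical_block / entries_per_block;
    s.secondary = logical_block % entries_per_block;
    return s;
}

/* Largest number of data blocks a two-level index can address. */
size_t two_level_capacity(size_t entries_per_block)
{
    return entries_per_block * entries_per_block;
}
```

With 1024 entries per index block, logical block 2500 is reached through primary entry 2 and secondary entry 452, and the index can address 1,048,576 data blocks.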

File system layout variations

• New UNIX file systems use cylinder groups (mini-file systems) to achieve better locality of file data
• MS-DOS uses a FAT (file allocation table) file system
  – so does the Macintosh OS (although the MacOS layout is different)

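UNIX Files

inode:
  mode
  owner
  …
  Direct block 0
  Direct block 1
  …
  Direct block 11
  Single indirect
  Double indirect
  Triple indirect

[Figure: the direct entries point to data blocks; the single indirect entry points to an index block of data-block pointers; the double and triple indirect entries reach data blocks through two and three levels of index blocks.]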
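The inode layout above bounds the size of a file. A minimal sketch of the arithmetic, assuming 12 direct pointers (direct block 0 through 11) and index blocks completely filled with fixed-size block pointers; the block size and pointer size are parameters because the slides do not fix them.

```c
#include <assert.h>
#include <stdint.h>

#define DIRECT_BLOCKS 12   /* direct block 0 through direct block 11 */

/* Largest number of data blocks one inode can address. */
uint64_t inode_max_blocks(uint64_t block_size, uint64_t pointer_size)
{
    assert(pointer_size > 0 && block_size >= pointer_size);
    uint64_t p = block_size / pointer_size;   /* pointers per index block */
    return DIRECT_BLOCKS      /* direct entries */
         + p                  /* single indirect */
         + p * p              /* double indirect */
         + p * p * p;         /* triple indirect */
}
```

With 4096-byte blocks and 4-byte pointers, p is 1024 and the total is 12 + 1024 + 1024^2 + 1024^3 = 1,074,791,436 blocks. Small files are served entirely from the direct entries, so they need no index block at all.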

DOS FAT Files

[Figure: logical linked list. The file descriptor points to disk block 43, which links to disk block 107, which links to disk block 254.]

DOS FAT Files

[Figure: the same chain held in the File Allocation Table (FAT). The file descriptor names disk block 43; the FAT entry for block 43 holds 107 and the entry for block 107 holds 254, so the links are kept in the table instead of in the disk blocks.]
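Following a FAT chain can be sketched as a walk through an array in which entry b holds the number of the block that comes after block b. The end-of-chain marker FAT_END and the function name are assumptions made for this sketch, not the on-disk DOS encoding.

```c
#include <assert.h>
#include <stddef.h>

#define FAT_END (-1)   /* hypothetical end-of-chain marker */

/* Follow the chain that starts at first_block and return the physical block
 * holding logical block n, or FAT_END if the file has fewer blocks than that. */
long fat_lookup(const long *fat, size_t fat_len, long first_block, size_t n)
{
    assert(fat != NULL);
    long block = first_block;
    while (n > 0 && block != FAT_END) {
        assert(block >= 0 && (size_t)block < fat_len);
        block = fat[block];   /* one table lookup per link, no disk block read */
        n--;
    }
    return block;
}
```

For the chain in the figure (43, then 107, then 254), logical block 2 resolves to physical block 254 after two table lookups. Reaching block n still takes n steps, but the steps touch the table rather than the data blocks themselves.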

Unallocated Blocks

• How should unallocated blocks be managed?
• Need a data structure to keep track of them
  – Block status map (or disk bitmap)
    • Small enough to be held in primary memory
  – Linked list (or free list)
    • Very large
    • Hard to manage spatial locality (need to scan list to find blocks ‘close to’ each other)

Free-Space Management

• Bit vector (n blocks), one bit for each of blocks 0, 1, 2, …, n-1

    bit[i] = 1  ⇒  block[i] free
    bit[i] = 0  ⇒  block[i] occupied

• First free block number =
    (number of bits per word) * (number of 0-value words) + offset of first 1 bit
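The first-free-block formula can be sketched as a scan that skips words equal to 0 and then finds the first 1 bit in the first nonzero word. The sketch assumes the bitmap is an array of unsigned int words, that a 1 bit means free, and that bit j of word w (counting from the least significant bit) describes block w * BITS_PER_WORD + j; that bit ordering is a choice made here, not something the slides specify.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned int) * CHAR_BIT)

/* Returns the number of the first free block, or -1 if no block is free. */
long first_free_block(const unsigned int *map, size_t nwords)
{
    assert(map != NULL || nwords == 0);
    for (size_t w = 0; w < nwords; w++) {
        if (map[w] == 0)
            continue;                   /* 0-value word: every block occupied */
        size_t offset = 0;
        while (((map[w] >> offset) & 1u) == 0)
            offset++;                   /* offset of the first 1 bit */
        return (long)(w * BITS_PER_WORD + offset);
    }
    return -1;
}
```

Skipping whole words is what makes the bitmap scan fast: each comparison against 0 rules out BITS_PER_WORD blocks at once.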

Free-Space Management - cont.

• Bit map requires extra space. Example:

    block size = 2^12 bytes
    disk size = 2^30 bytes (1 gigabyte)
    n = 2^30 / 2^12 = 2^18 bits (or 32K bytes)

• Easy to get contiguous files
• Linked list (free list)
  – Cannot get contiguous space easily
  – No waste of space

Free list organization

Free-Space Management – cont.

• Need to protect:
  – Pointer to free list
  – Bit map
    • Must be kept on disk
    • Copy in memory and disk may differ.
    • Cannot allow for block[i] to have a situation where bit[i] = 0 in memory and bit[i] = 1 on disk.
  – Solution:
    • Set bit[i] = 0 in disk.
    • Allocate block[i]
    • Set bit[i] = 0 in memory

Marshalling the Byte Stream

• Must read at least one buffer ahead on input

• Must write at least one buffer behind on output

• Seek means flushing the current buffer and finding the correct one to load into memory

• Inserting/deleting bytes in the interior of the stream

Buffering

• Storage devices use Block I/O
• Files place an explicit order on the bytes
• Therefore, it is possible to predict what will be read after byte i
• When file is opened, manager reads as many blocks ahead as feasible
• After a block is logically written, it is queued for writing behind, whenever the disk is available
• Buffer pool – usually variably sized, depending on virtual memory needs
  – Interaction with the device manager and memory manager

Supporting Other Storage Abstractions

• Low-level file systems avoid encoding record-level functionality
  – If applications use very large or very small records, a generic file manager may be inefficient
  – Some operating systems provide a higher-layer file system to support applications with large or small files
  – Database management systems and multimedia documents are examples

Other Storage Abstractions

• Modern, open operating systems tend towards low-level file systems
• Proprietary operating systems designed for specific applications implement higher-layer file systems
• Structured Sequential Records
  – Contain collections of logical records
  – Need to read from or write to entire records

Other Storage Abstractions

• Indexed sequential files
  – File manager keeps table for each open file and maps index to block containing the record
  – Consumes space
  – Read/write operations more complex
  – Buffering is not of much value (records accessed in arbitrary order)
• Multimedia
  – Requires large files and high bandwidth
    • Use larger block sizes
    • Try to use contiguous block allocation

Directories

• A directory is a set of logically associated files and other directories of files
  – Directories are the mechanism we use to organize files
• The file manager provides a set of commands to manage directories
  – Traverse a directory
  – Enumerate a list of all files and nested directories

Directories

• Directory commands
  – enumerate
  – copy
  – rename
  – delete
  – traverse
  – etc.

Directory Structures

• How should files be organized within directory?
  – Flat name space
    • All files appear in a single directory
  – Hierarchical name space
    • Directory contains files and subdirectories
    • Each file/directory appears as an entry in exactly one other directory -- a tree
    • Popular variant: All directories form a tree, but a file can have multiple parents.

Directory Structures

Directory Structures – cont.

A directory tree

Directory Implementation

• Device Directory
  – A device can contain a collection of files
  – Easier to manage if there is a root for every file on the device -- the device root directory
• File Directory
  – Typical implementations have directories implemented as a file with a special format
  – Entries in a file directory are handles for other files (which can be files or subdirectories)

Directory Implementation – cont.

• Sorted linear list of file names with pointers to the data blocks
  – simple to program
  – time-consuming to execute
• Hash Table – linear list with hash data structure
  – decreases directory search time
  – collisions – situations where two file names hash to the same location
  – fixed size
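A minimal sketch of a hashed directory, assuming a small fixed-size table and linear probing as the collision rule (one of several possible choices; the slides do not say which is used). All names, the table size, and the hash function (djb2) are picked for illustration, and deletion is left out because it needs extra care with probing.

```c
#include <assert.h>
#include <string.h>

#define DIR_SLOTS 8          /* hypothetical fixed table size */
#define NAME_MAX_LEN 32

struct dir_entry {
    char name[NAME_MAX_LEN]; /* empty string marks an unused slot */
    long descriptor;         /* handle for the file's descriptor */
};

struct directory {
    struct dir_entry slot[DIR_SLOTS];   /* must start zero-filled */
};

static unsigned hash_name(const char *name)
{
    unsigned h = 5381;                   /* djb2 string hash */
    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return h % DIR_SLOTS;
}

/* Insert or update name. Returns 0, or -1 if the table is full
 * or the name is too long. */
int dir_insert(struct directory *d, const char *name, long descriptor)
{
    assert(d != NULL && name != NULL && name[0] != '\0');
    if (strlen(name) >= NAME_MAX_LEN)
        return -1;
    unsigned start = hash_name(name);
    for (unsigned i = 0; i < DIR_SLOTS; i++) {
        struct dir_entry *e = &d->slot[(start + i) % DIR_SLOTS];
        if (e->name[0] == '\0' || strcmp(e->name, name) == 0) {
            strcpy(e->name, name);       /* length was checked above */
            e->descriptor = descriptor;
            return 0;
        }
    }
    return -1;                           /* fixed size: no free slot left */
}

/* Returns the descriptor handle for name, or -1 if it is not present. */
long dir_lookup(const struct directory *d, const char *name)
{
    assert(d != NULL && name != NULL);
    unsigned start = hash_name(name);
    for (unsigned i = 0; i < DIR_SLOTS; i++) {
        const struct dir_entry *e = &d->slot[(start + i) % DIR_SLOTS];
        if (e->name[0] == '\0')
            return -1;                   /* empty slot ends the probe */
        if (strcmp(e->name, name) == 0)
            return e->descriptor;
    }
    return -1;
}
```

A lookup normally inspects only a slot or two instead of scanning the whole list, which is the reduced search time the slide refers to; the cost is the fixed table size and the need for a collision rule.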

Directory Implementation – cont.

• Physical disk may be divided into two or more logical disks
  – Bitmap table doesn’t need to be as large
  – Easier to archive
  – Can handle several operating systems
• Requires partitioning at device driver level

Mounting file systems

• Each file system has a root directory
• We can combine file systems by mounting
  – that is, link a directory in one file system to the root directory of another file system
• This allows us to build a single tree out of several file systems
• This can also be done across a network, mounting file systems on other machines

Mounting a file system

UNIX mount Command

[Figure: two directory trees. Names shown: / with bin, usr, etc, and foo; bill and nutt; abc; bar; blah; cde and xyz.]

UNIX mount Command

[Figure: the same trees before and after the command "mount bar at foo".]
